The awk program is even more powerful than sed. It is a small programming language in its own right, with syntax modeled on the C programming language. The basic usage is:
awk 'program' filenames...
where the program is a series of patterns and the actions to take on the lines that match them:
pattern { action }
pattern { action }
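As a minimal sketch of the pattern/action form, the following runs a two-rule program over three made-up lines (the names and scores are hypothetical sample data). Either part of a rule may be omitted: a pattern with no action prints the matching line, and an action with no pattern runs on every line.

```shell
# Hypothetical sample input: a name and a score on each line.
sample='ann 72
bob 45
cy 90'

out=$(printf '%s\n' "$sample" | awk '
$2 > 80 { print $1, "high" }   # pattern and action
$2 < 50                        # pattern only: the matching line is printed
')
printf '%s\n' "$out"
```

Each input line is tested against every rule in order, so a line can trigger more than one action.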
Here we will touch on some simple uses.
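The awk program splits each line into fields, that is, strings of non-blank characters separated by blanks or tabs. The fields are called $1, $2, ..., $NF, where the variable NF holds the number of fields on the current line. The variable $0 represents the whole line. As an example, consider the output of the who command.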
[alice@localhost alice]$ who
aswapna  pts/0   Dec  2 03:41 (jade.boisestate.edu)
jlowe    pts/1   Nov 26 22:34 (24-116-128-35.cpe.cableone.net)
jcollins pts/2   Nov 29 11:30 (eas-joshcollins.boisestate.edu)
tcole    pts/6   Dec  1 11:01 (meteor.boisestate.edu)
ckrossch pts/10  Nov 29 07:35 (masquerade.micron.com)
drau     pts/11  Nov 30 17:53 (sys-243-163-254.nat.pal.hp.com)
tcole    pts/12  Dec  1 12:15 (meteor.boisestate.edu)
alice    pts/13  Dec  2 04:08 (kohinoor.boisestate.edu)
aswapna  pts/15  Dec  1 23:07 (jade.boisestate.edu)
alice    pts/8   Nov 29 17:38 (kohinoor.boisestate.edu)
cwaite   pts/4   Nov 16 07:27 (masquerade.micron.com)
cwaite   pts/7   Nov 16 07:30 (masquerade.micron.com)
alice    pts/20  Dec  1 15:34 (208--714-14694.boisestate.edu)
alex     pts/21  Nov 16 11:10 (144--650-3036.boisestate.edu)
yghamdi  pts/32  Nov 30 22:43 (24-117-243-152.cpe.cableone.net)
jhanes   pts/22  Nov 18 19:41
twhitchu pts/26  Nov 28 16:17
cwaite   pts/29  Nov 30 09:14 (masquerade.micron.com)
kchriste pts/18  Dec  1 16:23 (sys-243-163-254.nat.pal.hp.com)
njulakan pts/15  Dec  1 22:22
[alice@localhost alice]$
Let's say we are interested only in the name and the time of login. We can select the first and fifth columns using awk.
who | awk '{print $1, $5}'
jlowe 22:34
jcollins 11:30
tcole 11:01
ckrossch 07:35
drau 17:53
tcole 12:15
alice 04:08
alice 17:38
cwaite 07:27
cwaite 07:30
alice 15:34
alex 11:10
yghamdi 22:43
jhanes 19:41
twhitchu 16:17
cwaite 09:14
kchriste 16:23
njulakan 22:22
[alice@localhost alice]$
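The number of fields can differ from line to line, which matters when counting from the end. As a small sketch, the following feeds two lines in the style of the who output into awk and prints NF, the first field, and the last field ($NF) for each. The variable names sample and fields are only for illustration.

```shell
# Two lines in the style of the who output; the second has no host name.
sample='alice pts/13 Dec 2 04:08 (kohinoor.boisestate.edu)
jhanes pts/22 Nov 18 19:41'

# Print the field count, the first field, and the last field of each line.
fields=$(printf '%s\n' "$sample" | awk '{ print NF, $1, $NF }')
printf '%s\n' "$fields"
```

The first line has six fields, so $NF is the host name. The second has only five, so $NF is the login time. Selecting $5 rather than $NF gives the time on both lines.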
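Now suppose we want to sort them by the time of login. We can do that by printing the time first and piping the result through sort. Note that this orders the lines by time of day only, since the date fields are not printed.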
who | awk '{print $5, $1}' | sort
[alice@localhost alice]$ who | awk '{print $5, $1}' | sort
04:08 alice
07:27 cwaite
07:30 cwaite
07:35 ckrossch
09:14 cwaite
11:01 tcole
11:10 alex
11:30 jcollins
12:15 tcole
15:34 alice
16:17 twhitchu
16:23 kchriste
17:38 alice
17:53 drau
19:41 jhanes
22:22 njulakan
22:34 jlowe
22:43 yghamdi
[alice@localhost alice]$
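Fields need not be separated by blanks. The -F option sets the field separator. Since the fields in /etc/passwd are separated by colons, the following prints a sorted list of the user names on the system.
cat /etc/passwd | awk -F: '{print $1}' | sort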
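To try the -F option without depending on the contents of a real /etc/passwd, here is a sketch that uses three made-up records in the same colon-separated layout. The entries are hypothetical. It prints the user name and the last field (the login shell) of each record.

```shell
# Hypothetical passwd-style records, not read from a real /etc/passwd.
passwd_sample='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin'

# With -F: the colon is the field separator, so $1 is the user name
# and $NF is the last field on the line.
users=$(printf '%s\n' "$passwd_sample" | awk -F: '{ print $1, $NF }' | sort)
printf '%s\n' "$users"
```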
The built-in variable NR holds the number of the current input line. Use
awk '{print NR, $0}'
to add line numbers to any input stream.
For neatly aligned line numbers, printf gives control over the output format, much as in C.
awk '{printf "%4d\t%s\n", NR, $0}'
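The difference between the two line-numbering commands is easiest to see side by side. This sketch runs both on the same two-line input. The variable names plain and aligned are only for illustration.

```shell
# print separates its arguments with a single space.
plain=$(printf 'alpha\nbeta\n' | awk '{ print NR, $0 }')

# printf pads the number to four columns and follows it with a tab.
aligned=$(printf 'alpha\nbeta\n' | awk '{ printf "%4d\t%s\n", NR, $0 }')

printf '%s\n' "$plain" "$aligned"
```

Unlike print, printf adds no newline of its own, which is why the format string ends in \n.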
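A pattern with no action prints each line that matches it. The following prints the lines that have an odd number of fields.
cat data | awk 'NF%2 != 0'
The following prints the lines that are longer than 72 characters.
cat data | awk 'length($0) > 72'
Adding an action lets us report each such line with its line number and its first 60 characters instead.
cat data | awk 'length($0) > 72 {print "Line", NR, "too long:", substr($0,1,60)}'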
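These patterns can be tried on a small input. In the sketch below the sample lines are made up, and the limits are shortened to 10 and 7 characters (instead of 72 and 60) so that a short line triggers the report.

```shell
# Hypothetical sample data with 3, 2, and 5 fields per line.
data='a b c
a b
one two three four five'

# Pattern only: print the lines with an odd number of fields.
odd=$(printf '%s\n' "$data" | awk 'NF % 2 != 0')

# Pattern and action: report lines longer than 10 characters,
# showing only their first 7 characters.
long=$(printf '%s\n' "$data" |
    awk 'length($0) > 10 { print "Line", NR, "too long:", substr($0, 1, 7) }')

printf '%s\n' "$odd" "$long"
```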
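The following example computes the sum and average of the first column in the input. The special pattern END matches after the last line has been read, when NR holds the total number of lines.
awk '{s = s + $1} END {print s, s/NR}'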
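Here is an example of using the above construct on the output of wc -l. Note that the final "total" line printed by wc is summed along with the others, so the reported sum (920) is twice the actual line count (460), and the average is taken over 10 lines rather than 9 files.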
[alice@localhost doublyLinkedLists]: wc -l *.c *.h
  26 Job.c
 133 List.c
  42 main.c
  20 Node.c
 131 TestList.c
  12 common.h
  27 Job.h
  46 List.h
  23 Node.h
 460 total
[alice@localhost doublyLinkedLists]: wc -l *.c *.h | awk '{print $1}'
26
133
42
20
131
12
27
46
23
460
[alice@localhost doublyLinkedLists]: wc -l *.c *.h | awk '{print $1}' | awk '{s = s+$1}\
> END {print s, s/NR}'
920 92
[alice@localhost doublyLinkedLists]:
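One possible way to keep the summary line printed by wc out of the sum is to skip it with a pattern and to count the remaining lines in a separate variable instead of relying on NR. The following is a sketch under that assumption, using the wc output shown above as fixed sample text. The variable names wc_sample, summary, s, and n are only for illustration.

```shell
# The wc -l output from the transcript, used here as fixed sample text.
wc_sample='  26 Job.c
 133 List.c
  42 main.c
  20 Node.c
 131 TestList.c
  12 common.h
  27 Job.h
  46 List.h
  23 Node.h
 460 total'

# Skip the summary line and count the lines that remain.
# Caveat: a file that is itself named "total" would be skipped as well.
summary=$(printf '%s\n' "$wc_sample" |
    awk '$2 != "total" { s += $1; n++ }
         END { if (n > 0) printf "%d %.2f\n", s, s / n }')
printf '%s\n' "$summary"
```

This reports 460 lines in 9 files, an average of about 51 lines per file.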
awk has arrays, full programming language statements and much more. Please see the book on AWK [2] to learn more.
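As a small taste of arrays, here is a sketch that counts login sessions per user. The input is a made-up excerpt in the style of the who output. Arrays in awk are indexed by strings, so the user name itself can serve as the subscript.

```shell
# Hypothetical excerpt of who output: user name and terminal.
who_sample='alice pts/13
tcole pts/6
alice pts/8
cwaite pts/4
tcole pts/12
alice pts/20'

# n[$1]++ adds one to the count for the user named in the first field.
# The order of a for-in loop is unspecified, so the result goes through sort.
counts=$(printf '%s\n' "$who_sample" |
    awk '{ n[$1]++ } END { for (u in n) print u, n[u] }' | sort)
printf '%s\n' "$counts"
```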