Awk Tips
- This example demonstrates the simplest general form of an Awk program:

Syntax: awk <search pattern> {<program actions>}

awk '/gold/' coins.txt
awk '/gold/ {print $5,$6,$7,$8}' coins.txt
awk '{if ($3 < 1980) print $3, " ",$5,$6,$7,$8}' coins.txt
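The three commands above can be tried on a small stand-in for "coins.txt". The sample lines below are made up for illustration; the column layout (metal, weight in ounces, year, country, description) is an assumption inferred from the field numbers the commands use.

```shell
# Hypothetical sample data standing in for coins.txt (layout is assumed).
coins=$(mktemp)
cat > "$coins" <<'EOF'
gold     1    1986  USA      American Eagle
gold     1    1908  Austria  Franz Josef 100 Korona
silver  10    1981  USA      ingot
EOF

# Pattern only: print every line that contains "gold".
gold_lines=$(awk '/gold/' "$coins")
echo "$gold_lines"

# Pattern plus action: print only the description fields of the gold lines.
gold_desc=$(awk '/gold/ {print $5,$6,$7,$8}' "$coins")
echo "$gold_desc"

# Action only: the "if" inside the action selects coins dated before 1980.
old_coins=$(awk '{if ($3 < 1980) print $3, " ",$5,$6,$7,$8}' "$coins")
echo "$old_coins"

rm -f "$coins"
```

With this sample, only the 1908 coin passes the year test in the last command.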
The next example prints out how many coins
are in the collection:
awk 'END {print NR,"coins"}' coins.txt
Suppose the current price of gold is $425,
and I want to figure out the approximate total value of the gold pieces in the
coin collection. I invoke Awk as follows:
awk '/gold/ {ounces += $2} END {print "value = $" 425*ounces}' coins.txt
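As a rough check of the two commands above, here is a sketch that runs them on three made-up coin lines (the data and the temporary file are assumptions for illustration, not the real collection). The $425 price is the one quoted in the text.

```shell
# Hypothetical sample data: 1.5 ounces of gold in total, plus one silver piece.
coins=$(mktemp)
cat > "$coins" <<'EOF'
gold     1    1986  USA      American Eagle
gold     0.5  1908  Austria  Franz Josef 100 Korona
silver  10    1981  USA      ingot
EOF

# NR still holds the number of lines read when the END block runs.
count=$(awk 'END {print NR,"coins"}' "$coins")
echo "$count"

# Sum field 2 over the gold lines, then multiply by the price once at the end.
value=$(awk '/gold/ {ounces += $2} END {print "value = $" 425*ounces}' "$coins")
echo "$value"

rm -f "$coins"
```

For this sample the commands print "3 coins" and "value = $637.5".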
An Awk program can also be stored in a file and run with the "-f" option:

awk -f <awk program file name>

Given the ability to write an Awk program in this way, what should a "master" "coins.txt" analysis program do? Here's one possible output:
Summary Data for Coin Collection:

   Gold pieces:               nn
   Weight of gold pieces:     nn.nn
   Value of gold pieces:      n,nnn.nn

   Silver pieces:             nn
   Weight of silver pieces:   nn.nn
   Value of silver pieces:    n,nnn.nn

   Total number of pieces:    nn
   Value of collection:       n,nnn.nn

The following Awk program generates this information:
# This is an awk program that summarizes a coin collection.
#
/gold/    { num_gold++; wt_gold += $2 }        # Get weight of gold.
/silver/  { num_silver++; wt_silver += $2 }    # Get weight of silver.
END { val_gold = 485 * wt_gold;                # Compute value of gold.
      val_silver = 16 * wt_silver;             # Compute value of silver.
      total = val_gold + val_silver;
      print "Summary data for coin collection:";   # Print results.
      printf ("\n");
      printf (" Gold pieces: %2d\n", num_gold);
      printf (" Weight of gold pieces: %5.2f\n", wt_gold);
      printf (" Value of gold pieces: %7.2f\n",val_gold);
      printf ("\n");
      printf (" Silver pieces: %2d\n", num_silver);
      printf (" Weight of silver pieces: %5.2f\n", wt_silver);
      printf (" Value of silver pieces: %7.2f\n",val_silver);
      printf ("\n");
      printf (" Total number of pieces: %2d\n", NR);
      printf (" Value of collection: %7.2f\n", total); }

This program has a few interesting features. One of them is the "printf" statement, which has the general form:
printf("<format_code>",<parameters>)
There is one format code for each of the parameters in the list. Each format code determines how its corresponding parameter will be printed. For example, the format code "%2d" tells Awk to print an integer in a field at least two characters wide, and the format code "%7.2f" tells Awk to print a floating-point number in a field at least seven characters wide (counting the decimal point), with two digits to the right of the decimal point.
Note also that, in this example, each string printed by "printf" ends with a "\n", which is a code for a "newline" (ASCII line-feed code). Unlike the "print" statement, which automatically advances the output to the next line when it prints a line, "printf" does not automatically advance the output, and by default the next output statement will append its output to the same line. A newline forces the output to skip to the next line.
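The difference between "print" and "printf" is easy to see in a program that needs no input file. This is only a small sketch; the strings and numbers are arbitrary.

```shell
# A BEGIN-only program runs without reading any input.
out=$(awk 'BEGIN {
    print "print adds a newline"
    printf("no newline here, ")
    printf("so this continues the same line\n")
    printf("[%2d] [%7.2f]\n", 7, 3.14159)
}')
echo "$out"
```

The last line comes out as "[ 7] [   3.14]": the integer is padded to two characters and the floating-point value to seven.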
* Awk is invoked as follows:
awk [ -F<ch> ] {pgm} | { -f <pgm_file> } [ <vars> ] [ - | <data_file> ]

-- where:

ch: Field-separator character.
pgm: Awk command-line program.
pgm file: File containing an Awk program.
vars: Awk variable initializations.
data file: Input data file.

An Awk program has the general form:
BEGIN {<initializations>}
<search pattern 1> {<program actions>}
<search pattern 2> {<program actions>}
...
END {<final actions>}
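The invocation options and the BEGIN / pattern / END layout can be combined in one short sketch. The colon-separated data, the variable name "price", and its value are all assumptions made for this illustration.

```shell
# Hypothetical colon-separated data: metal, weight in ounces, year.
data=$(mktemp)
cat > "$data" <<'EOF'
gold:1:1986
silver:10:1981
gold:0.5:1908
EOF

# -F: sets the field separator; price=425 is a variable initialization
# given on the command line, placed before the data file name.
report=$(awk -F: '
BEGIN    { print "metal report" }    # runs once, before any input is read
/gold/   { gold += $2 }              # runs for every line matching "gold"
/silver/ { silver += $2 }            # runs for every line matching "silver"
END      { print "gold", gold * price; print "silver", silver }
' price=425 "$data")
echo "$report"

rm -f "$data"
```

Note that a variable set this way is assigned when Awk reaches that argument, so it is visible in the pattern actions and in END but not yet in BEGIN.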
Search Patterns
The simplest kind of search pattern that can
be specified is a simple string, enclosed in forward-slashes ("/"). For example:
/The/
/^The/                        - beginning of the line
/The$/                        - ends with "The"
/\$/                          - to search "$"
/[Tt]he/
/(^Germany)|(^Netherlands)/   - OR
/wh./                         - wild card
$1 ~ /^France$/               - first field is "France"
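A few of these patterns can be checked against a scratch file. The six text lines below are invented for the purpose, and the alternation is adapted to them.

```shell
# Hypothetical test lines for the search patterns.
text=$(mktemp)
cat > "$text" <<'EOF'
The cat sat
See the cat
Price: $5
France 10
Francestan 3
what when where
EOF

starts=$(awk '/^The/' "$text")                            # anchored at line start
either=$(awk '/[Tt]he/ {n++} END {print n}' "$text")      # "The" or "the"
dollar=$(awk '/\$/' "$text")                              # a literal dollar sign
altern=$(awk '/(^The)|(^See)/ {n++} END {print n}' "$text")   # OR of two patterns
wild=$(awk '/wh./' "$text")                               # "wh" plus any character
france=$(awk '$1 ~ /^France$/ {print $2}' "$text")        # first field is exactly "France"

echo "$starts"; echo "$either"; echo "$dollar"
echo "$altern"; echo "$wild"; echo "$france"

rm -f "$text"
```

The last pattern matches "France 10" but not "Francestan 3", because the "^" and "$" anchors apply to the first field rather than to the whole line.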
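NR is, as explained in the overview, a count of the lines searched by Awk. It can be used in a search pattern like any other variable.

Variable declaration: Awk variables do not have to be declared. A variable that has never been assigned compares equal to both 0 and the empty string, so a test such as "var == 0" is true for an unset variable.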
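Both points, NR as a line count and the zero value of an unassigned variable, show up in a three-line sketch. The input and the variable name "count" are arbitrary choices for illustration.

```shell
out=$(printf 'a\nb\nc\n' | awk '
NR == 2 { print "line 2 is", $0 }     # NR used as a pattern: select line 2 only
        { if (count == 0) print "count is unset on line", NR; count++ }
END     { print NR, "lines" }')       # NR keeps the total in the END block
echo "$out"
```

The "count == 0" test succeeds only on the first line, because "count" has not been assigned anything at that point.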
Awk's built-in variables include the field variables -- $1, $2, $3, and so on ($0 is the entire line) -- that give the text or values in the individual text fields in a line, and a number of variables with specific functions:
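A quick way to get a feel for the field variables is to print them next to two of the standard built-ins, NR (current line number) and NF (number of fields in the current line). The two input lines are made up.

```shell
# $NF is the last field, because NF holds the field count of the current line.
out=$(printf 'gold 1 1986\nsilver 10 1981 USA\n' |
      awk '{ print NR ": " NF " fields, first=" $1 ", last=" $NF }')
echo "$out"
```

The first line reports 3 fields and the second reports 4, so "$NF" refers to a different column on each line.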
* Awk also permits the use of arrays. The naming convention is the same as it is for variables, and, as with variables, the array does not have to be declared. Awk arrays have only one dimension, and numeric indexes conventionally start at 1 (as they do in arrays filled by "split()"). Array elements are identified by an index, contained in square brackets. For example:
some_array[1], some_array[2], some_array[3] ...

One interesting feature of Awk arrays is that the indexes can also be strings, which allows them to be used as a sort of "associative memory". For example, an array could be used to tally the money your friends owe you, as follows:
debts["Kimmie"], debts["Michael"], debts["Hugh"] ...
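A minimal sketch of such a tally, assuming input lines of the form "name amount" (the amounts here are invented):

```shell
# Each line adds its amount to the array element indexed by the name.
out=$(printf 'Kimmie 5\nMichael 12\nKimmie 7.5\n' | awk '
{ debts[$1] += $2 }
END { print "Kimmie", debts["Kimmie"]; print "Michael", debts["Michael"] }')
echo "$out"
```

The two "Kimmie" lines accumulate in the same element, giving 12.5, without any declaration or initialization of the array.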
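There are several predefined functions that return numeric values:

length()   Length of a string (of the current line if no argument is given).
sqrt()     Square root.
log()      Base-e log.
exp()      Power of e.
int()      Integer part of argument.

For example, the following prints each line preceded by its length:

{print length, $0}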
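These functions can be exercised directly from a BEGIN block. The arguments below are arbitrary values picked so the results are easy to check by hand.

```shell
# int() truncates toward zero, so int(-7.9) is -7, not -8.
nums=$(awk 'BEGIN { print sqrt(16), int(7.9), int(-7.9), exp(0), log(1), length("coins") }')
echo "$nums"

# With no argument, length refers to the whole current line ($0).
len_line=$(echo "hello world" | awk '{print length, $0}')
echo "$len_line"
```

The first command prints "4 7 -7 1 0 5" and the second prints "11 hello world".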
Awk, not surprisingly, includes a set of string-processing operations:
substr()   As mentioned, extracts a substring from a string.
           substr(<string>,<start of substring>,<max length of substring>)

split()    Splits a string into its elements and stores them in an array.
           split(<string>,<array>,[<field separator>])

index()    Finds the starting point of a substring within a string.
           index(<target string>,<search string>)
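The three string functions can be tried on a single test string. The string and the variable names "s", "parts" and "n" are arbitrary choices for this sketch.

```shell
out=$(awk 'BEGIN {
    s = "Franz Josef 100 Korona"
    print substr(s, 7, 5)          # 5 characters starting at position 7
    n = split(s, parts, " ")       # split() returns the number of elements
    print n, parts[1], parts[n]    # elements are stored starting at index 1
    print index(s, "100")          # position of the first match, 0 if none
}')
echo "$out"
```

String positions count from 1, so "Josef" starts at position 7 and "100" at position 13.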
Awk supports control structures similar to those used in C, including:
if ... else
while
for
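A short sketch that uses all three control structures in one BEGIN block; the loop bounds and variable names are arbitrary.

```shell
out=$(awk 'BEGIN {
    # for loop with an if ... else inside it
    for (i = 1; i <= 3; i++) {
        if (i % 2 == 0)
            kind = "even"
        else
            kind = "odd"
        print i, kind
    }
    # while loop counting down on a single output line
    n = 3
    while (n > 0) {
        printf("%d ", n)
        n--
    }
    printf("\n")
}')
echo "$out"
```

As in C, braces group several statements into one block, and a single statement can follow "if" or "else" without them.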