Michael O.

Linux Basics – awk

It’s not a noise, it’s a command!

The awk command is one of those strange commands, much like the extinct great auk, the bird whose name sounds the same. You either love it or you hate it. Either way, it is an impressive command, and what many do not realize is that it is also a complete, small programming language designed for processing text.

The strange name comes from the first letters of the last names of its authors – Alfred Aho, Peter Weinberger, and Brian Kernighan. The awk command is included by default in most modern Linux distributions and is a powerful tool for extracting text fields from sources such as log files.

Used correctly, awk's built-in functions and loops can also save many unneeded passes over the same text, because the work is done inside a single awk program.
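As a rough illustration of built-in functions and loops, the sketch below walks over every column of one line and prints each word in upper case with its length. The input line and the variable names are made up for this example; NF (the number of columns), toupper() and length() are standard awk built-ins.

```shell
# Hypothetical input line, used only for this illustration.
line='The quick brown fox'

# Loop over every column of the line in a single awk call.
# NF holds the number of columns; toupper() and length() are awk built-ins.
field_report=$(printf '%s\n' "$line" | awk '{
    for (i = 1; i <= NF; i++)
        printf "%s=%d\n", toupper($i), length($i)
}')

printf '%s\n' "$field_report"
# THE=3
# QUICK=5
# BROWN=5
# FOX=3
```

Without the loop, the same result would need one extra command per column.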


The awk program flow

Read

awk reads a line from the input stream (file, pipe, or stdin) and stores it in memory.

Execute

awk commands are applied sequentially on the input. By default, awk executes commands on every line. We can restrict this by providing patterns.

Repeat

This process repeats until awk reaches the end of the input.
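The read, execute, repeat cycle can be made visible with a small sketch. It assumes three made-up input lines and uses NR, awk's built-in count of lines read so far, which is not covered elsewhere in this article.

```shell
# No pattern: the action runs once for every line that awk reads.
numbered=$(printf 'alpha\nbeta\ngamma\n' | awk '{ print NR ": " $0 }')
printf '%s\n' "$numbered"
# 1: alpha
# 2: beta
# 3: gamma

# With a pattern: every line is still read, but the action only runs on matches.
restricted=$(printf 'alpha\nbeta\ngamma\n' | awk '/^b/ { print NR ": " $0 }')
printf '%s\n' "$restricted"
# 2: beta
```

Note that the second command still reports line 2: awk counted the first line even though the pattern skipped it.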

The awk Syntax

The basic syntax of awk is:

awk '/search_pattern/ { action_to_take_on_match; another_action; }' file_to_parse
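To make the syntax concrete, here is a minimal sketch with one search pattern and two actions. The log lines and the count variable are invented for this example and do not come from a real log file.

```shell
# Hypothetical log lines, invented for this illustration.
log_lines='INFO service started
ERROR disk full
INFO request served
ERROR timeout'

# For each line matching /ERROR/, run two actions:
# first increase a counter, then print the counter and the whole line ($0).
errors=$(printf '%s\n' "$log_lines" | awk '/ERROR/ { count++; print count ": " $0; }')

printf '%s\n' "$errors"
# 1: ERROR disk full
# 2: ERROR timeout
```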

Let us work with our sample.txt file again which contains the following lines:

~$ cat sample.txt 
The quick
brown
fox
jumps over
the lazy
dog
.

In its simplest form, it behaves much like grep, except with a slightly different syntax. By default, the awk command also treats any run of spaces or tabs as a column separator.

Thus:

~$ awk '/the/' sample.txt 
the lazy
# THUS
# COL 1  |  COL 2
# the    |  lazy
# PRINT COLUMN 2
~$ awk '/the/ {print $2}' sample.txt 
lazy
# PRINT ALL LINES WITH 2 OR MORE COLUMNS (STRICTLY: LINES WHERE COLUMN 2 IS NOT EMPTY AND NOT 0)
~$ awk '$2' sample.txt 
The quick
jumps over
the lazy
# NOW WE ADD A FULL LINE "The quick brown fox jumps over the lazy dog." TO THE END OF sample.txt
~$ awk '$2' sample.txt 
The quick
jumps over
the lazy
The quick brown fox jumps over the lazy dog.
# MATCH ONLY LINES WHERE THE 2ND COLUMN STARTS WITH THE LETTER q
~$ awk '$2 ~ /^q/' sample.txt 
The quick
The quick brown fox jumps over the lazy dog.
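Two details of the column handling above are worth a quick sketch. First, with the default settings awk treats any run of spaces or tabs as a single separator. Second, the bare pattern $2 is a truth test rather than a column count, so a line whose second column is 0 is skipped. The input below is invented for this illustration, and NF (the number of columns on the current line) is a built-in variable not used elsewhere in this article.

```shell
# Hypothetical input: irregular spacing, a second column of 0, and a single column.
input=$(printf 'the     lazy\tdog\nerrors 0\nsingle\n')

# Several spaces or a tab still count as one separator, so $2 is "lazy".
second_fields=$(printf '%s\n' "$input" | awk 'NF >= 2 { print $2 }')
printf '%s\n' "$second_fields"
# lazy
# 0

# The bare pattern $2 drops the "errors 0" line because 0 counts as false.
truthy=$(printf '%s\n' "$input" | awk '$2')

# NF >= 2 counts columns instead, so "errors 0" is kept.
by_count=$(printf '%s\n' "$input" | awk 'NF >= 2')
printf '%s\n' "$by_count"
```

For text like sample.txt the two patterns give the same result. The difference only matters when a column can hold the value 0.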

Now we will use awk to print only the 2nd column of each line. This results in blank lines where lines have only a single column:

~$ awk '{print $2}' sample.txt 
quick


over
lazy



quick

To remove the blank lines, we can filter them out with grep or sed, but let us use one of the previous awk examples for this:

~$ awk '$2' sample.txt | awk '{print $2}'
quick
over
lazy
quick
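Since a pattern and an action can be combined in one program, the same result should also be reachable with a single awk call. The sketch below rebuilds the sample file in a temporary location (the file contents are reconstructed from the outputs shown above, and the variable names are placeholders) and compares both approaches.

```shell
# Recreate the sample file in a temporary location (contents assumed from the article).
sample=$(mktemp)
printf '%s\n' 'The quick' 'brown' 'fox' 'jumps over' 'the lazy' 'dog' '.' \
    'The quick brown fox jumps over the lazy dog.' > "$sample"

# Two awk calls joined by a pipe, as in the example above.
piped=$(awk '$2' "$sample" | awk '{print $2}')

# One awk call: the pattern $2 selects the lines, the action prints column 2.
single=$(awk '$2 {print $2}' "$sample")

printf '%s\n' "$single"
# quick
# over
# lazy
# quick

rm -f "$sample"
```

The single call reads the file once and avoids starting a second process, which can matter on large inputs.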

In closing

As you can see, there is much you can do with the awk command. The full scope of this command will not be discussed here as there are many very good online guides to help you learn this text processing language.

It is a valuable tool in any Linux administrator's or engineer's toolbox and makes life much easier when it comes to large-scale automated processing of masses of text to extract that bit of data you need in exactly the correct place or format.

Happy Hosting!


The Author

Michael O.

Michael is the founder, managing director, and CEO of HOSTAFRICA. He studied at Friedrich Schiller University Jena and was inspired by Cape Town's beauty to bring his German expertise to Africa. Before HOSTAFRICA, Michael was the Managing Director of Deutsche Börse Cloud Exchange AG, one of Germany's largest virtual server providers.
