AWK is a powerful text-processing tool in Unix/Linux used for pattern scanning and processing. It is widely used to manipulate data in files and perform tasks like searching, filtering, and transforming text-based data.
AWK operates line by line, where each line is split into fields (based on a delimiter, usually spaces or tabs), and actions are performed on those fields based on specific patterns.
Basic Syntax of AWK Command:
awk 'pattern { action }' file_name
- pattern: Defines the condition to match (can be a regular expression or a specific condition).
- action: Defines what to do when the pattern is matched (typically applies to the fields in the input text).
- file_name: The name of the file to be processed (or standard input if no file is specified).
AWK treats each line in the input file as a record and splits it into fields (by default on runs of whitespace). The built-in variable NF holds the number of fields in the current record, and $1, $2, …, $NF refer to the individual fields, with $NF being the last one.
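As a small illustration of records and fields, the sketch below feeds a few made-up lines (the sample data is hypothetical, not from any real file) through awk and prints the record number, the field count, and the last field of each line:

```shell
# Hypothetical three-line input with a varying number of fields per line.
sample='alice sales 72
bob ops
carol eng 88 lead'

# For each record: record number (NR), field count (NF), and last field ($NF).
result=$(printf '%s\n' "$sample" | awk '{ print NR ": " NF " fields, last = " $NF }')
echo "$result"
```

Because NF changes from line to line, $NF picks out a different column position on each record.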
Examples of AWK Commands:
1. Print All Lines of a File
awk '{ print }' file.txt
- This command prints all lines from the file file.txt.
2. Print Specific Fields (Columns)
If you have a text file with columns, you can print specific columns:
awk '{ print $1, $3 }' file.txt
- This prints the 1st and 3rd fields (columns) of each line from file.txt.
3. Print Lines Matching a Pattern
You can use AWK to print lines that match a certain pattern. For example, printing lines containing “apple”:
awk '/apple/ { print }' file.txt
- This prints all lines from file.txt that contain the string “apple” anywhere in the line, so a line containing “pineapple” also matches.
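A regular-expression pattern such as /apple/ matches anywhere in the line, including inside longer words. The sketch below, using made-up sample data, contrasts that with an exact comparison against the first field:

```shell
# Hypothetical input: item name and quantity.
sample='apple 3
pineapple 5
banana 2'

# /apple/ matches the substring anywhere in the line, so "pineapple" is included.
any_match=$(printf '%s\n' "$sample" | awk '/apple/ { print $1 }')

# Comparing the first field exactly restricts the match to the whole field.
exact_match=$(printf '%s\n' "$sample" | awk '$1 == "apple" { print $1 }')

echo "$any_match"
echo "$exact_match"
```

The first command prints both apple and pineapple; the second prints only apple.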
4. Print Lines Where the 3rd Field is Greater Than a Value
For numeric comparison, you can check conditions on fields:
awk '$3 > 50 { print $1, $3 }' file.txt
- This command prints the 1st and 3rd fields of lines where the 3rd field is greater than 50.
5. Sum of a Specific Field
You can calculate the sum of a particular field (e.g., summing values in the 2nd column):
awk '{ sum += $2 } END { print sum }' file.txt
- This sums all values in the 2nd column and prints the result.
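The same accumulate-then-report pattern can be extended slightly. As a sketch with hypothetical data, the END block below also prints the average by dividing the sum by NR, which still holds the total record count once the input is exhausted:

```shell
# Hypothetical input: label and numeric value.
sample='a 10
b 20
c 30'

# Accumulate the 2nd field; in END, print the sum and the average.
# The NR > 0 guard avoids dividing by zero on empty input.
result=$(printf '%s\n' "$sample" | awk '{ sum += $2 } END { if (NR > 0) print sum, sum / NR }')
echo "$result"
```

For this input the command prints 60 20.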
6. Print Line Number with Each Line
You can print the line number along with each line using the NR built-in variable:
awk '{ print NR, $0 }' file.txt
- This prints the line number (NR) and the entire line ($0).
7. Using Field Separator (FS)
You can specify a custom field separator using the -F option or the FS variable. For example, if fields in a file are separated by commas (CSV format):
awk -F, '{ print $1, $2 }' file.txt
- This command uses a comma as the field separator and prints the 1st and 2nd fields.
Alternatively, you can set the FS variable directly:
awk 'BEGIN { FS = "," } { print $1, $2 }' file.txt
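The input separator FS has an output counterpart, OFS, which is placed between the comma-separated arguments of print. The sketch below uses made-up comma-separated data and sets both in a BEGIN block; note that a plain comma separator does not handle quoted fields that themselves contain commas:

```shell
# Hypothetical comma-separated input: name, age, department.
sample='alice,30,sales
bob,25,ops'

# Split input on commas (FS) and join printed fields with " | " (OFS).
result=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = ","; OFS = " | " } { print $1, $2 }')
echo "$result"
```

This prints alice | 30 and bob | 25 on separate lines.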
8. Print Lines Based on Multiple Conditions
You can combine multiple conditions with logical operators like && (AND) and || (OR):
awk '$1 == "John" && $3 > 30 { print $1, $3 }' file.txt
- This command prints the 1st and 3rd fields where the 1st field is “John” and the 3rd field is greater than 30.
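To show both operators side by side, here is a sketch run against hypothetical data (name, role, age). The AND pattern requires both conditions to hold, while the OR pattern selects a line when either one does:

```shell
# Hypothetical input: name, role, age.
sample='John dev 35
John ops 28
Mary dev 41
Paul ops 22'

# AND: first field is "John" and third field is greater than 30.
and_result=$(printf '%s\n' "$sample" | awk '$1 == "John" && $3 > 30 { print $1, $3 }')

# OR: first field is "Mary" or third field is less than 25.
or_result=$(printf '%s\n' "$sample" | awk '$1 == "Mary" || $3 < 25 { print $1, $3 }')

echo "$and_result"
echo "$or_result"
```

The AND pattern selects only John 35; the OR pattern selects Mary 41 and Paul 22.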
9. Print Lines Where a Field is Equal to a Specific Value
For string comparison, use the == operator. For example, printing lines where the 2nd field is “Manager”:
awk '$2 == "Manager" { print $1, $2 }' file.txt
- This command prints the 1st and 2nd fields where the 2nd field is “Manager”.
10. Format Output
You can use printf for formatted output:
awk '{ printf "Name: %-10s Age: %-5s\n", $1, $2 }' file.txt
- This formats the output such that the name is left-justified with 10 characters and the age is left-justified with 5 characters.
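Padding is easier to see when the fields are bracketed. In the sketch below (hypothetical data), %-10s left-justifies a string in a 10-character column, while %5d right-justifies a number in a 5-character column:

```shell
# Hypothetical input: name and age.
sample='alice 30
bob 7'

# %-10s pads on the right (left-justified); %5d pads on the left (right-justified).
result=$(printf '%s\n' "$sample" | awk '{ printf "[%-10s] [%5d]\n", $1, $2 }')
echo "$result"
```

Unlike print, printf does not append a newline on its own, which is why the format string ends with \n.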
AWK Built-in Variables:
- $0: Represents the entire current line.
- $1, $2, …, $NF: Represent individual fields (columns) in the current line. $NF refers to the last field.
- NR: Represents the current record (line) number.
- NF: Represents the number of fields in the current line.
- FS: The input field separator (default is a single space, which splits fields on runs of spaces and tabs).
- OFS: Output field separator (default is a space).
- ORS: Output record separator (default is a newline).
- RS: Input record separator (default is a newline).
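Several of these variables can be combined in one command. As a sketch, the example below sets RS to a semicolon so that records are split on ";" rather than on newlines, and sets OFS to a hyphen; the input string is made up for illustration:

```shell
# Hypothetical input: three records separated by semicolons instead of newlines.
# RS = ";" changes the record separator; OFS = "-" joins the printed fields.
result=$(printf 'a b;c d;e f' | awk 'BEGIN { RS = ";"; OFS = "-" } { print NR, NF, $1, $2 }')
echo "$result"
```

Each output line shows the record number, the field count, and the two fields of that record, for example 1-2-a-b for the first record.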
Advanced Example: Using BEGIN and END Blocks
AWK allows you to run code before processing (in the BEGIN block) and after processing (in the END block).
Example:
awk 'BEGIN { print "Start of File" } { print $1, $2 } END { print "End of File" }' file.txt
- The BEGIN block runs before any data processing, and the END block runs after processing all lines.
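A common use of the two blocks is to print a header first and a summary last. The sketch below does this on hypothetical data; the count variable is an illustrative name, incremented once per input line in the main block:

```shell
# Hypothetical input: name and age.
sample='alice 30
bob 25'

# BEGIN prints a header, the main block prints each line and counts it,
# and END prints the total after the last line has been read.
result=$(printf '%s\n' "$sample" | awk 'BEGIN { print "name age" } { print $1, $2; count++ } END { print count " records" }')
echo "$result"
```

The header appears before any input line is read, and the final line reads 2 records.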
Conclusion
AWK is a powerful tool for text and file processing in Unix/Linux. With its ability to filter, format, and manipulate data, it is an essential tool for system administrators, developers, and data analysts. The examples provided here are just a small subset of what AWK can do, and there are many more advanced features like custom functions, regular expressions, and complex scripts.