[HN Gopher] Understanding Awk
       ___________________________________________________________________
        
       Understanding Awk
        
       Author : todsacerdoti
       Score  : 332 points
       Date   : 2021-09-30 15:27 UTC (7 hours ago)
        
 (HTM) web link (earthly.dev)
 (TXT) w3m dump (earthly.dev)
        
       | adamgordonbell wrote:
       | Thanks for sharing this. I'm the author.
       | 
       | When I wrote my introduction to JQ someone mentioned JQ was
       | tricky but super-useful like AWK. I nodded along with this, but
       | actually, I had no idea how Awk worked.
       | 
       | So I learned how it worked and wrote this up. It is a bit long,
       | but if you don't know Awk that well, or at all, I think it should
       | get the basics across to you by going step by step through
       | examining the book reviews for The Hunger Games trilogy.
       | 
       | Let me know what you think. And also let me know if you have any
       | interesting Awk one-liners to share.
        
         | choffman wrote:
         | I really appreciate you writing this guide. As a long time
         | Linux user, I've always wanted to learn AWK, but it seemed too
         | daunting. Three minutes into your guide and I immediately saw
         | how I could use it in my day-to-day usage.
        
           | adamgordonbell wrote:
           | Thank you! It took me longer to write than I expected it
           | would. I was originally just going to do some small examples
           | of each idea.
           | 
           | But once I got the idea of aggregating the book review data
           | from Amazon I felt I had to see it through.
        
         | foobarian wrote:
         | The funny thing is, by and large my only use case for awk is to
         | print out whitespace delimited columns where the amount of
         | whitespace is variable. Surprisingly hard to do with other Unix
         | tools.
         | 
         | Neat discussions around that sort of thing at least here:
         | https://news.ycombinator.com/item?id=23427479
        
           | goohle wrote:
           | ls -l | tr -s ' ' | cut -d ' ' -f 5
        
             | foobarian wrote:
             | Exactly! Exactly! And now fix it to work with tabs :-)
        
               | tyingq wrote:
               | And leading whitespace. Compare:
               |
               |     $ printf " one two  three"  | tr -s ' ' | cut -d ' ' -f 1
               |
               |     $ printf " one two  three"  | awk '{print $1}'
               |     one
        
               | goohle wrote:
               | ps ax | sed 's/^\s\+//; s/\s\+/ /g;' | cut -d ' ' -f 4
        
               | goohle wrote:
               | echo -e '1\t2\t3\t4\t5' | expand -t 1 | cut -d ' ' -f 3
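
A minimal sketch of the point made in this sub-thread, assuming a POSIX awk and sh: with the default field separator, awk splits on runs of spaces and tabs and ignores leading whitespace, so one command covers all of the cases above. The helper name nth_field is hypothetical.

```shell
# nth_field N: print the Nth whitespace-delimited field of each input line.
# (Hypothetical helper name; it relies only on awk's default field splitting.)
nth_field() {
    awk -v n="$1" '{ print $n }'
}

# Leading space and a doubled space between fields.
printf ' one two  three\n' | nth_field 1

# Tabs mixed with spaces.
printf 'one\ttwo \t three\n' | nth_field 3
```

The tr/cut pipelines need a separate normalisation step for each of these cases; awk needs none.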
        
           | tyingq wrote:
           | The syntax isn't nearly as nice, but Perl can be handy if
           | you're doing something more after splitting into columns. And
           | it's usually already there / installed, like awk. For just
           | columns:
           |
           |     $ printf "a b  c d   e\n1 2  3 4 5" | perl -lanE 'say "$F[2] $F[4]"'
           |     c e
           |     3 5
        
             | adamgordonbell wrote:
             | It surprised me that AWK had dictionaries and no
             | declaration of vars, which make it feel like a modern
             | scripting language even though it was written in the 70s.
             |
             | It turns out though that this is because Perl and later
             | Ruby were inspired by AWK and even support these line by
             | line processing idioms with BEGIN and END as well.
             |
             |     ruby -n -a -e 'puts "#{$F[0]} #{$F[1]}"'
             |
             |     ruby -ne '
             |       BEGIN { $words = Hash.new(0) }
             |       $_.split(/[^a-zA-Z]+/).each { |word| $words[word.downcase] += 1 }
             |       END {
             | ...
        
           | flandish wrote:
           | A long while ago I wrote up a little processor to determine
           | field lengths in a given file - I forgot the original reason.
           | ( https://github.com/sullivant/csvinfo )
           | 
           | However, I feel I really should have taken the time to learn
           | Awk better as it could probably be done there, and simply!
           | (It was a good excuse to tinker with rust, but that's an
           | aside.)
        
             | tyingq wrote:
             | For some idea, a one liner to find the (first) longest
             | username and length in /etc/passwd:
             |
             |     $ awk -F: '{len=length($1);if(len>max){max=len;user=$1}}END{print user,max}' /etc/passwd
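
The same one-liner, spread out as a sketch that reads colon-separated records from stdin so it can be tried on sample data. The function name longest_name and the three sample records are made up for illustration.

```shell
# longest_name: print the longest first field of colon-separated input,
# followed by its length. (Hypothetical helper name.)
longest_name() {
    awk -F: '
        {
            len = length($1)
            if (len > max) {      # strict ">" keeps the first of several ties
                max = len
                user = $1
            }
        }
        END { print user, max }
    '
}

# Made-up records in /etc/passwd style; "daemon" and "nobody" tie at 6.
printf 'root:x:0:0\ndaemon:x:1:1\nnobody:x:65534:65534\n' | longest_name
```

Changing the comparison to `>=` would report the last of the tied names instead.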
        
               | flandish wrote:
               | Thanks for that reply! It's good to work with an example.
        
             | genewitch wrote:
             | I'll mark this on my GitHub when I get back on a computer,
             | I take public datasets and make graphs and transforms and
             | reports. The big survey companies have weird data records
             | and having to write a parser is my least favorite part. I
             | think other people who ingest my content don't appreciate
             | the effort, but that's a near universal feeling I think,
             | heh.
        
           | adamgordonbell wrote:
           | choose from your link does look nice for simple column
           | selection.
           |
           |     echo -e "foo   bar   baz" | choose -1 -2
           |
           | vs awk's
           |
           |     echo -e "foo   bar   baz" | awk '{ print $2, $3}'
           | 
           | I love the effort people are putting into reinventing the
           | core unix tools.
           | 
           | I think I'll stick with Awk for now though.
        
             | foobarian wrote:
             | The problem with new tools is
             | 
             | $ choose
             | 
             | bash: choose: command not found...
        
           | twic wrote:
           | If i don't use awk, i throw tr -s ' ' into the pipeline, and
           | then the delimiter is a single space, so you can just cut.
        
         | kevinwang wrote:
         | As someone who's never used awk before, I really enjoyed this
         | write-up and I think it was very well written!
        
           | mousepilot wrote:
           | chiming in, I had a feeling that the article and the comments
           | here would contain some jewels and both have exceeded
           | expectations.
        
       | nrclark wrote:
       | I'm always happy when I see posts that promote AWK. It's a very
       | underappreciated tool in my opinion. I was a Linux user for 20
       | years before I got familiar with it. AWK is super powerful for
       | text processing, and I like that it's included in Busybox for use
       | on the embedded systems that I design.
       | 
       | For any complex text processing, it's way better and more robust
       | than having a super long pipeline of a bunch of sed/grep.
       | 
       | Most recently, I used awk in a script that parses /proc/mounts to
       | grab the mountpoint of a partition, or print something different
       | if the partition isn't mounted. Doable with a bunch of sed/grep
       | and some shell logic? Definitely. But easier and cleaner in AWK,
       | and equally easy to inline in a shell-script.
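
A rough sketch of what such a lookup could look like. This is not the commenter's script; the function name, the fallback text, and the sample lines are assumptions. It reads data in /proc/mounts layout (device, mountpoint, type, options, ...) from stdin.

```shell
# mountpoint_of DEVICE: print the mountpoint of DEVICE, or a fallback
# message if it does not appear in the input. (Hypothetical helper.)
mountpoint_of() {
    awk -v dev="$1" '
        $1 == dev { print $2; found = 1; exit }   # exit still runs END
        END { if (!found) print "not mounted" }
    '
}

# Made-up sample in /proc/mounts layout.
sample='proc /proc proc rw,nosuid 0 0
/dev/sda1 /boot ext4 rw,relatime 0 0'

printf '%s\n' "$sample" | mountpoint_of /dev/sda1
printf '%s\n' "$sample" | mountpoint_of /dev/sdb1
```

On a Linux system the real input would be `mountpoint_of /dev/sda1 < /proc/mounts`.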
        
         | throwaway894345 wrote:
         | I do a lot of work with structured data--json, yaml, etc. For
         | me, this is how I feel about jq. One of my favorite use-cases
         | is querying Kubernetes resources. E.g., `kubectl get secret
         | <secret-name> -o json | jq -r '.data | map_values(@base64d)'`
         | (fetch a secret and decode all of its values).
        
         | kbenson wrote:
         | I've never bothered to learn much AWK, but that's mostly
         | because Perl is my bread and butter language and has been for
         | 20 years, and focusing on knowledge of that seemed a better
         | investment (especially since with a few judicious flags, Perl
         | is a passable AWK replacement even for very small one liners).
         | 
         | That said, if you just want to supplement your knowledge of
         | other shell tools and pull out something that can do some
         | obvious text munging, AWK has always looked attractive for the
         | task to me.
        
           | chasil wrote:
           | The problem is that awk is in POSIX, and perl is not.
           | 
           | There are two common sources of awk for Windows, for example,
           | that drop one exe to provide the interpreter:
           | 
           | http://unxutils.sourceforge.net/
           | 
           | https://frippery.org/busybox/
           | 
           | Perl simply wasn't designed to do that.
        
             | newaccount2021 wrote:
             | But perl is available by default in almost every free *nix,
             | and for most people, Windows isn't a requirement
        
         | Lio wrote:
         | Yep awk is lovely and well worth the time to learn.
         | 
         | This is probably not important for embedded but doesn't a
         | pipeline of small scripts (which could be in awk) give you
         | better threading support?
         | 
         | Xargs, GNU parallel or even make then scale that out really
         | quickly.
        
         | SavantIdiot wrote:
         | Came here to say this. Glad to see /bin getting respect.
         | 
         | To anyone processing huge quantities of text and text files,
         | someone very likely had the same problem you faced back in the
           | 1980s and there's a Unix/GNU tool for it already.
        
           | dylan604 wrote:
           | I was introduced to *nix by having to process text files so
           | large that the editors I was familiar with choked and died.
           | Someone showed me sed/awk/grep, and they took seconds to
           | process files that GUI editors couldn't even open. Never
           | looked back.
        
         | GekkePrutser wrote:
         | Not having to parse the output at all is even better. I really
         | like the way Powershell can pass structured data like this.
         | 
         | I'm a huge Linux/Unix fan but sometimes a rethink really works
         | out. I hope Linux will get something similar. I know Powershell
         | is available for Linux but without an adapted userland there's
         | not much benefit
        
         | invisible wrote:
         | For some purposes, awk+xargs can replace hours of work to write
         | a tool to automate some process. It's my go-to for ops work
         | that I don't expect to live very long and just needs to
         | _happen_.
         | 
         | Also, happy 1337 karma day :).
        
           | 5e92cb50239222b wrote:
           | > awk+xargs can replace hours of work
           | 
           | Including machine hours of work.
           | 
           | Wasn't there a famous story of replacing a Hadoop cluster
           | with an awk script (which was a couple orders of magnitude
           | faster)?
           | 
           | Oh yes, there was:
           | https://news.ycombinator.com/item?id=17135841
        
             | dapids wrote:
             | In fairness it's xargs that is providing the command
             | parallelization, not awk, but I agree both combined are a
             | good match.
        
             | genewitch wrote:
             | If one considers the idea of map reduce to be taking a set
             | of data and ending up with a subset that is relevant, I've
             | used tons of simple things to do that, and never Hadoop.
             | 
             | I think parsing logs to find pain areas or potential
             | exploit/exfil is a map reduce job, for instance, and grep
             | or awk can manage that just fine.
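
As a small illustration of the "map reduce with awk" idea, here is a sketch that groups log lines by their last field and counts them. The log format (a status code as the last field) and the function name are assumptions made for the example.

```shell
# count_by_last_field: tally input lines by the value of their last field.
# The trailing sort only makes the output order stable.
count_by_last_field() {
    awk '
        { count[$NF]++ }                            # "map": key on last field
        END { for (k in count) print k, count[k] }  # "reduce": emit totals
    ' | sort
}

# Made-up access-log lines ending in an HTTP status code.
printf '%s\n' 'GET / 200' 'GET /missing 404' 'GET /about 200' | count_by_last_field
```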
        
       | freedomben wrote:
       | Nice article. Seems we went through a very similar progression!
       | :-D
       | 
       | If anyone is interested in learning more, I built a conference
       | talk to teach awk, and a set of exercises that has gotten
       | pretty positive feedback:
       | 
       | Presentation: https://youtu.be/43BNFcOdBlY
       | 
       | Exercises (for you to try):
       | https://github.com/FreedomBen/awk-hack-the-planet
       | 
       | Exercises (me solving): https://youtu.be/4UGLsRYDfo8
        
       | stevebmark wrote:
       | There are things I've come to dislike and avoid when programming
       | in general:
       | 
       | - Avoid programming in strings (especially in Bash, where nested
       | quotes are full of pitfalls)
       | 
       | - Avoid magic switches that change behavior (like -F)
       | 
       | - Avoid terse or cryptic variable names (like $NF)
       | 
       | - Avoid terse and magical syntax (sorry Perl, happy to leave you
       | behind me)
       | 
       | - Avoid programs that are hard to read
       | 
       | - Avoid programs that are difficult to debug while writing them
       | 
       | - Avoid programs that ignore types
       | 
       | For these reasons, I prefer to avoid awk for anything except the
       | most trivial of tasks. I think the prevalence of scripting
       | languages and the speed of execution and debugging today has made
       | awk not as necessary as it may have been in the 70s. And as to
       | the first point, I'm aware you can write awk scripts in files,
       | and I feel like if your script has gotten complex enough that you
       | need a file, you're creating something unmaintainable and
       | unreadable that would be better suited in a different programming
       | language.
       | 
       | Edit: I should add this article is great and a good introduction
       | to awk, regardless of my personal taste for the tool.
        
         | throwaway38941 wrote:
         | I've been doing systems work for 20 years. Here's why most of
         | those things are actually good:
         | 
         | - Strings are subtly complex, but strings are not variables.
         | You can assign a string, and later handle it as a variable, and
         | not deal with any of the specifics of string-iness. Likewise,
         | you can take a variable, and later treat it as a string (for
         | loosely or not-typed variables).
         | 
         | - Magic switches are not magic, they are options. Virtually
         | every program takes options. Sometimes they impact a lot of
         | things, sometimes a little. Only the context determines how
         | much is "too much".
         | 
         | - Terse/cryptic variables allow you to write complex
         | expressions in a compact form. This allows you to read more in
         | a small space, making it easier to reason about or form complex
         | expressions. Human languages are flush with these, as is
         | mathematics. But you have to balance the terse, cryptic and
         | magical with guilelessness, or it becomes a mess.
         | 
         | - Terse and magical syntax is, again, a feature, not a bug.
         | Using magical syntax I can do in a few characters what would
         | take me many lines with a traditional language, and as we all
         | know, increased number of lines correlates to bugs, in addition
         | to simply making it harder to grok.
         | 
         | - Types aren't ignored, but they may be very loosely enforced.
         | If you want to write a quick program to get something done,
         | typing is a curse. If you want to write a very thorough
         | program, typing is a blessing. In many cases, loosely or
         | untyped programs actually work _better_ than their typed
         | cousins, because they allow for more unexpected behaviors
         | without failing. Failing early and often may be a modern trend,
         | but... it literally means things fail more, and this is often
         | not desirable.
         | 
         | Caveats:
         | 
         | - Programs that are hard to read do indeed suck, and it takes
         | lots of experience to make some kinds of programs easier to
         | read. But that's not an indictment of the program, it's an
         | indictment of the person who wrote it. We don't indict English
         | when somebody writes a document that's impossible to
         | comprehend.
         | 
         | - Interestingly, some of the more popular languages are the
         | worst to debug. Perl is probably one of the easiest languages
         | to debug, not inconsequently because of how good the
         | interpreter is at suggesting to the user what the actual
         | problem was and almost exactly how to fix it.
        
         | [deleted]
        
         | jrumbut wrote:
         | The thing that prevents awk from being a major part of my daily
         | routine is that it (amazingly) has poor CSV support. Consider
         | the following:
         | 
         |     col1,col2,col3
         |     1,2,3
         |     4,"hello, \"world\"",6
         |     "7 buckets",,9
         | 
         | To get the usual awk experience with this very common file
         | format, exactly the type of thing you want to parse with awk,
         | you first need to install gawk, then use a big FPAT regex that
         | needs to be adjusted for any new CSV variant.
         | 
         | I would love to see awk with "CSV mode", where it intelligently
         | handles formats like this if you just pass a flag. I think awk
         | would do well to differentiate itself with excellent 2d dataset
         | parsing functionality, but at least catching up to the average
         | scripting language would be great.
         | 
         | I'm half expecting someone to say "just pass -csv it does what
         | you want" and if so I'll be very excited.
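
To make the problem concrete, this sketch counts how many fields plain `awk -F,` sees on each row of a similar file. The quoted cell is written with doubled quotes (RFC 4180 style) rather than backslash escapes; that choice, and the helper name, are assumptions for the example.

```shell
# count_fields: report the number of fields awk finds per line when it
# splits naively on commas. (Hypothetical helper name.)
count_fields() {
    awk -F, '{ print NF }'
}

# Every row has three CSV columns, but the quoted comma in row 3
# is treated as a separator, so awk reports four fields there.
printf '%s\n' \
    'col1,col2,col3' \
    '1,2,3' \
    '4,"hello, ""world""",6' \
    '"7 buckets",,9' | count_fields
```

Any script that then uses `$3` on the third row silently gets the wrong cell.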
        
           | nmz wrote:
           | You can just use
           | https://github.com/Nomarian/Awk-Batteries/blob/master/Units/...
           | and use it like so:
           |
           |     awk -f ./ucsv.awk -e '{print $5}'
           | 
           | Also this
           | 
           | > 4,"hello, \"world\"",6
           | 
           | Is incorrect per https://tools.ietf.org/html/rfc4180 so you
           | should just fix it with a sed -i 's/\\\"/""/g' and then just
           | parse as normal.
           | 
           | https://github.com/Nomarian/Awk-Batteries/wiki/Formats
        
           | sk5t wrote:
           | 'miller' and 'xsv' are pretty good tools for wrangling CSV.
           | (And regexp is kind of a terrible tool for it, too many edge
           | cases.)
        
             | jrumbut wrote:
             | Yeah, I don't want to have to write a CSV library each
             | time, that's what I'm trying to get at.
             | 
             | I just end up using Python/Perl but I do have a soft spot
             | for awk so it would be cool if good support was built-in.
        
               | sk5t wrote:
               | Who's writing a library? Just use xsv or miller to
               | extract the bits you want from the CSV, change the
               | delimiter or escapes to something more convenient, etc.,
               | then feed that to awk or other CSV-unaware text
               | processors.
        
               | jrumbut wrote:
               | I was agreeing with your point about regexes, that it's
               | good to avoid trying to deal with all the corner cases
               | yourself when you're just trying to write a small script.
        
               | sk5t wrote:
               | Ah, understood! CSV is funny, it seems like a more
               | trivial thing than it really is, and its human
               | readability sort of invites broken approaches in a way
               | that something like Parquet would not.
               | 
               | XML is somewhere in the middle--I've seen some horrible
               | abuses of CDATA sections way back when--but at least
               | there are accepted ways to prove what's invalid.
        
           | nickcw wrote:
           | There is an answer to CSV mode a bit further down the page
           | 
           | https://news.ycombinator.com/item?id=28708145
           | 
           | ...but if your files are CSV, there is a CSV extension for
           | gawk
           |
           |     @include "csv"
           |     BEGIN { CSVMODE = 1 }
        
             | jrumbut wrote:
             | Well there you go, for the sake of my pride at least it's
             | an extension.
             | 
             | It's funny searches for awk CSV seem to yield a bunch of SO
             | questions where the answers are increasingly cumbersome
             | regexes instead of this extension.
             | 
             | Of course, you can't count on this extension being widely
             | installed, but it's great for my own desktop.
        
               | nmz wrote:
               | That's because the extension only works in gawk. It's not
               | portable anywhere else.
        
         | [deleted]
        
         | m463 wrote:
         | I use awk for one-liners, no more.
         | 
         | Looking at my command history, I mostly use awk to extract a
         | field like this:
         |
         |     <something> | awk '{print $3}'
         | 
         | (I know "cut" is supposed to do the same thing, but it was
         | never reliable for me - maybe tabs/spaces?)
        
           | likpok wrote:
           | Consider the input
           |
           |     a   b
           | 
           | Awk will treat it as having two columns (by default), while
           | cut will treat each space as its own column.
           | 
           | Awk is also a little nicer for whitespace; cut makes
           | specifying the delimiter (with say "-d\ ") a little more
           | vexing.
        
           | chasil wrote:
           | Here is a GAWK program of mine that implements outgoing SMTP.
           | While not a one-liner, this is much shorter and less tedious
           | than trying to do it in C.
           |
           |     $ cat /bin/awkmail
           |     #!/bin/gawk -f
           |
           |     BEGIN { smtp="/inet/tcp/0/smtp.yourco.com/25";
           |     ORS="\r\n"; r=ARGV[1]; s=ARGV[2]; sbj=ARGV[3]; # /bin/awkmail to from subj < in
           |
           |     print "helo " ENVIRON["HOSTNAME"]        |& smtp;  smtp |& getline j; print j
           |     print "mail from: " s                    |& smtp;  smtp |& getline j; print j
           |     if(match(r, ","))
           |     {
           |      split(r, z, ",")
           |      for(y in z) { print "rcpt to: " z[y]    |& smtp;  smtp |& getline j; print j }
           |     }
           |     else { print "rcpt to: " r               |& smtp;  smtp |& getline j; print j }
           |     print "data"                             |& smtp;  smtp |& getline j; print j
           |
           |     print "From: " s                         |& smtp;  ARGV[2] = ""   # not a file
           |     print "To: " r                           |& smtp;  ARGV[1] = ""   # not a file
           |     if(length(sbj)) { print "Subject: " sbj  |& smtp;  ARGV[3] = "" } # not a file
           |     print ""                                 |& smtp
           |
           |     while(getline > 0) print                 |& smtp
           |
           |     print "."                                |& smtp;  smtp |& getline j; print j
           |     print "quit"                             |& smtp;  smtp |& getline j; print j
           |
           |     close(smtp) } # /inet/protocol/local-port/remote-host/remote-port
        
             | meltedcapacitor wrote:
             | Cheap fix: the space after MAIL FROM: and RCPT TO: is not
             | standard compliant.
        
             | m463 wrote:
             | IMHO, that's too big for awk, why not python?
             | 
             | for example:
             |
             |     #!/usr/bin/python
             |     import smtplib
             |     from email.mime.text import MIMEText
             |
             |     msg = 'hi'
             |     subj='read this!'
             |     smtp_server='mail.example.com'
             |     smtp_from='me@example.com'
             |     smtp_to='you@example.com'
             |
             |     m = MIMEText(msg)
             |     m['To'] = smtp_to
             |     m['From'] = smtp_from
             |     m['Subject'] = subj
             |
             |     s = smtplib.SMTP(smtp_server)
             |     s.sendmail(smtp_from, [smtp_to], m.as_string())
             |     s.quit()
             |
             | of course, you seem to think in gawk so if that works for
             | you that's what you should continue doing!
             |
             | by the way, I hacked this example from another script which
             | attached a logfile:
             |
             |     with open(arg.logfile) as f:
             |         log_contents = f.read()
             |     m = MIMEText(log_contents)
             |
             | you can also use:
             |
             |     from email.mime.image import MIMEImage
             |     from email.mime.text import MIMEText
             |     from email.mime.multipart import MIMEMultipart
             |
             | and then:
             |
             |     m = MIMEMultipart()
             |     m.attach(MIMEText('\n\n%s\n\n'%xkcd_img_title))
             |     m.attach(MIMEImage(xkcd_img))
        
               | chousuke wrote:
               | Your script doesn't even do the same thing. You are
               | importing a library that implements SMTP, which is
               | missing the point.
               | 
               | The AWK script doesn't need libraries, so it can actually
               | be useful in places where you have awk but not Python.
        
             | jrumbut wrote:
             | That's a beautiful use of the language, it reminds me of
             | some of the awk CGI efforts out there.
             | 
             | For example:
             | https://www.gnu.org/software/gawk/manual/gawkinet/html_node/...
        
         | ChuckMcM wrote:
         | I take it you LOVE ada :-)
         | 
         | There is a lot of wisdom in the things you avoid, however I
         | would ask one question, "How often do you use it?"
         | 
         | For me, the best systems are those that can be wordy and
         | prescriptive but as you get to know them you can use more short
         | hand so they "get out of the way" as it were. A good example of
         | that philosophy is keyboard short cuts. When I'm learning a
         | program I'm happy to pause and sling the mouse around to find
         | the thing I need in the labeled menu stack with an appropriate
         | name which also tells me what the keyboard short cut is for
         | that thing. Then as I get better I can just use the short cut
         | and my workflow gets faster. Once I've internalized the keymap
         | my flow is held up by how fast I can think, not by how fast I
         | can take my hand off the keyboard, move the mouse, click and
         | then put it back on the keyboard.
         | 
         | Awk is one of those things that once you internalize what it
         | can do, you can use it for a lot of stuff, and you can do it
         | quickly.
        
       | ketanmaheshwari wrote:
       | One tip I have to make large-ish awk programs readable is to name
       | the columns in the BEGIN section. Then, you'd use $colname
       | instead of $1, $2, etc. for instance:
       | 
       | BEGIN { item_type = 1; item_name = 2; price = 3; sale = 4 } # etc.
       | 
       | Now, in place of $1, you'd say $item_type which significantly
       | improves overall readability of the code.
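
A short sketch of the tip in use, assuming tab-separated input. The column meanings (`price` as a unit price, `sale` as units sold), the function name, and the sample rows are invented for the example.

```shell
# book_revenue: sum price * sale over rows whose item_type is "book".
# Column numbers are named once in BEGIN and used as $name afterwards.
book_revenue() {
    awk -F'\t' '
        BEGIN { item_type = 1; item_name = 2; price = 3; sale = 4 }
        $item_type == "book" { total += $price * $sale }
        END { print total }
    '
}

# Made-up rows: item_type, item_name, price, sale.
printf 'book\tDune\t10\t3\nfood\tTea\t4\t5\nbook\tEmma\t5\t2\n' | book_revenue
```

If the column layout changes, only the BEGIN line needs editing.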
        
         | jayknight wrote:
         | I've also used this to address columns by name for files with
         | lots of columns that I'm too lazy to count:
         | https://unix.stackexchange.com/a/359699
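
One way to address columns by header name is sketched below. It is not necessarily the approach in the linked answer; it assumes a comma-separated file with no quoted cells and a header row on line 1, and the function name is hypothetical.

```shell
# col_by_name NAME: print the column whose header cell equals NAME.
col_by_name() {
    awk -F, -v want="$1" '
        NR == 1 {                         # header row: map name -> position
            for (i = 1; i <= NF; i++) idx[$i] = i
            next
        }
        { print $(idx[want]) }
    '
}

# Made-up sample data.
printf 'name,dob,grade\nAda,1815,A\nAlan,1912,B\n' | col_by_name grade
```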
        
         | dredmorbius wrote:
         | You can also put a similar code block at the start of a general
         | processing entry. This applies on both flat (uniform record)
         | and hierarchical (multiple record-type) data.
         | 
         | E.g.:
         |
         |     {
         |         name = $1
         |         dob = $2
         |         grade = $3
         |         # ...
         |
         |         # Do stuff with name / dob / grade, etc.
         |     }
         |
         | If the data are structured, so that there are multiple record
         | types (typically defined by prefix or some other regex) you can
         | put variable assignments within each block.
         |
         |     /^rectype1/ { var1 = $1; var2 = $2; ... }
         |     /^rectype2/ { varA = $1; varB = $2; ... }
         | 
         | I prefer to leave BEGIN blocks for defining constants or tables
         | and such.
        
         | ulucs wrote:
         | Nice tip, so basically like excel with tables
        
         | dima55 wrote:
         | If you want to do that, use vnlog instead. You're 90% there
         | already.
         | 
         | https://github.com/dkogan/vnlog/
        
       | ufo wrote:
       | One thing that I would love to hear about is suggestions of how
       | to make my files/output more awk-friendly.
        
         | adamgordonbell wrote:
         | This isn't your question but if your files are CSV, there is a
         | CSV extension for gawk
         |
         |     @include "csv"
         |     BEGIN { CSVMODE = 1 }
        
         | tejtm wrote:
         | Tab separated values all the things
        
       | buzzwords wrote:
       | Thanks for this tutorial and everyone else that posted some great
       | tips and links. I find myself needing to use awk once in a blue
       | moon and every time it eats a lot of my time. I hope I remember
       | your tutorial next time I need it.
        
       | jrochkind1 wrote:
       | This is a great model of how to do a tutorial.
        
       | 1vuio0pswjnm7 wrote:
       | It's common, as in the OP, to see awk recommended for something as
       | simple as extracting a column from tab or space-separated values.
       | IMO, it's quite a bit of typing to do on the fly at a command
       | prompt. Performance-wise, it could be significantly slower than
       | other utilities that are as ubiquitous as awk.
       |
       |     echo one two three|awk '{print $2}'
       |
       | Are there other ways to do this? Are they faster?
       |
       |     cat > awc
       |     #!/bin/sh
       |     test $# -eq 1||exit
       |     exec tr \\40 \\11|exec cut -f "$1"|exec tr \\11 \\40
       |     ^D
       |     echo one two three|awc 2
       |
       | Test it on a file to see if it is faster than awk.
       |
       |     time awk '{print $2}' file
       |     time awc 2 < file
        
       | fmakunbound wrote:
       | For those kinds of tasks I use Awk to process the data into a
       | SQLite database. Then I do the queries on that since it's easier
       | and more advanced things (grouping, having) are much easier
       | declaratively.
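
A sketch of the awk half of that workflow: turning tab-separated rows into SQL statements. The table name, column names, and sample rows are invented; the only SQL-specific step is doubling single quotes inside string literals.

```shell
# to_sql: convert "title<TAB>stars" rows into SQL INSERT statements.
# (Hypothetical helper; table and column names are made up.)
to_sql() {
    awk -F'\t' -v q="'" '
        BEGIN { print "CREATE TABLE IF NOT EXISTS reviews (title TEXT, stars INTEGER);" }
        {
            title = $1
            gsub(q, q q, title)     # double single quotes for SQL string literals
            printf "INSERT INTO reviews VALUES (%s%s%s, %d);\n", q, title, q, $2
        }
    '
}

# Made-up rows; the second title contains a single quote.
printf "Catching Fire\t5\nIt's fine\t3\n" | to_sql
```

The output could then be piped into the sqlite3 command-line shell, for example `... | to_sql | sqlite3 reviews.db` (the database file name is hypothetical), after which grouping and filtering are ordinary SQL queries.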
        
         | mongol wrote:
         | Yes! Another recent thread discussed best practices and
         | whether something like that exists. I believe this is a good
         | example.
        
       | iefbr14 wrote:
       | It's awksome :)
        
       | bright_day wrote:
       | kkkkkk
        
       | calvinmorrison wrote:
       | Can't recommend the gawk manual enough, and "The awk manual"
       | enough
       | 
       | https://www.gnu.org/software/gawk/manual/gawk.pdf
       | 
       | and
       | 
       | http://www.cs.unibo.it/~sacerdot/doc/awk/nawkA4.pdf
       | 
       | enough
        
         | chasil wrote:
         | The original language specification, written by the authors, is
         | now free online. Chapter 2 covers the whole language in a
         | little over 40 pages.
         | 
         | https://archive.org/download/pdfy-MgN0H1joIoDVoIC7/The_AWK_P...
        
           | calvinmorrison wrote:
           | Have a copy on my bookshelf! Didn't have a PDF though, nice.
           | 
           | The gawk one is useful if you're into some of the
           | GNU-specific features.
        
         | dredmorbius wrote:
         | Severely underrated comment.
         | 
         | Having relied heavily on the (unofficial, non-GNU) gawk
         | manpage (it's quite good), I instantly started learning very
         | useful features reading the GNU docs. (I still need to fully
         | internalise those.) Yes, the full manual is very much better
         | than the manpage.
         | 
         | (Also recommend _The AWK Programming Language_ mentioned here,
         | though I'd suggest the GNU manual adds to that as well.)
        
       | corpMaverick wrote:
       | I find it amusing that AWK is coming back. I used it extensively
       | back in the day, but let it go when I picked up Perl 4 and then
       | Perl 5. So Perl is no longer king for unix scripting. It was
       | replaced by other languages; but it seems like there is a niche
       | that they were not able to fill since AWK is back.
        
       | xphos wrote:
       | This was one of the best awk tutorials I've read; it's very
       | concise and digestible. I sometimes use awk, but the more complex
       | things get, the more I feel like I cannot use it. This tutorial
       | made me feel otherwise.
        
       | ChuckMcM wrote:
       | And if you learned awk(1) first, then when you saw perl for the
       | first time it immediately made sense to you as a 'super awk'.
        
         | abzug wrote:
         | That happened to me. AWK -> Perl -> Ruby.
        
       | theophrastus wrote:
       | At some point in every bioinformatics lecture I always manage
       | something akin to: "Learn awk! (or perl) You'll need it. Your
       | data will come from various disparate sources, and you need to
       | get them into some well-defined useful format from the get go."
        
       | cafard wrote:
       | Thanks! I had been putting it off, but after looking at the
       | article, I wrote a little but useful script with a line of awk in
       | it.
        
       | naikrovek wrote:
       | so this isn't related to the article so much, but to something
       | the article reminded me about: why do people use /usr/bin/env to
       | find a program rather than setting the PATH within the script to
       | a known-good value then using that to locate things?
       | 
       | the path that /usr/bin/env returns is (essentially) a global
       | variable that can change underneath you, right? I mean that just
       | screams "variable that may be changed by others" to me.
       | 
       | I've never understood why /usr/bin/env exists.
        
         | dredmorbius wrote:
         | Portability.
         | 
         | The /usr/bin/env trick will work on a wide range of systems, in
         | which even common utilities might have numerous locations:
         | /bin, /sbin, /usr/bin, /usr/local/bin, /opt, or others. If
         | you're writing scripts for portability and for others to use,
         | this has value.
         | 
         | That said, /usr/bin/env fails on Android/Termux AFAIU.
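         | 
         | A small sketch of what env does here, assuming a POSIX shell
         | plus the widely available mktemp: it looks the program up
         | through PATH instead of a fixed location. The temporary
         | directory and the hello command are invented for the demo.

```shell
# Hypothetical demo: create a throwaway command in a temp directory.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from $0"\n' > "$demo_dir/hello"
chmod +x "$demo_dir/hello"

# env finds "hello" only because its directory is on PATH;
# no absolute path to the command is written anywhere.
found=$(PATH="$demo_dir:$PATH" env hello)
echo "$found"

rm -rf "$demo_dir"
```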
        
       | dmux wrote:
       | I've never gone further than thinking about it, but I've always
       | been curious as to how simple it would be to use Awk as an
       | interpreter for a really simple Tcl-like language:
       | set a 1
       | set b 2
       | define add (n,m) $n + $m
       | set result [add a b]
       | 
       | I think it would be simple enough to come up with some Awk
       | pattern/actions to parse the above and execute the commands.
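       | 
       | A minimal sketch of the idea, with loud assumptions: it only
       | understands "set", a built-in "add" and a "puts" command added
       | here so there is something to print. The "define" form is not
       | implemented, and run_toy is a made-up name.

```shell
# Hypothetical mini-language: only "set", a built-in "add", and "puts".
# User-defined commands ("define ...") are not handled in this sketch.
run_toy() {
    awk '
        # Return the value of a variable, or the token itself if it is a literal.
        function value(tok) { return (tok in vars) ? vars[tok] : tok }

        # set name [add x y]  -> store the sum of two variables or literals
        $1 == "set" && $3 == "[add" {
            y = $5
            sub(/\]$/, "", y)            # strip the closing bracket
            vars[$2] = value($4) + value(y)
            next
        }

        # set name value
        $1 == "set" { vars[$2] = value($3); next }

        # puts name
        $1 == "puts" { print value($2) }
    '
}

printf 'set a 1\nset b 2\nset result [add a b]\nputs result\n' | run_toy
```

       | Each command is one pattern/action pair, and variables live in
       | an associative array, which is about as far as this toy goes.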
        
       | Stratoscope wrote:
       | I used to love Awk! I still do, even if I don't use it much any
       | more.
       | 
       | Awk has a reputation for being hard to read (as noted in
       | stevebmark's comment), but when I was using it actively, I tried
       | to treat it as a serious programming language and write readable
       | programs in it.
       | 
       | Several years ago I tracked down a couple of my old Awk programs
       | from around 1990 and posted them here:
       | 
       | https://github.com/geary/awk
       | 
       | SHANEY.AWK is an implementation of the infamous Mark V. Shaney:
       | 
       | https://www.clear.rice.edu/comp200/09fall/textriff/sci_am_pa...
       | 
       | This was probably the first program that made me really impressed
       | with Awk. People were writing rather complicated Shaney
       | implementations in C, and I thought, "this could be really simple
       | in Awk." And it was!
       | 
       | LJPII.AWK is the Awk program I'm most proud of. This was in the
       | days when we had tiny screens and no multiple monitors and you
       | always printed out your code to read it. In my circles we were
       | also fond of inserting "separator lines" between functions, in
       | various formats such as this one:
       | // - - - - - - - - - - - - - - - - - -
       | 
       | So I wrote LJPII to print source code in "two up" format (two
       | pages side by side in landscape mode) on my LaserJet II. It also
       | converted the separator lines into graphical boxes, and tried to
       | avoid splitting a function across multiple pages. It wasted some
       | paper but made nicely readable printouts.
       | 
       | I wish I still had some of my old printouts, but they are long
       | gone. One of these days I will have to see if I can update the
       | code to work with the LaserJet emulation in my Brother printer!
       | (It should mostly work, but I wrote this in the old Thompson Awk
       | for DOS, so there are a couple of non-standard things in it.)
       | 
       | Looking at the code again, it's amusing to see some old Windows
       | Hungarian notation which was popular/notorious back then, for
       | example an "f" prefix for a boolean (flag) value, and "af" prefix
       | for an array of flags.
       | 
       | Hungarian aside, I tried to make this code as readable as I
       | could.
       | 
       | Random fun fact! Someone who used to be an avid Awk programmer is
       | Will Hearst (William Randolph Hearst III). It's been many years
       | since I talked with him, so no idea if he still does any Awk
       | programming.
        
       | whymarrh wrote:
       | "If you like this you might also like"
       | https://ferd.ca/awk-in-20-minutes.html
       | 
       | I too am happy to see more Awk material in the world. Once I
       | learned a bit about it I started reaching for it more and more.
        
       | MisterTea wrote:
       | > _Awk is a record processing tool_
       | 
       | Actually, AWK is a domain specific programming language. When you
       | start treating AWK as such then you can really gain an
       | appreciation for it. I too treated it as a dumb one-liner tool
       | relegated to ingesting cryptic regexp one-liners in shell
       | scripts. Reading the original AWK book completely changed my
       | outlook on the language. I had no idea you could define
       | functions or perform basic math, so one could use it for very
       | basic tabular operations such as spreadsheets. AWK can even be
       | used as a standalone language outside of shell scripts by
       | writing a program, inserting a shebang on the first line calling
       | awk, and marking the file as executable.
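       | 
       | A sketch of such a standalone program, assuming awk lives at
       | /usr/bin/awk (it may not on every system) and using invented
       | data. The demo runs the file through "awk -f" so it does not
       | depend on the shebang path being right.

```shell
# Hypothetical file name and data; writes a small standalone awk program.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/usr/bin/awk -f
# Sum the second column and report the average, spreadsheet style.
function average(total, count) {
    return count ? total / count : 0
}
{ sum += $2; n++ }
END { printf "total=%d average=%.1f\n", sum, average(sum, n) }
EOF
chmod +x "$script"

# Run through "awk -f" here so the demo does not depend on where awk lives;
# with a correct shebang path, "$script" could also be executed directly.
report=$(printf 'apples 3\npears 5\nplums 4\n' | awk -f "$script")
echo "$report"

rm -f "$script"
```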
        
         | adamgordonbell wrote:
         | shebangs and more complex scripts are covered in the article.
         | 
         | But yes, I agree that the original AWK book is really good.
         | After covering some basics and the language reference, it has
         | some fun projects that you can build with AWK.
        
       | EvanKelly wrote:
       | Lots of great AWK tutorials in here that are more in depth, but
       | I'll share another. I always go back to Brian Kernighan's
       | personal help file:
       | 
       | https://www.cs.princeton.edu/courses/archive/spring19/cos333...
       | 
       | Brian Kernighan has a knack for explaining languages very
       | precisely and elegantly.
        
         | cf100clunk wrote:
         | And for the flash card type of learners it is good to see the
         | "HANDY ONE-LINE SCRIPTS FOR AWK" page is still available. See
         | the links in the Credits section at the bottom for more great
         | reading:
         | 
         | https://www.pement.org/awk/awk1line.txt
         | 
         | That author also edited the "USEFUL ONE-LINE SCRIPTS FOR SED"
         | page:
         | 
         | https://www.pement.org/sed/sed1line.txt
        
       | zabzonk wrote:
       | Well, this is OK I guess. But if you really want to learn Awk you
       | want the book "The AWK Programming Language", mostly written by
       | Brian Kernighan (he's the K in AWK and in K&R), and as usual for
       | all of his books, it's brilliant.
        
       | dang wrote:
       | Significant past threads. I had to leave a ton of submissions
       | out! Any others that are particularly good?
       | 
       |  _Awk: The Power and Promise of a 40-Year-Old Language_ -
       | https://news.ycombinator.com/item?id=28441887 - Sept 2021 (118
       | comments)
       | 
       |  _Awk is the coolest tool you don't know_ -
       | https://news.ycombinator.com/item?id=27039608 - May 2021 (20
       | comments)
       | 
       |  _CGI with Awk on OpenBSD Httpd (2020)_ -
       | https://news.ycombinator.com/item?id=27037113 - May 2021 (22
       | comments)
       | 
       |  _The State of the Awk_ -
       | https://news.ycombinator.com/item?id=25142867 - Nov 2020 (58
       | comments)
       | 
       |  _Awk: `Begin { ` Part 1_ -
       | https://news.ycombinator.com/item?id=24940661 - Oct 2020 (106
       | comments)
       | 
       |  _Show HN: Awk-JVM - A toy JVM in Awk_ -
       | https://news.ycombinator.com/item?id=23612910 - June 2020 (27
       | comments)
       | 
       |  _Running Awk in parallel to process 256M records_ -
       | https://news.ycombinator.com/item?id=23394024 - June 2020 (101
       | comments)
       | 
       |  _The State of the AWK_ -
       | https://news.ycombinator.com/item?id=23240800 - May 2020 (86
       | comments)
       | 
       |  _Awk in 20 Minutes (2015)_ -
       | https://news.ycombinator.com/item?id=23048054 - May 2020 (126
       | comments)
       | 
       |  _Show HN: An eBook with hundreds of GNU Awk one-liners_ -
       | https://news.ycombinator.com/item?id=22758217 - April 2020 (48
       | comments)
       | 
       |  _Learn Awk by Example (2019)_ -
       | https://news.ycombinator.com/item?id=22455779 - March 2020 (29
       | comments)
       | 
       |  _Awk As A Major Systems Programming Language, Revisited (2018)_
       | - https://news.ycombinator.com/item?id=22304017 - Feb 2020 (80
       | comments)
       | 
       |  _Why Learn Awk? (2016)_ -
       | https://news.ycombinator.com/item?id=22108680 - Jan 2020 (235
       | comments)
       | 
       |  _Learn Just a Little Awk (2010)_ -
       | https://news.ycombinator.com/item?id=21101478 - Sept 2019 (69
       | comments)
       | 
       |  _Awk by Example_ - https://news.ycombinator.com/item?id=20308865
       | - June 2019 (21 comments)
       | 
       |  _Removing duplicate lines from files keeping the original order
       | with Awk_ - https://news.ycombinator.com/item?id=20037366 - May
       | 2019 (154 comments)
       | 
       |  _GNU Awk 5.0_ - https://news.ycombinator.com/item?id=19671983 -
       | April 2019 (49 comments)
       | 
       |  _Learn just a little Awk (2010)_ -
       | https://news.ycombinator.com/item?id=17322412 - June 2018 (244
       | comments)
       | 
       |  _The Awk Programming Language (1988) [pdf]_ -
       | https://news.ycombinator.com/item?id=17140934 - May 2018 (207
       | comments)
       | 
       |  _Learn to use Awk with hundreds of examples_ -
       | https://news.ycombinator.com/item?id=15549318 - Oct 2017 (116
       | comments)
       | 
       |  _Awk for multimedia_ -
       | https://news.ycombinator.com/item?id=15410259 - Oct 2017 (24
       | comments)
       | 
       |  _Awk driven IoT_ - https://news.ycombinator.com/item?id=14735752
       | - July 2017 (35 comments)
       | 
       |  _Skip grep, use awk_ -
       | https://news.ycombinator.com/item?id=14692233 - July 2017 (130
       | comments)
       | 
       |  _Awk vs. Perl (2009)_ -
       | https://news.ycombinator.com/item?id=14647022 - June 2017 (71
       | comments)
       | 
       |  _The Awk Programming Language (1988) [pdf]_ -
       | https://news.ycombinator.com/item?id=13451454 - Jan 2017 (103
       | comments)
       | 
       |  _Show HN: 3D shooter in your terminal using raycasting in Awk_ -
       | https://news.ycombinator.com/item?id=10896901 - Jan 2016 (55
       | comments)
       | 
       |  _Awk in 20 Minutes_ -
       | https://news.ycombinator.com/item?id=8893302 - Jan 2015 (85
       | comments)
       | 
       |  _An Awk Primer_ - https://news.ycombinator.com/item?id=7961848 -
       | June 2014 (28 comments)
       | 
       |  _A Crash Course In Awk_ -
       | https://news.ycombinator.com/item?id=6578960 - Oct 2013 (37
       | comments)
       | 
       |  _Why Awk for AI? (1997)_ -
       | https://news.ycombinator.com/item?id=5725291 - May 2013 (53
       | comments)
       | 
       |  _Ask HN: Do people build websites in Awk?_ -
       | https://news.ycombinator.com/item?id=5041323 - Jan 2013 (12
       | comments)
       | 
       |  _Why you should learn just a little Awk - A Tutorial by Example_
       | - https://news.ycombinator.com/item?id=2932450 - Aug 2011 (76
       | comments)
       | 
       |  _Announcing my first e-book "Awk One-Liners Explained"_ -
       | https://news.ycombinator.com/item?id=2674284 - June 2011 (24
       | comments)
       | 
       |  _AWK-ward Ruby_ - https://news.ycombinator.com/item?id=2486231 -
       | April 2011 (31 comments)
       | 
       |  _Music with AWK_ - https://news.ycombinator.com/item?id=2294909
       | - March 2011 (15 comments)
       | 
       |  _Exercise #1: Learning awk Basics_ -
       | https://news.ycombinator.com/item?id=2210085 - Feb 2011 (20
       | comments)
       | 
       |  _Why you should learn at least a little bit of Awk_ -
       | https://news.ycombinator.com/item?id=1738688 - Sept 2010 (62
       | comments)
       | 
       |  _Don't MAWK AWK - the fastest and most elegant big data munging
       | language_ - https://news.ycombinator.com/item?id=815529 - Sept
       | 2009 (22 comments)
        
       ___________________________________________________________________
       (page generated 2021-09-30 23:00 UTC)