2024-07-02 - Dinosaur Hunting Part 2
====================================

Recently I scoured my recipe collection for recipes with ALL UPPERCASE
TEXT. I wrote a quick algorithm in AWK to sort recipes by the percentage
of uppercase versus lowercase letters. I called this "dinosaur hunting."

My algorithm has a weakness. Suppose an otherwise normal recipe file has
only one paragraph in all uppercase letters. The file as a whole can fall
below the 30% uppercase threshold, so it would not be reported.

I wrote a new algorithm to find recipes with at least 3 consecutive lines
that are all uppercase. The here-document delimiter is quoted so that the
shell passes $i through to AWK instead of expanding it to an empty string.

$ cat >caps2.awk <<'__EOF__'
BEGIN { FS="" }  # one field per character (gawk/mawk extension)
!/^MMMMM/ {      # skip lines starting with MMMMM (Meal-Master delimiters)
    # reset the run of all-caps lines at the start of each new file
    if (lfn != FILENAME) {
        wasallcaps = 0
    }
    lcase = 0
    ucase = 0
    for (i = 1; i <= NF; i++) {
        if (match($i, /[a-z]/)) {
            lcase++
        } else if (match($i, /[A-Z]/)) {
            ucase++
        }
    }
    # a line with no letters, or with any lowercase letter, breaks the run
    if (ucase == 0 || lcase > 0) {
        wasallcaps = 0
    } else {
        wasallcaps++
        # 3 or more all-uppercase lines in a row: report this file
        if (wasallcaps > 2) {
            dinosaurs[FILENAME] = 1
        }
    }
    lfn = FILENAME
}
END {
    for (fn in dinosaurs) {
        print fn
    }
}
__EOF__

Then I ran this script against all recipe files:

$ find moar/ascii -type f | xargs awk -f caps2.awk >clis

This revealed around 20 dinosaurs missed by my original algorithm.

tags: bencollver,retrocomputing,technical
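To sanity-check the three-consecutive-lines rule, here is a minimal sketch that runs the same idea on two made-up recipe files: one mostly lowercase with a single three-line all-caps paragraph, and one with only two all-caps lines in a row. The file names (fossil.txt, modern.txt) and recipe text are hypothetical. The detector is a variant, not caps2.awk itself: it counts letters with gsub(/[a-z]/, "&"), which replaces each match with itself and returns the match count, instead of relying on FS="" to split lines into characters.

```shell
#!/bin/sh
# Sketch: exercise a 3-consecutive-uppercase-lines check on two hypothetical files.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Mostly lowercase recipe with one short all-caps paragraph (3 lines).
cat >"$dir/fossil.txt" <<'__EOF__'
Mix the flour and the sugar in a bowl.
Add the eggs one at a time.
BAKE AT 350 DEGREES
FOR 25 MINUTES
UNTIL GOLDEN BROWN
Cool before serving.
__EOF__

# Ordinary recipe: only two all-caps lines in a row, so it should not match.
cat >"$dir/modern.txt" <<'__EOF__'
SERVES 4
PREP 10 MIN
Whisk everything together and chill.
__EOF__

# gsub(/re/, "&") leaves the line unchanged and returns how many letters matched,
# so this avoids the FS="" extension used in caps2.awk.
result=$(awk '
!/^MMMMM/ {
    if (lfn != FILENAME) { run = 0 }      # new file: reset the run
    lcase = gsub(/[a-z]/, "&")
    ucase = gsub(/[A-Z]/, "&")
    if (ucase == 0 || lcase > 0) { run = 0 }
    else if (++run > 2) { dinosaurs[FILENAME] = 1 }
    lfn = FILENAME
}
END { for (fn in dinosaurs) print fn }
' "$dir/fossil.txt" "$dir/modern.txt" | sed 's|.*/||')

printf '%s\n' "$result"
```

Only fossil.txt should be reported: its caps paragraph is too small to dominate a percentage-based score, but it does contain three all-uppercase lines in a row, while modern.txt never gets past two.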