2024-07-02 - Dinosaur Hunting Part 2
====================================
Recently i scoured my recipe collection for recipes with ALL
UPPERCASE TEXT. I wrote a quick algorithm in AWK to sort recipes by
percentage of uppercase letters versus lowercase letters. I called
this "dinosaur hunting."
My algorithm has a weakness. Suppose there is an otherwise normal
recipe file that has only one paragraph with all uppercase letters.
Such a file can fall below the 30% uppercase letter threshold, so it
would not be reported.
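To make the weakness concrete, here is a rough sketch of a whole-file percentage measure. It is not the script from the first post, which is not reproduced here. The file names sample.txt and pct.awk are made up for illustration.

```shell
# Hypothetical sample: three normal lines, then one all-caps paragraph.
cat >sample.txt <<'__EOF__'
Mix the flour and water in a large bowl until smooth.
Let the dough rest for an hour under a damp cloth.
Knead it again, then shape it into two round loaves.
BAKE AT 350 DEGREES.
SERVE HOT.
EAT WITH BUTTER.
__EOF__

# Hypothetical stand-in for a percentage-based check.
cat >pct.awk <<'__EOF__'
{
    # gsub() returns the number of replacements, so it works as a counter.
    line = $0
    ucase += gsub(/[A-Z]/, "", line)
    lcase += gsub(/[a-z]/, "", line)
}
END {
    # Percentage of uppercase letters among all letters in the input.
    if (ucase + lcase > 0) {
        printf "%d%% uppercase\n", 100 * ucase / (ucase + lcase)
    }
}
__EOF__

awk -f pct.awk sample.txt
```

This sample has 37 uppercase letters out of 156, so it prints "23% uppercase". That is under a 30% threshold even though the last three lines are entirely uppercase.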
I wrote a new algorithm to find recipes with at least 3 consecutive
lines that are all uppercase.
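$ cat >caps2.awk <<'__EOF__'
BEGIN {
    # An empty FS makes every character its own field (works in gawk
    # and mawk; POSIX leaves this unspecified)
    FS=""
}
# Ignore lines that begin with MMMMM
!/^MMMMM/ {
    # New file: restart the count of consecutive all-caps lines
    if (lfn != FILENAME) {
        wasallcaps = 0
    }
    lcase = 0
    ucase = 0
    for (i = 1; i <= NF; i++) {
        if (match($i, /[a-z]/)) {
            lcase++
        } else if (match($i, /[A-Z]/)) {
            ucase++
        }
    }
    # All caps means at least one uppercase letter and no lowercase
    if (ucase == 0 || lcase > 0) {
        wasallcaps = 0
    } else {
        wasallcaps++
        if (wasallcaps > 2) {
            dinosaurs[FILENAME] = 1
        }
    }
    lfn = FILENAME
}
END {
    for (fn in dinosaurs) {
        print fn
    }
}
__EOF__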
Then i ran this script against all recipe files:
$ find moar/ascii -type f | xargs awk -f caps2.awk >clis
This revealed around 20 dinosaurs missed by my original algorithm.
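The all-caps test in caps2.awk comes down to "at least one uppercase letter and no lowercase letters", so it can probably also be written with two regex matches per line instead of a per-character loop. Below is a minimal sketch of that variant, which does not depend on FS="" behavior. It is not the script i ran against my collection. The names caps3.awk, dino.txt, and plain.txt are made up for illustration.

```shell
# Hypothetical test files, not part of the recipe collection.
printf '%s\n' 'Mix the flour.' 'BAKE AT 350.' 'SERVE HOT.' 'EAT WITH BUTTER.' >dino.txt
printf '%s\n' 'Mix the flour.' 'BAKE AT 350.' 'Serve hot.' 'EAT WITH BUTTER.' >plain.txt

cat >caps3.awk <<'__EOF__'
# Restart the streak at the first line of each input file.
FNR == 1 { streak = 0 }
# Ignore lines that begin with MMMMM; they neither count nor reset.
/^MMMMM/ { next }
# All caps: at least one uppercase letter and no lowercase letters.
/[A-Z]/ && !/[a-z]/ {
    if (++streak > 2) {
        dinosaurs[FILENAME] = 1
    }
    next
}
# Any other line breaks the streak.
{ streak = 0 }
END {
    for (fn in dinosaurs) {
        print fn
    }
}
__EOF__

awk -f caps3.awk dino.txt plain.txt
```

Only dino.txt is printed, because plain.txt never has more than one all-caps line in a row.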
tags: bencollver,retrocomputing,technical