2024-07-02 - Dinosaur Hunting With AWK
======================================
The 80,000 or so recipes in MOAR came from a dump of the web site
formerly at soar.berkeley.edu. Most of the recipes were collected
from the BBS scene going back into the days of yore when dinosaurs
roamed cyberspace and real programmers wrote code using bits of
shells and strings. Some hardware and software DID NOT SUPPORT
LOWERCASE LETTERS AT ALL. Consequently, some of the recipes used
ALL CAPITAL LETTERS. Other recipes were mixed case except for one
part, either the ingredients or the instructions, which was all
uppercase.
I PERSONALLY FIND DINOSAUR LANGUAGE DIFFICULT TO READ, SO I RESOLVED
TO FIND THESE RECIPES AND FIX THEM ONCE AND FOR ALL.
I wrote a quick awk script to report, for each recipe file, the
number of capital letters as a percentage of the number of lowercase
letters.
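$ cat >caps.awk <<'__EOF__'
# Report uppercase letters as a percentage of lowercase letters, per file.
BEGIN {
    # An empty FS makes each character its own field (gawk and mawk).
    FS = ""
}
{
    for (i = 1; i <= NF; i++) {
        if (match($i, /[a-z]/)) {
            lcase[FILENAME]++
        } else if (match($i, /[A-Z]/)) {
            ucase[FILENAME]++
        }
    }
}
END {
    for (fn in ucase) {
        lnum = lcase[fn]
        unum = ucase[fn]
        # A file with no lowercase letters would divide by zero.
        if (lnum == 0) {
            lnum = 1
        }
        if (unum > 0) {
            pct = int(100 * unum / lnum)
            printf "%d\t%s\n", pct, fn
        }
    }
}
__EOF__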
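A note on portability: setting FS to the empty string splits a line
into single characters in gawk and mawk, but POSIX does not define
that behavior. Below is a rough sketch of an alternative that takes
the same counts from the return value of gsub(), which is the number
of substitutions it made. The names caps2.awk and sample.txt are made
up for this illustration and are not part of MOAR.

```shell
cat >caps2.awk <<'__EOF__'
{
    s = $0
    # gsub() returns how many matches it replaced. "&" puts each
    # match back unchanged, so s keeps its text and we get a count.
    lcase[FILENAME] += gsub(/[a-z]/, "&", s)
    ucase[FILENAME] += gsub(/[A-Z]/, "&", s)
}
END {
    for (fn in ucase) {
        lnum = lcase[fn]
        unum = ucase[fn]
        # Guard against files with no lowercase letters at all.
        if (lnum == 0) {
            lnum = 1
        }
        if (unum > 0) {
            printf "%d\t%s\n", int(100 * unum / lnum), fn
        }
    }
}
__EOF__

# Hypothetical one-line sample: 4 uppercase and 8 lowercase letters.
printf 'Chop ONE onion\n' >sample.txt

# LC_ALL=C keeps the [a-z] and [A-Z] ranges limited to ASCII letters.
LC_ALL=C awk -f caps2.awk sample.txt
```

With four uppercase and eight lowercase letters, the sample line
should report 50 for sample.txt.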
Then I ran this script against all recipe files:
$ find moar/ascii -type f | xargs awk -f caps.awk | sort -n >clis
Using trial and error, I found that files where this figure exceeded
30% were good candidates to be fixed. This identified 659 dinosaurs.
It took some doing, but these recipes are now fixed and more readable
on MOAR.
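Pulling the candidates out of the sorted report can be sketched as
below. This is only an illustration: sample.list stands in for the
real report, the three file names in it are invented, and
dinosaurs.list is a made-up output name.

```shell
# Stand-in for the real report: percentage, TAB, file name.
printf '4\tmoar/ascii/a.txt\n31\tmoar/ascii/b.txt\n250\tmoar/ascii/c.txt\n' >sample.list

# Keep only the file names whose first column is above the threshold.
awk -F'\t' -v min=30 '$1 > min { print $2 }' sample.list >dinosaurs.list

# Show the candidates, then count them.
cat dinosaurs.list
wc -l <dinosaurs.list
```

Passing the threshold in with -v makes it easy to rerun with other
values while looking for a good cutoff.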
tags: bencollver,retrocomputing,technical