2024-07-02 - Dinosaur Hunting With AWK
======================================
The 80,000 or so recipes in MOAR came from a dump of the web site
formerly at soar.berkeley.edu.  Most of the recipes were collected
from the BBS scene, going back to the days of yore when dinosaurs
roamed cyberspace and real programmers wrote code using bits of
shells and strings.

Some hardware and software DID NOT SUPPORT LOWERCASE LETTERS AT ALL.
Consequently, some of the recipes used ALL CAPITAL LETTERS.  Other
recipes were normal except that either the ingredients or the
instructions were all uppercase.

I PERSONALLY FIND DINOSAUR LANGUAGE DIFFICULT TO READ, SO I RESOLVED
TO FIND THESE RECIPES AND FIX THEM ONCE AND FOR ALL.

I wrote a quick awk script to report, for each recipe file, the
number of uppercase letters as a percentage of the number of
lowercase letters.

    $ cat >caps.awk <<'__EOF__'
    # An empty FS makes each character its own field (gawk, mawk).
    BEGIN {
        FS=""
    }
    {
        # Count lowercase and uppercase letters per input file.
        for (i = 1; i <= NF; i++) {
            if (match($i, /[a-z]/)) {
                lcase[FILENAME]++
            } else if (match($i, /[A-Z]/)) {
                ucase[FILENAME]++
            }
        }
    }
    END {
        for (fn in ucase) {
            lnum = lcase[fn]
            unum = ucase[fn]
            # Avoid dividing by zero when a file has no lowercase.
            if (lnum == 0) {
                lnum = 1
            }
            if (unum > 0) {
                pct = int(100 * unum / lnum)
                printf "%d\t%s\n", pct, fn
            }
        }
    }
    __EOF__

Then I ran this script against all recipe files:

    $ find moar/ascii -type f | xargs awk -f caps.awk | sort -n >clis

Using trial and error, I found that files scoring above 30% were good
candidates to be fixed.  This identified 659 dinosaurs.

It took some doing, but now these recipes are fixed to be more
readable on MOAR.

tags: bencollver,retrocomputing,technical
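
Addendum: a minimal sketch of applying the 30% cutoff to the sorted
report, assuming each line holds the percentage, a tab, and the file
name, as printed by caps.awk.  The sample file names and the "cutoff"
variable are made up for illustration, and this is not necessarily how
the 659 files were actually picked out.

```shell
# Hypothetical sample of the report format: percentage, tab, file name.
report=$(mktemp)
printf '12\tmoar/ascii/example-a.txt\n' >"$report"
printf '30\tmoar/ascii/example-b.txt\n' >>"$report"
printf '87\tmoar/ascii/example-c.txt\n' >>"$report"

# Keep only the file names whose score is strictly above the cutoff.
dinosaurs=$(awk -F '\t' -v cutoff=30 '$1 > cutoff { print $2 }' "$report")
printf '%s\n' "$dinosaurs"

rm -f "$report"
```

Against the real report, the same awk filter would read the "clis"
file instead of the temporary sample.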