       # 2024-07-02 - Dinosaur Hunting Part 2
       
       Recently i scoured my recipe collection for recipes with ALL
       UPPERCASE TEXT.  I wrote a quick algorithm in AWK to sort recipes by
       percentage of uppercase letters versus lowercase letters.  I called
       this "dinosaur hunting."
       
       My algorithm has a weakness.  Suppose there is an otherwise normal
       recipe file that has only one paragraph with all uppercase letters.
       Such a file can fall below the 30% uppercase letter threshold, so it
       would not be reported.
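
A minimal Python sketch of how such a file slips through. It assumes the score is uppercase letters divided by all ASCII letters; the original script is not shown here, so that formula, the function name, and the sample recipe text are all made up for illustration.

```python
def uppercase_ratio(text):
    """Fraction of ASCII letters in text that are uppercase (assumed formula)."""
    upper = sum(1 for c in text if "A" <= c <= "Z")
    lower = sum(1 for c in text if "a" <= c <= "z")
    total = upper + lower
    return upper / total if total else 0.0


# Hypothetical recipe: mostly normal text plus one all-uppercase paragraph.
normal = "Mix the flour and sugar in a large bowl.\n" * 10
shouting = "BAKE AT 350 DEGREES\nFOR 45 MINUTES\nCOOL BEFORE SERVING\n"
recipe = normal + shouting

ratio = uppercase_ratio(recipe)
print(f"{ratio:.0%} uppercase")
print("reported" if ratio >= 0.30 else "missed")
```

Here the three shouted lines are outweighed by the ten normal ones, so the file scores well under 30% and goes unreported.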
       
       I wrote a new algorithm to find recipes with at least 3 consecutive
       lines that are all uppercase.
       
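           $ cat >caps2.awk <<'__EOF__'
           # Report files with 3 or more consecutive all-uppercase lines.
           BEGIN {
               # An empty FS makes every character its own field.  This is
               # a common extension (gawk, mawk), unspecified by POSIX.
               FS=""
           }
           # Skip lines starting with MMMMM so they do not count as uppercase.
           !/^MMMMM/ {
               # Reset the run counter at the start of each new file.
               if (lfn != FILENAME) {
                   wasallcaps = 0
               }
               lcase = 0
               ucase = 0
               for (i = 1; i <= NF; i++) {
                   if (match($i, /[a-z]/)) {
                       lcase++
                   } else if (match($i, /[A-Z]/)) {
                       ucase++
                   }
               }
               # A line with no uppercase letters, or with any lowercase
               # letter, ends the current run.
               if (ucase == 0 || lcase > 0) {
                   wasallcaps = 0
               } else {
                   wasallcaps++
                   if (wasallcaps > 2) {
                       dinosaurs[FILENAME] = 1
                   }
               }
               lfn = FILENAME
           }
           END {
               for (fn in dinosaurs) {
                   print fn
               }
           }
           __EOF__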
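
For readers who do not use AWK, the same check can be sketched in Python. This is an illustrative re-statement of the script above, not a replacement for it; the function name `has_caps_run` and the sample lines are hypothetical.

```python
import re


def has_caps_run(lines, run_length=3):
    """Return True if run_length consecutive lines are all uppercase."""
    streak = 0
    for line in lines:
        if line.startswith("MMMMM"):
            continue  # skipped lines neither extend nor reset the streak
        has_upper = re.search(r"[A-Z]", line) is not None
        has_lower = re.search(r"[a-z]", line) is not None
        if has_upper and not has_lower:
            streak += 1
            if streak >= run_length:
                return True
        else:
            streak = 0  # lowercase present, or no letters at all
    return False


# Hypothetical recipe lines.
sample = ["Title: Cake", "PREHEAT OVEN", "MIX WELL", "BAKE 1 HOUR", "Serves 4"]
print(has_caps_run(sample))
```

As in the AWK script, a blank line or a line without letters resets the count, so three uppercase lines separated by blank lines are not reported.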
       
       Then i ran this script against all recipe files:
       
           $ find moar/ascii -type f | xargs awk -f caps2.awk >clis
       
       This revealed around 20 dinosaurs missed by my original algorithm.
       
       tags: bencollver,retrocomputing,technical
       