gopher.black

       ----------------------------------------
       Using ptx to generate one-time pads
       March 15th, 2018
       ----------------------------------------
       
       I have been working my way through coreutils [0] recently when
       I came across ptx. 
       
         $ apropos ptx
       
         ptx (1) - produce a permuted index of file contents
       
       What the hell does that mean? I know...
       
         $ man ptx
       
         PTX(1)                User Commands                  PTX(1)
       
         NAME
          ptx - produce a permuted index of file contents
       
         SYNOPSIS
          ptx [OPTION]... [INPUT]...   (without -G)
          ptx -G [OPTION]... [INPUT [OUTPUT]]
       
         DESCRIPTION
          Output a permuted index, including context, of the words
          in the input files.
       
          With no FILE, or when FILE is -, read standard input.
       
          Mandatory arguments to long options are mandatory for
          short options too.
       
          ...
       
       Oh that totally clears it... nope. Still no clue.
       
       So I asked on Mastodon and a few people had suggestions. In
       particular, someone shot me over to a blog post [1] that tries
       to clear up what a 'permuted index' even is. And that's the
       key. So check this out:
       
       A while back, before we had badass search engines and hyperlinked
       doom shenanigans, manually finding the reference to a word in
       a document SUUUUUUUUCKED. So they made this index in the back that
       listed all the key terms alphabetically in the middle column of
       a page. To the left of that word it would list whatever part of
       the sentence led up to it. To the right they'd list the sentence
       fragment that followed the term. Finally, the page number. With
       that you could jump to the page and eye-ball search it yourself.
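
       Want to see that layout without drowning in text? Here's a tiny
       sketch. The sample sentence is made up, and -w 40 is only there
       to keep the output narrow.

```shell
# Each word of the input line takes a turn as the keyword; ptx prints
# one row per word, with the words leading up to it on the left.
printf 'the quick brown fox\n' | ptx -w 40
```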
       
       It's been around since at least System V and it's pretty much useless,
       right? Well, foxy, I think I came up with a fun hobby use-case.
       
       Pick a book with a publicly available canonical plain-text
       source. Oh, I dunno, head over to Project Gutenberg [2] or
       something and wrestle yourself up some Joyce (or ILLEGAL GERMAN
       NOVELS!!!!! [3]). We're gonna shove that badboy into ptx like
       a champ. Here we go...
       
         $ curl https://www.gutenberg.org/files/4300/4300-0.txt > ulysses.txt
         $ ptx ulysses.txt
       
         SCREEN EXPLODES WITH TEXT FOR SEVERAL MINUTES!!!!!
       
       That's not how that works. Back to manpage!
       
                 Hmmm...
                     ...assumes latin-1 charset...
                 ...ignore case, perhaps...
                      ...[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*...
                 ...Emacs next-error, grumble...
                      ...-w, width, ahha...
                         
                       ROFF! NO FUCKING WAY!
       
       One of the output formats for ptx is freaking roff! Synchronicity,
       baby! [4] Let's try something a little smaller.
       
         $ curl http://www.gutenberg.org/cache/epub/1065/pg1065.txt > theraven.txt
         $ ptx -O -f -w 66 theraven.txt > theraven-index.txt
       
       That sorta works. Ugh, but I'm getting tired. Here's the plan for
       what's next:
       
         - Figure out how to format this stuff so I can awk it
         - awk so that the text key and one more word to the right are
           the output. Two words with a space between, that's it.
         - sort unique that bad-boy by each column in turn so every
           word shows up only once per column.
         - Use whatever words are in your primary list to write a plain
           text message. If your source document is large enough that's
           virtually any word you'd like to use.
         - Use awk to replace your words with the one to the right via
           a lookup file
         - Send secret message to a friend. The knowledge of which book
           is your cypher is all that's necessary to repeat the process
           in reverse.
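
       The last few bullets can be roughed out like this. It's only
       a sketch, and it cheats: the word pairs come straight from awk
       (each word plus the word after it) instead of being parsed out
       of ptx output, since that formatting step is still unsolved.
       build_pairs, translate and pairs.txt are made-up names.

```shell
# build_pairs BOOK: print "key partner" lines, where partner is a word
# that directly follows key somewhere in BOOK. A pair is kept only if
# its key and its partner are both unused so far, so every word shows
# up at most once per column and the table reads in either direction.
build_pairs() {
    tr 'A-Z' 'a-z' < "$1" | tr -cs 'a-z' '\n' | awk '
        $0 == "" { next }
        prev != "" && !(prev in key) && !($0 in partner) {
            key[prev] = 1
            partner[$0] = 1
            print prev, $0
        }
        { prev = $0 }'
}

# translate TABLE FROM TO: read words on stdin and swap each one found
# in column FROM of TABLE for the word next to it in column TO.
# Words missing from the table pass through untouched.
# Assumes TABLE is not empty.
translate() {
    awk -v from="$2" -v to="$3" '
        NR == FNR { map[$from] = $to; next }
        {
            for (i = 1; i <= NF; i++)
                if ($i in map) $i = map[$i]
            print
        }' "$1" -
}
```

       Something like this should then round-trip, as long as you stick
       to lowercase words from the first column of pairs.txt:

         $ build_pairs theraven.txt > pairs.txt
         $ echo 'some message' | translate pairs.txt 1 2 > secret.txt
         $ translate pairs.txt 2 1 < secret.txt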
       
       Huzzah for secret codes.
       
       If I get some time this weekend I'll look at writing a script to
       automate this for you. Provide a book and a message and indicate
       whether to encode or decode. Oh what fun that would be for some
       private crypto. Thinking you could do this in perl? Wanna show me
       up? Put your illogical collection of special characters where your
       mouth is, buddy!
       
 (TXT) [0] GNU Core Utilities
 (HTM) [1] Reading a Permuted Index
 (DIR) [2] Project Gutenberg on Gopher
 (HTM) [3] Project Gutenberg Blocks Access to Germany
 (TXT) [4] dbucklin - Formatting for Gopher with GNU troff