= Text has styles

Or: Gopher content isn't as simple as it first appears!

Plain text isn't plain. I'm not talking about encoding formats -
that's a whole 'nother ball of wax. I mean, well-structured plain
text has *visual* cues to indicate title, author, section titles,
lists, paragraphs, block quotes, code listings, etc. In that sense,
it has 'styles'.

What the heck are you talking about today, Ratfactor, you crazy
animal?

Let's consider a concrete example: the RFC.

== RFC

Request For Comments (RFCs) are the publications by which Internet
Standards are proposed (in truth, they usually *become* the
canonical standards). They are (almost) always submitted as ASCII
text and follow the rules described in RFC 2223. I'll quote a few
formatting rules:

- pages are limited to 58 lines
- lines are limited to 72 chars (cols)
- do not attempt to justify right margin
- single-space between words
- separate paragraphs with one blank line

It goes on to specify headers and footers (single-line, RFC ####,
centered title, date, page number, etc.), the format and content of
the first (cover) page, and standard and required sections in the
body of the document.

RFCs are plain text, but it would be an unbelievable pain to format
them by hand. So they are (almost always) authored using a member
of the *roff typesetting family.

(Fun aside, RFC 2223 was written in 1997 and at that time, even the
Lords of the Internet could not get nroff to do the right thing
between page breaks, so they describe piping the nroff output
through a Perl script called 'fix.pl' to complete the formatting
process!)

RFCs are formatted plain text. But they aren't formatted the way I
would format a plain text email message to a friend. And they're
not how I'd format a post to this phlog. So RFCs have a 'style' all
their own - even though they're what we refer to as "plain text".

So text has styles. Let's explore this.
But first, a bit of a meander through the world of document markup
languages - starting with the one we've just been discussing,
troff.

== Troff

I became quite interested in *roffs (which I'll refer to as just
'troff' for the remainder of this post) a while back because they
are part of the standard Unix text formatting and
publishing/printing ecosystem.

(By the way, 'roff' comes from the phrase "run off" as in "run the
document off on the printer" and the original programs were created
to generate a typeset document for very specific (and very
expensive) printers back in the 1960s and 1970s. The programs
became increasingly generalized and powerful as they progressed
from nroff ("newer") to troff ("typesetter") to ditroff ("device
independent") and were eventually taken under the GNU wing as
groff.)

For a while, I had a good time sending really high quality
documents through my laser printer from various groff macros. I
could also produce PDFs, PostScript files, and even formatted text
documents. I kinda thought I'd finally discovered the One True
Plain Text Document Format.

But after actually *using* raw troff formatting and then the ms,
mm, and mom macros for a while, I realized that I hated it. Yeah,
the quality was great and the tool was fast and ubiquitous. But in
the age of Markdown, troff is a freakishly old-fashioned and
painful format to type. It's noisy and cryptic and really gets in
the way. The macros help (mom, in particular, is really nice if
your document is a good fit for what it provides), but not enough.

The _real_ problem with troff, though, is that it's more about the
appearance of things than about the semantic structure of
documents. (The macro packages do help with this.) This makes
perfect sense given its origins.

== Markdown

I believe that the document source file should be as human-readable
(and typeable! and memorable!) as possible. Markdown provides this.
Markdown is limited and has some serious shortcomings, so while I
adore it over, say, Microsoft Word, I was never happy with using it
as the One True Plain Text Document Format.

The real positive impact that Markdown's popularity has had, above
all else, I think, is that it has re-introduced the idea that
"plain" text _has_ structure, even if it's entirely ad hoc. By
taking unwritten rules for text formatting from sources such as
email and Usenet, John Gruber and Aaron Swartz codified and
popularized in Markdown the rules we were already playing by.

The reason for the popularity isn't important - it only matters
that it is popular. Whatever Markdown's failings and limitations,
it has undoubtedly made the Web a better place.

I've made several mentions of Markdown's problems, but I should
probably make it a little clearer what I mean before continuing on.
I could make a list, but it mostly comes down to three categories:

1. The spec is ambiguous, so we have many different interpretations
   (how to make part of a word italic, for example)
2. It's _too_ limited for "real world" documents (tables,
   footnotes), so we have many different extensions to Markdown to
   fill the gaps
3. The link syntax is stupid and I hate it

Okay, that 3rd one is mine. But you'll find plenty of supporting
material for the first two if you look hard enough on the Web.

The point is, Markdown is simply _not_ enough, on its own, for
documents of any complexity. That's a strength as well as a
weakness.

After that, to be completely clear, let me state again that I
really like Markdown and (especially) what it has done for
popularizing a readable source format for textual documents!

== Alternatives to Markdown

Having established an actual need for an alternative to Markdown,
what do we have?
A curated and opinionated list:

	troff     no fun to read/write, emphasis on typesetting
	HTML      perfect except no fun to read/write
	TeX       (and LaTeX) powerful; noisy and cryptic
	RTF       TeX without any of the advantages
	creole    a bit noisy and wiki-centric
	reST      reStructuredText, fine but a bit eccentric
	EtText    inspiration for Markdown, emphasis on HTML
	Textile   another language from the Markdown era
	Org-mode  essentially tied to Emacs, specialized
	Texinfo   Great for 1986, never took off outside GNU
	docbook   comprehensive format for books!; XML :-(
	AsciiDoc  docbook without the XML!

(Keep in mind that these comments and comparisons are based on ONE
use-case: having a source format that is nearly effortless to type
(like Markdown) and yet is capable of enough complexity to satisfy
_most_ document needs: a memo, an article, a blog/phlog post, a
novel, a technical book, etc. When I dismiss HTML, it is not
because it isn't capable (it's _very_ capable) but it's no fun to
type. I've been hand-writing the stuff for over twenty years now
and I know it well, but it's NO FUN TO TYPE when I just want to
create some dang content.)

When you look at the initial release date for a lot of these, you
find that the idea really came to a head in the early 2000s with
reStructuredText in 2001, AsciiDoc and Textile in 2002, Org-mode in
2003, Texy and Markdown in 2004 (and a huge number of less notable
others throughout - heck, I had my own little goofy line-based
format to generate parts of my website around that time...).

It's so interesting how ideas for inventions occur almost
simultaneously in different parts of the world across human
history: math concepts, cars, radios, etc.

"It takes a thousand men to invent a telegraph, or a steam engine,
or a phonograph, or a photograph, or a telephone or any other
important thing - and the last man gets the credit and we forget
the others. He added his little mite - that is all he did.
These object lessons should teach us that ninety-nine parts of all
things that proceed from the intellect are plagiarisms, pure and
simple; and the lesson ought to make us modest. But nothing can do
that."
-- Mark Twain

Okay, enough asides. So you may have noticed that I listed AsciiDoc
last and did not list any negatives. Savvy readers may have
surmised that I have picked my horse in the race. You would be
right.

== AsciiDoc

So what makes AsciiDoc so compelling? Without going into an
exhaustive history and waxing on and on about AsciiDoc, I'd like to
mention some highlights:

- high quality implementations are available (Asciidoctor)
- with a large number of output document types
- it's equivalent to DocBook, made for authoring books!
- O'Reilly books are (or were) authored with it
- it's as readable as Markdown

That final statement is, of course, completely subjective. I don't
*love* every little bit of AsciiDoc syntax. But it doesn't get in
my way.

The biggest factors for me are the high quality implementations for
generating HTML and the fact that it has all of the semantic
structure needed to create honest-to-goodness published technical
books! These, to me, are the hallmarks of a One True Plain Text
Document Format.

So I've been slowly-but-surely converting just about everything I
wrote to AsciiDoc.

(To that end, I picked Hugo as a static site generator for my
website (scrapping my 4th generation home-grown static site
generator) because it can use Asciidoctor as the backend HTML
renderer (and with some modifications it can render HTML that
doesn't make me want to stab my hand with a fork when I "view
source".))

What I'd like to do is use the AsciiDoc format for generating
Gopher content:

1. It's a proven format
2. I can start with a subset of AsciiDoc and add as needed
3. I can seamlessly publish my Web site content to Gopher

But wait, everything we've been talking about so far is for
producing HTML output (or PDF or PostScript, etc.).
Why would I need to do anything at all to my AsciiDoc content?
Couldn't I just upload the raw page source to Gopher and call it a
day?

Yeah, kinda. You could, in theory, read the content just fine. But
there's a difference between text that is marginally readable in a
Gopher client and text which was purpose-crafted to be beautifully
readable at a fixed column width, with ASCII art and cute little
section breaks and other niceties.

Let's come full circle and see if we can tie this all together.

== Back to "plain text has styles" and let's talk about Gopher

In my _Gopher Logging in Eleven Lines of Shell_ phlog post [1] I
made an ill-fated attempt to create a tiny streamlined process for
publishing entries.

(There were a couple of problems with it, but the number one was
that using GNU 'fmt' completely wrecked my code example and ASCII
directory tree. fmt does a beautiful job with prose (including
indents), but doesn't know how to leave other types of things
alone. 'fold' has the same problem. 'par', on the other hand, is a
replacement for 'fmt' and seems to be far more capable. I will be
experimenting with it to post this today.)

One thing that became clear to me is that I have already defined
certain 'styles' for my Gopher phlog posts, such as tab-indented
source code examples. And that I'm still experimenting with the
'style' of other elements such as section titles, lists,
title/header, etc.

And Gopher's tricky because many clients assume fixed-column
layouts, so we have to be really careful about long lines. Now I
have some opinions about that state of affairs - but that would
take me way off topic and I think this is getting long enough for a
Phlog entry as it is.

Anyway, when I *do* come up with my personal styles of Gopher
content, it would be really neat to be able to run all of my source
documents through a renderer and have them all come out with the
same styles.
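
To make the 'fmt' problem a bit more concrete, here is a minimal
Python sketch of a re-wrapper that knows about exactly one 'style':
tab-indented lines are treated as code and passed through
untouched, and everything else is re-wrapped as prose. This is only
an illustration, not how fmt or par actually work. The function
name 'reflow' and the 67-column default are assumptions made up for
this example.

```python
import textwrap


def reflow(text: str, width: int = 67) -> str:
    """Re-wrap prose paragraphs to `width` columns.

    Lines that start with a tab are assumed to be code (or ASCII
    art) and are copied through exactly as they are.
    """
    out = []   # finished output lines
    para = []  # lines of the prose paragraph being collected

    def flush():
        # Emit the collected paragraph (if any) as wrapped prose.
        if para:
            out.append(textwrap.fill(" ".join(para), width=width))
            para.clear()

    for line in text.splitlines():
        if line.startswith("\t"):
            flush()
            out.append(line)   # code: leave it alone
        elif not line.strip():
            flush()
            out.append("")     # blank line ends a paragraph
        else:
            para.append(line.strip())
    flush()
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "This is a long line of prose that will be wrapped to fit.\n"
        "\n"
        "\tfunction clown(){\n"
        "\t\treturn 1;\n"
        "\t}\n"
    )
    print(reflow(sample, width=40))
```

Even this tiny version has to guess at structure from indentation
alone, which is the same kind of guessing that trips up the
general-purpose tools.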
To do that, my source documents need to store the *semantic*
information about the document structure and elements in a common
format.

Ah, so, enter Ratfactor's choice for the One True Plain Text
Document Format: AsciiDoc!

And now, at last, I can truly demonstrate what I mean when I say
plain text 'styles' (and not just 'formatting').

== Example

Imagine there is a hypothetical language TSS: Text Style Sheets.
Let's imagine we have a working renderer already built and we're
feeding it two elements: a 'style sheet' (.tss) and an AsciiDoc
source file (.adoc). Our imaginary renderer (actually yours truly
and his keyboard) will produce Gopher-formatted text before your
very eyes!

First, here's the content of our article source, 'clowns.adoc':

----------------------------------------------------------
= Clown

Clowns are closely related to parametric polymorphism. For example,
note that the type of wiggler as specified would be the
parametrically polymorphic type bing -> [bang] -> Barf.

	function clown(){
		if(honk==toot) mime_style = false;
		emit_gags();
		return 1;
	}

Of course, there is a more common definition:

> A clown is a comic performer who employs slapstick or similar
types of physical comedy, often in a mime style. (Wikipedia)

== More clowning around

Now we shall consider the clown as a mammal...


And here's a 'simple.tss' (the syntax is hypothetical):

----------------------------------------------------------
body { width: 40; }
title {
	text-align: center;
	template: "*** $title ***";
	margin-bottom: 2;
}
h2 {
	margin-top: 2;
	text-transform: uppercase;
}
blockquote { margin-left: 1t; }


And here is the glorious output of me, the human Gopher phlog
renderer, applying the styles of 'simple.tss':

----------------------------------------------------------
             *** Clown ***


Clowns are closely related to parametric
polymorphism. For example, note that the
type of wiggler as specified would be
the parametrically polymorphic type bing
-> [bang] -> Barf.

	function clown(){
		if(honk==toot) mime_style = false;
		emit_gags();
		return 1;
	}

Of course, there is a more common
definition:

	A clown is a comic performer who
	employs slapstick or similar
	types of physical comedy, often
	in a mime style. (Wikipedia)


MORE CLOWNING AROUND

Now we shall consider the clown as a
mammal...


(Note how the body of the document is now constrained to 40
columns, the title is centered, the heading is changed to all-caps
and has a margin of two lines above it, and so on.)

I was planning to have multiple examples, but I think this should
be enough to get the idea across. Anyway, that stuff takes work,
whew!

I'm not married to the name 'TSS', to the syntax (a la CSS), or to
any of the rest of it (except AsciiDoc). As a matter of fact, I
have long been a big critic of CSS as it is used on the Web, but
this usage is actually *much* more closely aligned with CSS's
strengths, so it may not be a bad choice.

At first, this seemed like a fairly simple thing, but even just in
typing the above example, I can see how the complexity of the
'style' specification language could quickly spiral out of control!
It would be smart to take a look at existing template languages and
things like troff's macros for ideas.

Clearly, an opinionated AsciiDoc-to-Gopher converter without user
styles would be an order of magnitude easier to create. If,
hypothetically speaking, I were to start on such a project, that's
where I'd start.

== Conclusion

Ratfactor is a crazy sewer rat.

* * *

Community notes:

1. I finally tried out solderpunk's VF-1 gopher client. Fantastic!
   I'm not sure if it will oust lynx or not, but it's full of
   really great ideas and I love the keyboard navigation. :-)

2. Thanks to tomasino for adding me to the Phlog Roll on
   gopher.black! [2] I didn't realize you were speaking of a
   published list when you mentioned it on IRC the other day.
   That's really cool and I appreciate the vote of confidence.
   :-)

[1] gopher://sdf.org/0/users/ratfactor/phlog/2018-08-13-Gopher-Logging-in-Eleven-Lines-of-Shell
[2] gopher://gopher.black/1/moku-pona/