telefisk.org

       From: gopher-bounce@complete.org
       Date: Sun Aug 10 22:39:32 2008
       Subject: [gopher] Re: Gopherness
       
       JumpJet Mailbox <jumpjetinfo@yahoo.com>
       writes:
       
       > --- On Mon, 8/4/08, Nuno J. Silva
       > <nunojsilva@ist.utl.pt> wrote:
       >>
       >> "Jay Nemrow" <jnemrow@quix.us> writes:
       >>
       >>> On Mon, Aug 4, 2008 at 10:19 AM, Kyevan
       >>> <kyevan@sinedev.org> wrote:
       >>>
       >>>> What about older clients, though? Modern clients will probably
       >>>> handle UTF-8 at least well enough to not explode, but older clients
       >>>> might not.  Generally, it seems safest to stick to the subset that
       >>>> is ASCII when reasonable, only using UTF-8 or such when it's
       >>>> actually needed. ... is a perfectly readable replacement for
       >>>> U+2026, even if it's not "typographically correct." On the other
       >>>> hand, if you're trying to post a text in, say, a mix of Arabic, and
       >>>> Klingon, go right ahead and use UTF-8.
       >>
       >> There are also these iso* charsets, which use just 8 bits to encode the
       >> text and so cannot cover a larger collection of characters; using
       >> those, you wouldn't be able to mix charsets.
       <snip/>
       >> On the other hand, even if the choice was utf8 (so the documents would
       >> be ASCII or utf8), I'd keep iso* support, just in case (therefore my
       >> question is 'should we use the same sort of character encoding when
       >> publishing non-English documents? If yes, which one?' and not 'what
       >> should a client support?').
       >>
       >> What's the actual scenario? Is there any client which crashes due to
       >> utf8? Which clients are not able to render it correctly? And what
       >> about iso* charsets support?
       >
       > How would we print a Gopher-retrieved text document on, for example,
       > an older (or mini-mainframe) computer which only uses a Daisy Wheel
       > Printer or Teletype Printer (which ONLY supports ASCII characters)?
       
       If the documents (in any of the mentioned encodings) contain non-ASCII
       characters, the behaviour is undefined (e.g., if the machine ignores the
       8th bit, other characters will be rendered instead of the intended ones).
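
       To illustrate the 8th-bit case, here is a minimal Python 3 sketch. It
       assumes a device that simply masks each byte to 7 bits; real printers
       may behave differently, and the helper name strip_high_bit is made up
       for this example.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Simulate a 7-bit device by clearing the 8th bit of every byte."""
    return bytes(b & 0x7F for b in data)


word = "Técnico"

# ISO-8859-1 encodes "é" as the single byte 0xE9, which becomes 0x69 ("i").
print(strip_high_bit(word.encode("latin-1")).decode("ascii"))  # Ticnico

# UTF-8 encodes "é" as 0xC3 0xA9, which becomes 0x43 0x29 ("C)").
print(strip_high_bit(word.encode("utf-8")).decode("ascii"))  # TC)cnico
```

       Either way the output is still printable ASCII, just not the text
       that was intended.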
       
       But there's nothing we can do about that, except writing some script to
       replace the existing non-ASCII characters with some ASCII description.
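
       Such a script could look roughly like the following Python 3 sketch.
       It assumes the text has already been decoded to Unicode; the function
       name describe_non_ascii, the braces around descriptions, and the small
       override table are illustrative choices, not an existing tool.

```python
import unicodedata

# Hand-picked ASCII stand-ins; "..." for U+2026 is the one suggested
# earlier in this thread. Anything else falls back to its Unicode name.
OVERRIDES = {"\u2026": "..."}


def describe_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with an ASCII-only description."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # already ASCII, keep as-is
        elif ch in OVERRIDES:
            out.append(OVERRIDES[ch])
        else:
            # Unicode character names are ASCII; unnamed code points
            # fall back to the U+XXXX notation.
            name = unicodedata.name(ch, f"U+{ord(ch):04X}")
            out.append("{" + name + "}")
    return "".join(out)


print(describe_non_ascii("Instituto Superior Técnico\u2026"))
```

       The result is verbose, but it is safe to send to an ASCII-only
       printer and still tells the reader which character was meant.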
       
       Avoiding the use of non-ASCII characters is, of course, a good
       idea. But if a document is written in a non-Western language, or in
       any language which requires another alphabet, ASCII simply cannot
       represent it.
       
       <snip/>
       
       -- 
       Nuno J. Silva (aka njsg)
       LEIC student at Instituto Superior Técnico
       Lisbon, Portugal
       Homepage: http://njsg.no.sapo.pt/
       Gopherspace: gopher://sdf-eu.org/11/users/njsg
       Registered Linux User #402207 - http://counter.li.org
       
       -=-=-
       Ooh, mommy, mommy, what I have now doesn't work in this extremely
       unlikely circumstance, so I'll just throw it away and write something
       completely new.
               -- Linus Torvalds