[HN Gopher] Emojis paved the way for UTF-8 everywhere
       ___________________________________________________________________
        
       Emojis paved the way for UTF-8 everywhere
        
       Author : velmu
       Score  : 152 points
       Date   : 2020-11-17 15:39 UTC (7 hours ago)
        
 (HTM) web link (developers.ibexa.co)
 (TXT) w3m dump (developers.ibexa.co)
        
       | ChrisArchitect wrote:
       | didn't seem like it was emojis paving the path but web-based
       | email and internationalization of websites. Just the whole move
       | to web in various key areas like email meant it became just less
       | of a hair-pulling nightmare for developers to have to deal with
       | encoding between countries and platforms. Throw in the dawn of
       | smartphones (and emojis came along with that yes) and that was
       | more problems on top of that, people moving between
       | desktop/mobile/web etc. UTF-8 took care of a lot of the headache.
        
       | Andrew_nenakhov wrote:
       | In the Russian segment of the internet, various Cyrillic encodings
       | (win, dos, mac, koi-8) were a huge problem, and only UTF-8 finally
       | solved it, long before emojis became a thing.
       | 
       | It is known that developers from English-speaking countries are
       | generally oblivious to encoding problems. Probably they could get
       | by on ASCII far longer than the rest of the world, so no wonder
       | that they might confuse cause and effect in this case.
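A minimal sketch of the problem described above, assuming Python's standard codecs cp1251, cp866, mac_cyrillic and koi8_r as stand-ins for the "win, dos, mac, koi-8" encodings. The sample word and the helper names are illustrative and not from the thread.

```python
# One Russian word, encoded with four legacy single-byte codecs and with UTF-8.
WORD = "Привет"
LEGACY = ["cp1251", "cp866", "mac_cyrillic", "koi8_r"]  # win, dos, mac, koi-8


def encoded_forms(text):
    """Return {codec name: encoded bytes} for each legacy codec plus UTF-8."""
    return {name: text.encode(name) for name in LEGACY + ["utf-8"]}


def misread(text, written_as, read_as):
    """Encode with one codec and decode with another, as a mismatched reader would."""
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    for name, raw in encoded_forms(WORD).items():
        print(f"{name:12} {raw.hex()}")
    # Text written as KOI8-R but read as Windows-1251 comes out as mojibake.
    print(misread(WORD, "koi8_r", "cp1251"))
```

For this word, decoding with the wrong codec does not raise an error; it quietly produces different Cyrillic letters, which is part of why such mismatches were painful to track down.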
        
       | hprotagonist wrote:
       | I am 100% convinced, tangentially, that mobile OS point releases
       | use emoji as user bait to have a stronger guarantee of regular
       | security updates.
       | 
       | "update now to get access to :burrito: and :taco:! (also fix the
       | following 12 CVEs that 90% of our userbase doesn't know about or
       | read)"
        
         | xxpor wrote:
         | If this was ever a conscious decision, it was genuinely
         | brilliant. Much better to have a carrot rather than the stick
         | of "you'll get hacked (except probably not)"
        
           | black_puppydog wrote:
           | You mean much better to have a :carrot: ? :slight_smile:
           | 
           | In any case, this must be the most ridiculous carrot yet. :D
        
             | pen2l wrote:
             | As a millennial whose only chatting activities are confined
             | to irc and doesn't really get emojis -- could someone
             | please articulate this phenomenon in terms that would be
             | meaningful to me?
             | 
             | I've seen sentences sometimes where words are actually
             | replaced with emojis... is this how some subset of people
             | actually communicate online or that's just for some effect
             | of irony?
        
               | eat_veggies wrote:
               | It can be a way of adding intonation and other affective,
               | out-of-band communication back into text. I never really
               | see people use them purely to _replace_ words one-for-
               | one. But interspersed throughout a message, emojis can,
               | as you say, add irony, but also communicate that a
               | message that could be taken as ironic _isn't_, or
               | communicate some other subtext.
               | 
               | Text is a flattening of speech, and emojis can add some
               | of those missing dimensions back -- and, like our IRL
               | verbal cues, tics, and gestures, they can be hard to
               | decode if you're not "in" on the game.
        
               | zck wrote:
               | > I never really see people use them purely to replace
               | words one-for-one.
               | 
               | This article does that. Here it might be more often than
               | the author would normally do, but I've seen things like
               | that non-ironically.
               | 
               | > To stay relevant in the age of social media you had to
               | support emoji or you were in the .
               | 
               | EDIT: HN stripped the emoji. The end of that sentence was
               | "or you were :skull: in the :droplet:", read as "or you
               | were dead in the water".
        
               | lovegoblin wrote:
               | > Here it might be more often than the author would
               | normally do, but I've seen things like that non-
               | ironically.
               | 
               | Maybe not ironically, but doing that definitely adds
               | _informality_ - exactly the kind of out-of-band context
               | that GP is talking about.
        
               | ziml77 wrote:
               | I've only seen this on Twitter and assumed it was to fit
               | posts within the character limit.
        
               | 542458 wrote:
               | IMHO it's a few things:
               | 
               | It's an ingroup signal, indicating (by "correctly" using
               | emoji) that you're part of a specific subculture to the
               | recipient. "Praying_emoji flame_emoji YASS" indicates
               | to me that the writer is young and hip. "Do you want to
               | have a BBQ bbq_emoji?" says to me that the writer is
               | older and less hip.
               | 
               | It can be hard to communicate tone through writing.
               | Emojis allow one to instantly mark a piece of writing as
               | informal/non-serious with minimal effort. This includes
               | irony - "eggplant_emoji" is often a non-serious reply
               | indicating that "I am jokingly acting like this is
               | sexual or attractive"
               | 
               | It's a proxy for longer writing. "Thumbsup_emoji" is a
               | substitute for some marginally harder to articulate
               | feelings of "looks good / I like it"
               | 
               | Of course, there are many subcultures that use emoji in
               | different ways and as proxies for other things as well.
               | At a previous employer we'd often just send "taco_emoji?"
               | to ask who was buying lunch. It's the sort of thing that
               | can be used/abused in many different ways.
        
               | hprotagonist wrote:
               | ironic effect is, of course, communication all by itself.
               | 
               | My use of emoji in messaging applications is primarily
               | limited to quick rebus-like reaction replies to meme
               | images, a quick and dirty reaction to a message in slack
               | expressing some vague emotional response, or making
               | complicated fart-jokes with my partner that rely on a lot
               | of out of band information.
               | 
               | (i also use IRC on the daily, and have been known to use
               | emoji there too, so these communication forms are not
               | disjoint)
        
               | lovegoblin wrote:
               | I really strongly recommend 'Because Internet:
               | Understanding the New Rules of Language' by Gretchen
               | McCulloch. It's a lighthearted linguistic look at how our
               | written communication has changed over the last couple
               | decades since the advent of mainstream internet access.
        
         | capableweb wrote:
         | :burrito: and :taco: are not emojis as such though, they are
         | more like good old "smileys" that turn characters into
         | images. Supporting emojis would mean supporting UTS #51
         | (https://www.unicode.org/reports/tr51/) natively.
        
           | [deleted]
        
           | hprotagonist wrote:
           | that's what i meant, but i was using the shortcode in my
           | message here.
           | 
           | "add a glyph to the emoji keyboard" is more precise.
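To make the shortcode-versus-character distinction concrete, here is a small sketch that expands :taco:-style shortcodes into the actual Unicode characters. The shortcode table and expand_shortcodes are hypothetical; the only real pieces are unicodedata.lookup and the official character names TACO and BURRITO.

```python
import re
import unicodedata

# Hypothetical shortcode table: shortcode -> official Unicode character name.
SHORTCODES = {"taco": "TACO", "burrito": "BURRITO"}


def expand_shortcodes(text):
    """Replace known :shortcode: tokens with the real character; leave unknown ones alone."""
    def replace(match):
        name = SHORTCODES.get(match.group(1))
        return unicodedata.lookup(name) if name else match.group(0)

    return re.sub(r":([a-z_]+):", replace, text)


if __name__ == "__main__":
    expanded = expand_shortcodes("update now to get :burrito: and :taco:!")
    # Print the code points rather than the glyphs, so the output survives any terminal.
    print([f"U+{ord(ch):04X}" for ch in expanded if ord(ch) > 0x7F])
```

The shortcode is plain ASCII that some front end may swap for an image; the expanded form is a single code point that the text pipeline itself has to carry.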
        
             | elFarto wrote:
             | I'm not sure HN will even allow emojis:
             | 
             | Burrito: Taco:
             | 
             |  _edit_ No, it stripped those : '(.
        
               | madeofpalk wrote:
               | wait they remove emojis from comments?
        
               | airstrike wrote:
               | Not _every_ emoji [?]
        
               | deathanatos wrote:
               | I'm _guessing_ that that's b/c that's one of those that
               | is inside the BMP?
        
               | seba_dos1 wrote:
               | Just as they remove plenty of other Unicode characters
               | here.
        
               | [deleted]
        
               | [deleted]
        
         | gumby wrote:
         | What an insightful observation!
         | 
         | It has a lot of follow on implications too.
        
         | skocznymroczny wrote:
         | Can I have an update that brings back pistols instead of water
         | guns?
        
           | samatman wrote:
           | Easily+, for yourself.
           | 
           | Emoji are just fonts, and with some search engine sleuthing,
           | you can find an OG pistol emoji out there, pop open your
           | system emoji font in an editor, and replace the water pistol
           | with a pistol pistol.
           | 
           | Everyone else will still see the Nerfed version, of course.
           | 
           | +For some value of easy
        
           | Andrew_nenakhov wrote:
           | But but water guns protect you from launching a mass-shooting
           | attack! Don't expose yourself to such hateful symbols!
        
         | Wowfunhappy wrote:
         | On my Mac, I spent an evening figuring out how Apple's emoji
         | picker works and backporting all the emojis instead. But I'm
         | not entirely sure what this says about me.
         | 
         | https://forums.macrumors.com/threads/updating-maverickss-emo...
        
         | clon wrote:
         | With 10%, I believe you are overguesstimating the share of
         | users that know/care about CVE-s.
        
           | Sargos wrote:
           | That was his point. Users don't know/care about CVEs but
           | love/want the new emojis. Everyone wins.
        
       | CodesInChaos wrote:
       | Emoji probably contributed to widespread support of supplemental
       | planes (fixing systems which treated UTF-16 as UCS-2), but I
       | doubt they contributed much to UTF-8's popularity.
        
         | masklinn wrote:
         | > fixing systems which treated UTF-16 as UCS-2
         | 
         | Or treated UTF8 as the nonsense that MySQL's utf8 is (it's
         | 3-bytes utf8 aka only the BMP, and silently drops anything from
         | the first non-BMP codepoint).
        
           | mfontani wrote:
           | Indeed. To ensure MySQL stores "real" UTF-8 one has to use
           | "utf8mb4" instead of "utf8", which just rolls off the tongue
           | and backwards (in)compatibility seems to be the reason why
           | one can't just DWIM things backwards... "utf8mb4 or bust" it
           | is, then!
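A rough sketch of the failure mode described above, assuming only what the comments say: a 3-byte-maximum UTF-8 column can hold BMP characters and loses everything from the first non-BMP character onward. The function names are made up for illustration, and this is not MySQL's actual code.

```python
BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def needs_four_bytes(ch):
    """True if the character lies outside the BMP, so UTF-8 needs 4 bytes for it."""
    return ord(ch) > BMP_MAX


def truncate_at_first_non_bmp(text):
    """Mimic the described data loss: keep only what precedes the first non-BMP character."""
    for index, ch in enumerate(text):
        if needs_four_bytes(ch):
            return text[:index]
    return text


if __name__ == "__main__":
    message = "lunch? \U0001F32E anyone"
    print(repr(truncate_at_first_non_bmp(message)))
    # Euro sign (BMP) versus taco emoji (plane 1), measured in UTF-8 bytes.
    print(len("\u20ac".encode("utf-8")), len("\U0001F32E".encode("utf-8")))
```

A check like needs_four_bytes is also a cheap way to find out whether existing data would be affected before switching a column to utf8mb4.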
        
       | FullyFunctional wrote:
       | "And it's amusing to see Apple using new emojis as a carrot to
       | get people to install the latest security patches."
       | 
       | OMG, that makes so much sense. I was the opposite, grumbling
       | about not caring about that silliness, not realizing the
       | psychology of things.
       | 
       | Having suffered through the dark ages with Microsoft increasingly
       | ruining the world with an endless stream of proprietary crap (I
       | still hate them for making people think tab width is
       | configurable), it's amazing to step back and witness how much
       | things have improved (on this narrow slice).
        
       | DangerousPie wrote:
       | Can confirm, at least anecdotally. As someone who runs a website
       | but doesn't have nearly enough time to do everything he wants to
       | do, upgrading to UTF-8 (and specifically utf8mb4) was never a
       | priority - until my users started using emojis and breaking
       | things left, right and center.
        
       | kens wrote:
       | An entertaining article, but it's not historically accurate. If
       | you look at measured usage, UTF-8 took off around 2005 and was
       | the dominant web encoding by 2008. Emojis weren't added to
       | Unicode until 2010, at which point UTF-8 usage continued to
       | increase at exactly the same rate as before.
       | 
       | https://en.wikipedia.org/wiki/UTF-8#/media/File:Utf8webgrowt...
        
         | mpol wrote:
         | Hmm, in MySQL land you have utf8, which means utf8mb3, and
         | utf8mb4. Only the latter supports Emoji. It is only in the last
         | releases that utf8mb4 is supported and also the default
         | character set.
         | 
         | I work with WordPress a lot, and up to 3 years ago it was quite
         | common for MySQL setups at shared hosting providers to only
         | support utf8mb3. And Emoji support really did help here to move
         | it forward.
        
         | redisman wrote:
         | Does anyone know the real reason? Languages upgraded string to
         | default to utf8? Browsers changed their defaults?
        
         | crazygringo wrote:
         | Yes, very misleading clickbait title.
         | 
         | I was expecting some actual insightful anecdote about a pivotal
         | choice by a major tech company that made all the difference.
         | But nothing at all.
        
       | Freak_NL wrote:
       | Even more specifically, emoji paved the way for proper support of
       | Unicode characters from beyond the Basic Multilingual Plane
       | (BMP).
       | 
       | There are 17 of these planes. This first block of 65,536
       | characters is what you can encode with only two bytes (e.g.
       | UTF-16), and it includes most of what anyone alive needs to
       | encode their languages adequately enough. For a long while
       | anything encoded beyond this block had only limited support, and
       | plenty of bugs and limitations meant that using it was tricky
       | (well, it worked fine in LaTeX of course; via xelatex for
       | example). This was back in 2008/2009.
       | 
       | Characters encoded beyond the BMP in plane 1 and 2 included
       | things like esoteric CJKV additions (East Asian ideographs) not
       | usually in daily use, but part of historic documents.
       | 
       | Then came the emoji additions (a core set is part of the BMP and
       | came from Japanese telecom standards), and support is now
       | ubiquitous. Using UTF-8 is a no-brainer for most applications,
       | and a good thing that is too!
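As a quick illustration of the plane layout, the sketch below computes which plane a character lives in. The Unicode code space runs from U+0000 to U+10FFFF, so the plane number is simply the code point shifted right by 16 bits, giving planes 0 through 16. The helper name and the sample characters are chosen here for illustration.

```python
def plane(ch):
    """Return the Unicode plane (0 to 16) that a single character belongs to."""
    return ord(ch) >> 16


SAMPLES = {
    "A": "Latin letter",
    "\u263A": "white smiling face, an emoji-like symbol inside the BMP",
    "\U0001F600": "grinning face emoji",
    "\U00020000": "CJK Extension B ideograph",
    "\U0010FFFF": "last code point in the Unicode code space",
}

if __name__ == "__main__":
    for ch, description in SAMPLES.items():
        print(f"U+{ord(ch):06X} plane {plane(ch):2d}  {description}")
```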
        
         | chrisseaton wrote:
         | > you can encode with only two bytes (e.g. UTF-16)
         | 
         | UTF-16 is _variable width_ , not two bytes, and it can encode
         | any Unicode character.
        
           | toxik wrote:
           | OP probably meant UCS-2.
        
         | jesuscyborg wrote:
         | The historic planes beyond the basic multilingual plane are
         | usually referred to as the "astral planes" which includes
         | things like gothic, runes, alchemy, egyptian, and emoji
         | https://justine.storage.googleapis.com/astralplanes.txt
        
           | derefr wrote:
           | And the etymology of this being that Dungeons and Dragons has
           | a "Prime Material Plane" and an "Astral Plane", where the
           | Astral Plane connects the PMP to various "Outer Planes" made
           | of ridiculous not-oft-encountered stuff.
           | 
           | But whoever came up with this cute analogy got the analogy
           | _wrong_ -- the higher Unicode planes are analogous to the
           | "outer planes" themselves; while the "astral plane" would be
           | some sort of glue allowing you to access these outer planes
           | from _within_ the BMP. Like... surrogate-pair characters! One
           | could nickname the reserved surrogate-pair range in the BMP,
           | the "astral projection" range ;)
        
             | kens wrote:
             | "Astral plane" predates Dungeons and Dragons by centuries.
             | Looking at old discussions, I couldn't find any evidence
             | that Unicode's usage is connected with D&D.
             | 
             | Early discussion of "astral character" or "astral plane"
             | for the Unicode supplementary planes at:
             | https://unicode.org/mail-arch/unicode-ml/Archives-
             | Old/UML024... Even earlier 1998 use:
             | https://www.unicode.org/L2/L1998/98354.pdf
        
             | Sniffnoy wrote:
             | The term "astral plane" is older than D&D, and I would
             | assume they took it from the more general usage, not the
             | specific usage in D&D.
             | https://en.wikipedia.org/wiki/Astral_plane
        
         | hinkley wrote:
         | UTF-8 is simple enough to implement and yet I've seen it done
         | improperly more than once.
         | 
         | The problem with UTF-8 is that the density is really good for
         | North America and Western Europe but drops off quite a bit for
         | other languages, and you have to trade CPU for bandwidth (eg,
         | gzip) to do much about it.
         | 
         | Japan has several encodings (though shiftJIS is the only one
         | that I can recall) that use escape characters to switch code
         | pages. As long as you don't switch too rapidly between kanji
         | and borrow words, it's more compact, but more complex to
         | implement (I would say less so than implementing gzip but if
         | you aren't using zlib, one of the most portable libraries in
         | existence, you have much bigger issues than character
         | encoding).
         | 
         | UTF-8 takes 3 bytes for most of the first block. Only the first
         | 2048 code points fit into 2 bytes or fewer, which is mostly
         | European languages.
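The byte-count thresholds mentioned above can be written down directly. This is a sketch with an illustrative function name; it is cross-checked against Python's own UTF-8 encoder, and it ignores the surrogate range, which UTF-8 does not encode.

```python
def utf8_length(code_point):
    """Number of bytes UTF-8 uses for one code point (surrogates not considered)."""
    if code_point < 0x80:       # 128 code points: ASCII
        return 1
    if code_point < 0x800:      # 0x800 == 2048: Latin supplements, Greek, Cyrillic, ...
        return 2
    if code_point < 0x10000:    # rest of the BMP, including kana and most CJK
        return 3
    return 4                    # supplementary planes, including most emoji


# Latin A, e-acute, Cyrillic ya, hiragana a, grinning face emoji.
SAMPLES = ["A", "\u00e9", "\u044f", "\u3042", "\U0001F600"]

if __name__ == "__main__":
    for ch in SAMPLES:
        predicted = utf8_length(ord(ch))
        actual = len(ch.encode("utf-8"))
        print(f"U+{ord(ch):04X} predicted={predicted} actual={actual}")
```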
        
           | [deleted]
        
           | Freak_NL wrote:
           | Outside of embedded software this really isn't that much of a
           | problem any more.
           | 
           | Taking a random Wikipedia page as a sample I get 46kB (UTF-8)
           | versus 35kB (Shift-JIS). A random Japanese text from Project
           | Gutenberg in Shift-JIS is roughly 2/3 of the size of the
           | same text in UTF-8.
           | 
           | Those are impressive enough numbers, but add just a single
           | photograph to the Wikipedia page and it doesn't matter at
           | all. Text is just pretty efficient, even if you use an
           | encoding that supports every language in the world.
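The 2/3 figure is easy to reproduce on a small scale. The sketch below encodes one short Japanese sentence with Python's utf-8 and shift_jis codecs; the sample text and the helper name are chosen here for illustration and are not the pages measured above.

```python
# Opening lines of a well-known Japanese novel: 16 characters, all kana, kanji or punctuation.
SAMPLE = "吾輩は猫である。名前はまだ無い。"


def encoded_sizes(text, codecs=("utf-8", "shift_jis")):
    """Return {codec name: encoded length in bytes} for the given text."""
    return {codec: len(text.encode(codec)) for codec in codecs}


if __name__ == "__main__":
    sizes = encoded_sizes(SAMPLE)
    for codec, size in sizes.items():
        print(f"{codec:10} {size} bytes")
    print("ratio", sizes["shift_jis"] / sizes["utf-8"])
```

Pure Japanese text comes out at 2 bytes per character in Shift-JIS against 3 in UTF-8. ASCII markup costs one byte in both encodings, which presumably explains why the Wikipedia page shows a smaller gap.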
        
           | crazygringo wrote:
           | First, that's because European languages have small
           | alphabets. It's not like Chinese or Japanese with their many
           | thousands of characters could have fit in those 2,048 spots
           | _anyways_. So it makes sense to allocate the small common
           | alphabets there.
           | 
           | Second, text is so comparatively tiny relative to photos,
           | video, code, etc. that it really doesn't matter at all
           | anyways.
           | 
           | Third, text is often zipped _as well_. It's often zipped
           | over HTTP. It's zipped when it sits inside of an EPUB. It's
           | zipped when it sits inside a Word document. You can even
           | configure MySQL to zip text fields in a database. Basically,
           | whenever space _is_ an issue, you can fix it.
           | 
           | So it's hard to see how this is any problem in practice at
           | all, when phones and computers mostly ship with 32 GB of SSD
           | minimum.
        
         | bawolff wrote:
         | You're mixing up ucs-2 and utf-16.
        
           | Robin_Message wrote:
           | To expand on this comment, UCS-2 defines a fixed-length,
           | 2-byte encoding of Unicode. It can therefore only represent
           | the first 65536 characters in the Basic Multilingual Plane
           | (BMP).
           | 
           | UTF-16 allows representing characters outside of the BMP by
           | using a reserved area to split a single codepoint into two
           | surrogates that form a pair.
           | 
           | This makes UTF-16 complicated and in some ways worse than
           | UTF-8: the encoding is longer for many typical texts, but is
           | still not fixed-width. The bug you typically see is that
           | codepoints outside of the BMP are munged when clipping the
           | text to a certain length (or reversing it, but that doesn't
           | happen in real systems generally.)
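The surrogate mechanism and the clipping bug described above can be sketched in a few lines. The arithmetic follows the UTF-16 definition; the function names, and the choice to clip by raw code units, are illustrative assumptions rather than any particular system's code.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP use surrogate pairs")
    offset = code_point - 0x10000           # 20 significant bits
    high = 0xD800 + (offset >> 10)          # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)         # bottom 10 bits
    return high, low


def utf16_units(text):
    """Return the list of 16-bit code units Python's UTF-16 encoder produces."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def naive_clip(text, max_units):
    """Clip to max_units UTF-16 code units the buggy way, ignoring surrogate pairs."""
    raw = text.encode("utf-16-be")[: max_units * 2]
    return raw.decode("utf-16-be", errors="replace")


if __name__ == "__main__":
    print([hex(unit) for unit in to_surrogate_pair(0x1F600)])
    print([hex(unit) for unit in utf16_units("ab\U0001F600")])
    print(ascii(naive_clip("ab\U0001F600", 3)))  # the emoji is cut in half
```

A correct clip would have to back off by one code unit whenever the cut lands between a high and a low surrogate.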
        
             | seba_dos1 wrote:
             | The reason why some older mobile phones struggle with SMS
             | containing emojis instead of just displaying tofus in place
             | of unsupported characters is that there's no way to send
             | emojis in accordance to SMS standard - it defines the
             | encoding to be UCS-2. In order to put emojis in SMS, newer
             | phones send the messages as UTF-16 instead, technically
             | violating the standard, which can break some parsers that
             | only expect UCS-2 to be there.
        
             | [deleted]
        
             | a1369209993 wrote:
             | Nitpick: UCS-2 actually isn't fixed-length either, eg "ẍ̊"
             | (small x+umlaut+ring above) is two code units (1E8D 030A)
             | or possibly three (0078 0308 030A).
        
               | ygra wrote:
               | UCS-2 uses a fixed number of (16-bit) code units to
               | represent a Unicode scalar value (code point). Of course,
               | to represent a grapheme cluster, more than one code point
               | may be needed, but that's true of Unicode in general.
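The example from the parent comment can be checked with Python's unicodedata module. This sketch normalizes the same user-perceived character to its composed and decomposed forms and lists the code points; the helper name is illustrative.

```python
import unicodedata


def code_points(text):
    """Format each code point of the text as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in text]


# x, combining diaeresis, combining ring above: one grapheme cluster on screen.
DECOMPOSED = "x\u0308\u030a"
COMPOSED = unicodedata.normalize("NFC", DECOMPOSED)

if __name__ == "__main__":
    print("NFC:", code_points(COMPOSED))
    print("NFD:", code_points(unicodedata.normalize("NFD", COMPOSED)))
```

Every code point here is in the BMP, so each is exactly one UCS-2 code unit, yet the character a reader sees still takes two or three of them.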
        
               | a1369209993 wrote:
               | > that's true of Unicode in general.
               | 
               | Yes, that was rather my point: if you're using a Unicode-
               | based character encoding, you're going to have variable-
               | width characters regardless, so you might as well use
               | UTF-8.
               | 
               | > UCS-2 uses a fixed number of (16-bit) code units to
               | represent a Unicode scalar value (code point).
               | 
               | Sure, but that's an implementation detail of the mapping
               | from characters (at the application level) to bytes (at
               | the physical(-ish) representation level).
        
             | ucarion wrote:
             | To that point, what are systems supposed to make of UTF-8
             | strings encoding codepoints in the surrogate pair range? Is
             | that well-defined?
             | 
             | In other words, to what extent are surrogate pairs a UTF-16
             | thing, rather than a Unicode thing that exists to
             | accommodate for UCS-2 -> UTF-16?
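One data point for this question: the UTF-8 definition (RFC 3629) excludes the surrogate range U+D800..U+DFFF, and Python's strict utf-8 codec follows that. The sketch below shows a lone surrogate being rejected in both directions, while the same emoji encoded directly is a plain 4-byte sequence. The helper names are illustrative; "surrogatepass" is a real Python error handler that deliberately bypasses the rule.

```python
def try_encode(code_point):
    """Encode one code point as strict UTF-8, or return None if the codec refuses."""
    try:
        return chr(code_point).encode("utf-8")
    except UnicodeEncodeError:
        return None


def try_decode(raw):
    """Decode bytes as strict UTF-8, or return None if the codec refuses."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    print(try_encode(0xD83D))                # high surrogate: refused
    print(try_decode(b"\xed\xa0\xbd"))       # the bytes it would have had: refused too
    print(chr(0xD83D).encode("utf-8", "surrogatepass"))  # explicit opt-out of the rule
    print(try_encode(0x1F600))               # the real code point: four ordinary bytes
```

Seen this way, surrogates are reserved code points that only carry meaning inside UTF-16; a UTF-8 stream is expected to hold the supplementary code point itself.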
        
       | 1996 wrote:
       | Yes. We shouldn't dismiss the weight of non-technical people
       | voting with their dollars.
       | 
       | They may not care about i18n, but they do care about cute
       | emoticons.
       | 
       | So we get not just unicode support everywhere, but also character
       | pickers inside the keyboard.
       | 
       | And now we also get to benefit from unicode for things we find
       | pretty - for example, the famous powerline
       | https://github.com/powerline/powerline
       | 
       | Information density matters, and I can't wait for someone to
       | replace the "old" color coding of files (.dircolors in bash) with
       | 1 emoticon: a music note for music files, etc.
        
       | jcims wrote:
       | Now we just need emojis for IP addresses so we can move to IPv6
        
         | djxfade wrote:
         | There's no place like 127*0*0*1
         | 
         | Edit: I didn't realize HN doesn't support emojis
        
       | smrtinsert wrote:
       | Homesite, I miss that program. Those were the days.
        
       | kaetemi wrote:
       | Not only helped to improve support for UTF-8, but also for those
       | pesky characters that take multiple codepoints...
        
         | masklinn wrote:
         | Skin tone modifiers & other composites arrived later to force
         | supporting those properly.
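A small sketch of what "multiple codepoints" means for a skin-tone emoji: one visible glyph, two code points, and different lengths depending on which unit you count. The measure helper is illustrative; the two characters are THUMBS UP SIGN (U+1F44D) and EMOJI MODIFIER FITZPATRICK TYPE-4 (U+1F3FD).

```python
import unicodedata

THUMBS_UP = "\U0001F44D"
MEDIUM_SKIN_TONE = "\U0001F3FD"  # emoji modifier, applies to the preceding character


def measure(text):
    """Count the same text in code points, UTF-16 code units and UTF-8 bytes."""
    return {
        "code_points": len(text),
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }


if __name__ == "__main__":
    sequence = THUMBS_UP + MEDIUM_SKIN_TONE
    print([unicodedata.name(ch) for ch in sequence])
    print(measure(sequence))
```

Software that cuts or reverses text per code point, let alone per code unit, can separate the modifier from its base, which is the kind of bug these sequences flushed out.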
        
       | disown wrote:
       | Pretty sure the dominance of ASCII on the internet and the
       | efficiency/compatibility of UTF-8 in relation to ASCII paved the
       | way for UTF-8 everywhere. It is the standard Unicode encoding of
       | the internet.
       | 
       | If anything, I would say UTF-8 paved the way for emojis, not the
       | other way around, as the ubiquity of a Unicode encoding allowed
       | for the existence of emojis. Can't encode emojis with ASCII. You
       | have to have Unicode and its encoding first before you can have
       | emojis.
        
       | nradov wrote:
       | It's interesting to watch the evolution of written language in
       | action. I expect in 20 years we will routinely see emojis in
       | written English novels and news articles. In 50 years we'll see
       | them in textbooks and scientific journal articles.
        
         | AnIdiotOnTheNet wrote:
         | Yet another reason I'm glad for the inevitability of death.
         | 
         | I can't be the only one who thinks emoji are a terrible idea.
         | Granted, I also don't think logographic characters are a good
         | idea but at least they have thousands of years of use and
         | agreed upon semantic meaning behind them.
        
           | szhu wrote:
           | If everything you care to talk about can be easily described
           | using thousand-year-old ideas, then I can see why you are
           | against emojis. But this isn't true for many things people
           | want to talk about today.
           | 
           | Language is just an encoding for ideas, and emojis are a new
           | compression algorithm. Using a single character, you can now
           | convey certain thoughts and sentiments that you previously
           | needed many more characters to reference or explain.
           | 
           | "So then just explain it!" some might respond. "Why can't
           | people be bothered to spend even a little time to write down
           | what they think?" It's an accessibility issue. People have
           | limited time every day to get their ideas across, and they
           | deserve ways of conveying their ideas concisely. There is
           | precedent for this too -- this is why we have acronyms and
           | new words. "lol" and "minivan" don't have thousands of years
           | of agreed-upon semantic meaning behind them.
           | 
           | A final thought -- whether you think emojis are a terrible
           | idea might not be relevant to whether they should exist.
           | Letting people live their own lives to the fullest is much
           | more important than making sure you, I, a future historian,
           | or any other third party understands what they are saying.
           | But you don't have to worry about not being able to understand
           | conversations. Given what you prefer, if someone wants to
           | address you as a target audience, then they probably won't
           | use emojis.
        
             | reaperducer wrote:
             | _this isn't true for many things people want to talk about
             | today_
             | 
             | This makes me curious. What things can people talk about
             | with emojis that they can't talk about in a proper
             | language?
        
           | jhanschoo wrote:
           | What an overreaction. You'll find the proliferation of emojis
           | distributed appropriately according to the genre of writing.
           | For example, you'll still hardly see emoji in newswriting
           | where they don't have much to add to the semantic content,
           | but you already see it liberally used in places where
           | stickers and drawings are already expected: e.g. in edited
           | Instagram photos.
        
       | an_opabinia wrote:
       | The ascendency of the CJK market, followed by Google Chrome,
       | paved the way for UTF-8 everywhere.
       | 
       | The more interesting thing is why basically no one uses Eastern
       | ideograms in the West, except maybe the Korean ideogram for
       | crying (yuyu) and rarely, other kaomoji-like stuff. Some kanji
       | also tell visual stories, and most children learn them just fine,
       | so it's not as simple as accessibility. Borrowing kanji was also
       | anticipated by many sci fi writers and yet is not to be.
        
         | nneonneo wrote:
         | Out of curiosity, which Korean ideograph would that be? Korean
         | doesn't use ideograms (much) anymore; they use an alphabet
         | packed into syllabic blocks.
         | 
         | The character I can think of that kind of matches the
         | description is 囧 (jiong), which is a Chinese character.
        
           | kevin_thibedeau wrote:
           | It's a jamo component used to compose full hangul characters.
           | 
           | https://en.wikipedia.org/wiki/List_of_Hangul_jamo
        
           | masklinn wrote:
           | > Out of curiosity, which Korean ideograph would that be?
           | 
           | Yu. Having two of them looks like a crying face. Although tha
           | (th) is also a common component of crying face (th_th).
           | They're talking about kaomoji which use various non-latin or
           | fullwidth symbols (though you're right that they're largely
           | _not_ ideograms) to compose pretty extensive  "smileys" e.g.
           | the look of disapproval uses kannada, denko uses greek and
           | katakana, ...
        
             | mattnewton wrote:
             | Shameless plug a( deg [?]? deg)a
             | 
             | The Gboard keyboard on android has a tab for many of these
             | common "emoticon" faces / character sequences. If you open
             | the emoji picker on the keyboard and then tap the far right
             | bottom tab icon ":-)"
             | 
             | They can get very elaborate though, these are just very
             | basic common faces.
        
               | masklinn wrote:
               | > The Gboard keyboard on android has a tab for many of
               | these common "emoticon" faces / character sequences. If
               | you open the emoji picker on the keyboard and then tap
               | the far right bottom tab icon ":-)"
               | 
               | iOS also has that on the standard Japanese "Kana"
               | keyboard (and possibly others), under the "^_^" key.
        
               | reificator wrote:
               | Windows has this as well, just hit Windows + ; and go to
               | the ;-) tab.
        
         | jandrese wrote:
         | I'm a little sad that the cute Japanese 「quote characters」
         | have not gotten traction. I'd love to be able to use those in
         | code.
        
           | jrochkind1 wrote:
           | Huh, I didn't know about those. I've been using euro-style «
           | and » though, to be able to copy-paste things that already
           | include " and ', and still delimit what I am quoting.
        
             | throw0101a wrote:
             | > _I've been using euro-style « and » though, to be able
             | to copy-paste things that already include " and ', and
             | still delimit what I am quoting._
             | 
             | That would really be handy on the CLI instead of doing a
             | bunch of escaping with backslashes.
        
               | SahAssar wrote:
               | That's just pushing the problem one level down, no?
        
             | Freak_NL wrote:
             | Guillemets are used in many languages like «this», but
             | 'euro-style' is a bit of a misnomer. They are used all over
             | the world, and in many European languages different pairs
             | are used, such as guillemets the »other« way around, and
             | ,,this" matching pair.
        
               | skipnup wrote:
               | At least in Germany the closing quotation mark is the
               | other way around like ,,this"
        
               | [deleted]
        
               | microtherion wrote:
               | I believe it's „this“, actually (U+201E to start, U+201C
               | to end), but the distinction between all those quotation
               | marks is hellishly difficult, and I bet native speakers
               | get it wrong all the time.
               | 
               | Once upon a time, I wanted to rely on these distinctions
               | in a TTS frontend to distinguish between 5" floppy disk
               | and "Mambo No. 5"
               | 
               | I soon realized that people use quotation marks and
               | dashes in such a random manner that insisting on treating
               | the semantics literally would create more confusion than
               | it would resolve.
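One way to see why these marks resist literal treatment is to look at how Unicode itself classifies them. The sketch below is only an illustration using Python's standard `unicodedata` module; the sample list of marks and the helper name `describe` are my own choices, not anything from the comment above. Note that U+201C, the German closing mark, has the general category Pi (initial quote), so even the standard's categories cannot tell you whether a mark opens or closes in a given language.

```python
import unicodedata

# Quotation marks mentioned in this thread (sample chosen for illustration).
MARKS = ["\u201e", "\u201c", "\u201d", "\u00ab", "\u00bb", "\u300c", "\u300d"]


def describe(ch: str) -> str:
    """Return code point, general category, and Unicode name for one character."""
    return f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}"


for mark in MARKS:
    print(describe(mark))
```

Categories Ps/Pe mean opening/closing punctuation, and Pi/Pf mean initial/final quote.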
        
               | bloak wrote:
               | See this rather nice map:
               | 
               | https://jakubmarian.com/map-of-quotation-marks-in-
               | european-l...
               | 
               | I like to call «this» the Swiss system, because in
               | Switzerland they use it for four different official
               | languages.
        
           | andrewl-hn wrote:
           | First of all, they ARE getting traction. Many YouTubers and
           | Twitch streamers have started to use them in stream / video
           | titles. I hadn't seen corner brackets at all ten years ago,
           | and these days I see them in use at least once a week.
           | 
           | Some programming languages are starting to adopt them, too.
           | Raku is the one I know (it allows French and German quotes,
           | too). Maybe Julia, too? I think some language communities
           | tend to be more open to widespread Unicode usage in source
           | code than others.
        
           | Isthatablackgsd wrote:
           | Those are called corner brackets. It took me a while to find
           | out that I need a CJK font installed on my computer to use
           | them. And CJK font families are huge! More than 100MB.
        
             | boogies wrote:
             | I see them and the only CJK font on my PC is GNU Unifont,
             | which is only ~12MB for the TTF version IIRC, and smaller
             | for other formats.
        
               | Isthatablackgsd wrote:
               | Oh, GNU Unifont is new to me, thanks for sharing that
               | information. I used another source for CJK (the one that
               | is 100MB) to ensure that I have every single possible
               | uncommon character/glyph installed without chasing after
               | more fonts: one source that has it all in one file. I
               | discovered this because other sources don't always have a
               | full set.
        
         | josefx wrote:
         | > followed by Google Chrome
         | 
         | I will bite, wtf has Chrome to do with UTF-8? As far as I can
         | find the last browser to struggle with it was IE5, IE6 was
         | released almost a decade before Chrome was a thing.
        
         | SpicyLemonZest wrote:
         | It's accessibility in the sense that computer input methods
         | popular in the West can't generate them. As far as I know,
         | there's no way to get my computer or phone keyboards to produce
         | 水 without switching to one of the CJK input modes.
        
           | nneonneo wrote:
           | On Mac, at least, the "Emoji keyboard" accessible through
           | Cmd+Ctrl+Space in all standard text controls makes it
           | possible to add basically any character in Unicode if you
           | know its name. For example, you can type "water" to get [?]
           | (along with other characters, like the water droplet emojis).
           | I use this often to type the Greek beta symbol, for example.
        
           | layoutIfNeeded wrote:
           | On Windows you can use Alt + numpad keys for entering the
           | character code. https://en.m.wikipedia.org/wiki/Alt_code
        
       | whateveracct wrote:
       | And yet HN won't let me use them
        
         | [deleted]
        
         | grawprog wrote:
         | I'm glad about that. I find it weird reading through comment
         | threads or forums and seeing mobile phone emojis scattered
         | through. I find them distracting.
         | 
         | I'm not really too sure why. I don't mind them in personal
         | messages or texts and stuff, but seeing them on public pages
         | just kind of annoys me for some reason.
        
           | masklinn wrote:
           | > I'm glad about that.
           | 
           | I'm not because it's completely arbitrary about it e.g. you
           | can include , , [?], , box drawing, or Za[?][?][?]lg[?]o but
           | not trigrams, die faces, box elements, musical notes or
           | flags. They just whitelisted/blacklisted entire blocks and
           | called it a day.
           | 
           | Which obviously is par for the course when it comes to HN's
           | comment box, the markup system is even more half-assed.
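To make the "whitelisted/blacklisted entire blocks" guess concrete, here is a rough sketch of what a range-based filter could look like. Everything in it is an assumption for illustration: the names `BLOCKED_RANGES`, `is_blocked`, and `strip_blocked` are hypothetical, and while the ranges are real Unicode ranges, picking them is a guess based on the characters listed above, not HN's actual configuration.

```python
# Hypothetical blocklist of inclusive code point ranges. The ranges are real
# Unicode blocks/ranges, but selecting them is a guess, not HN's actual rules.
BLOCKED_RANGES = [
    (0x2580, 0x259F),    # Block Elements
    (0x2600, 0x26FF),    # Miscellaneous Symbols (trigrams, die faces, notes)
    (0x1F1E6, 0x1F1FF),  # Regional indicator symbols (used in flag sequences)
]


def is_blocked(ch: str) -> bool:
    """True if the character's code point falls inside any blocked range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in BLOCKED_RANGES)


def strip_blocked(text: str) -> str:
    """Drop every blocked character and keep everything else unchanged."""
    return "".join(ch for ch in text if not is_blocked(ch))


sample = "box \u2500\u2502 die \u2680 note \u266a flag \U0001F1EB\U0001F1F7"
print(strip_blocked(sample))
```

A filter like this would let Box Drawing (U+2500 to U+257F) through while dropping the neighbouring Block Elements, which is the kind of arbitrary-looking split described above.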
        
             | tzs wrote:
             | I wish they would add U+2009 (thin space) to that list.
             | That's the standard way under the SI system to separate
             | digit groups, e.g., 1 234 567. HN just treats it as a
             | regular space.
             | 
             | (The SI standard for separating the integer part from the
             | fractional part is to use "." or ",", whichever is
             | customary in your location. Using thin space for grouping
             | removes the ambiguity that you get in places that use one
             | of "."/"," for grouping and the other for a decimal point).
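As a small illustration of the grouping convention described above, here is a sketch in Python. The function name `group_digits` is hypothetical; it leans on the standard `,` format specifier and then swaps the commas for U+2009 THIN SPACE.

```python
THIN_SPACE = "\u2009"


def group_digits(n: int, sep: str = THIN_SPACE) -> str:
    """Group the digits of an integer in threes, separated by `sep`."""
    # The ',' format specifier inserts a comma every three digits;
    # replacing it afterwards avoids any locale-dependent behaviour.
    return f"{n:,}".replace(",", sep)


print(group_digits(1234567))
```

The decimal marker is left out on purpose, since the comment notes that "." or "," depends on local custom.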
        
               | jrochkind1 wrote:
               | I wonder if that's putting things through Unicode
               | "canonical normalization", rather than custom rules.
               | 
               | Let's see what it does with `U+00BC Vulgar Fraction One
               | Quarter Unicode Character`... ¼
               | 
               | Nope it allows it instead of turning into `1/4`, so
               | that's not canonical normalization. I guess it's custom
               | rules? Or some other unicode transformation we're not
               | thinking of, or other third-party re-usable
               | transformation.
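For reference, neither behaviour discussed here falls under canonical normalization: both U+00BC and U+2009 carry compatibility decompositions, so only the NFKC/NFKD forms change them. A minimal sketch with Python's standard `unicodedata` module (the helper name `forms` is mine) makes that visible.

```python
import unicodedata

QUARTER = "\u00bc"     # VULGAR FRACTION ONE QUARTER
THIN_SPACE = "\u2009"  # THIN SPACE


def forms(ch: str) -> dict:
    """Map each Unicode normalization form to the normalized string."""
    return {
        form: unicodedata.normalize(form, ch)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }


for label, ch in (("quarter", QUARTER), ("thin space", THIN_SPACE)):
    for form, result in forms(ch).items():
        # Show the result as code points so invisible characters are readable.
        print(label, form, [f"U+{ord(c):04X}" for c in result])
```

Under NFKC the fraction becomes "1", U+2044 FRACTION SLASH, "4", and the thin space becomes an ordinary U+0020. Since HN keeps the fraction but flattens the thin space, plain NFKC would not match what the thread observed either.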
        
               | masklinn wrote:
               | > I guess it's custom rules? Or some other unicode
               | transformation we're not thinking of, or other third-
               | party re-usable transformation.
               | 
               | They just blacklisted (or whitelisted) blocks or
               | categories.
        
               | jrochkind1 wrote:
               | Converting a U+2009 THIN SPACE into an ordinary ascii
               | space is not black/whitelisting.
        
               | masklinn wrote:
               | True, there's almost certainly a whitespace normalisation
               | pass at one point as well, likely during / around the
               | processing of what little markup HN has.
        
             | grawprog wrote:
             | Sounds like they blacklisted things likely to clutter up
             | the comment threads and left things unlikely to be used.
             | 
             | Country flags seem like they could be used for political
             | trolling.
             | 
             | Die faces could lead to weird rolling threads or other
             | things.
             | 
             | Musical notes, you got me, can't really think of anything
             | too bad for those.
             | 
             | The markup's not great, but too much formatting is
             | distracting. I personally prefer the limited options. You
             | focus more on the content of your comment than making it
             | look pretty.
             | 
             | The only thing i really despise about hn's formatting is
             | the code blocks or whatever they are, the one on mobile
             | that vanishes off the side and you have to scroll
             | horizontally to read everything. I really can't stand when
             | people use those for quotes.
             | 
             | Other than that though, hn's formatting makes everything
             | uniform and fairly easy to read through. There's no fancy
             | nonsense getting in the way of things.
             | 
             | Actually, that's part of why those code block things piss
             | me off, they're probably the fanciest piece of formatting
             | you can do and all it does is obstruct information and make
             | me waste time while reading.
        
               | masklinn wrote:
               | > Sounds like they blacklisted things likely to clutter
               | up the comment threads and left things unlikely to be
               | used.
               | 
               | That's not really believable given how arbitrary it is.
               | 
               | > Die faces could lead to weird rolling threads or other
               | things.
               | 
               | As if tiles or playing cards could not be used that way.
               | 
               | > The markup's not great, but too much formatting is
               | distracting.
               | 
               | The problem is that despite having only two directives
               | half of HN's markup is actively detrimental: because
               | there is no escaping, no inline literals, and the parsing
               | is sub-par, in my experience the "emphasis" directive
               | causes issues more often than it helps. HN's markup would
               | be significantly improved by removing it entirely.
               | 
               | > I really can't stand when people use those for quotes
               | 
               | Which would be way less likely if HN actually supported
               | quotes.
        
               | grawprog wrote:
               | >> I really can't stand when people use those for quotes
               | 
               | >Which would be way less likely if HN actually supported
               | quotes.
               | 
               | But look how well this works ;p.
               | 
               | Sorry...couldn't resist.
               | 
               | I dunno, I like the 'hackish' nature of it.
               | 
               | You're right i'm sure the tiles or playing cards could be
               | used like that too, it may be arbitrary, I don't know.
               | But, those were just some reasons off the top of my head,
               | i'm sure when HN was being programmed a bit more thought
               | went into it, or maybe not, who knows?
               | 
               | My main point is, I like the simplicity of it all, sure
               | it could be better, but better doesn't necessarily lead
               | to better quality content.
               | 
               | There's a minimum amount of distractions, most users find
               | reasonable ways to communicate the context of the content
               | of their posts and scrolling through most threads tends
               | to be a mostly uniform experience where if users are
               | following a few established conventions, you can follow
               | the flow of things pretty well.
               | 
               | It's not perfect, it's not the best, but I feel like it
               | fits the general vibe and nature of the site. It gives HN
               | an identity among all the other news aggregators and
               | forums.
        
           | hprotagonist wrote:
           | how about very carefully rebased commit histories?
        
             | grawprog wrote:
             | I have to admit, i've never actually read through any
             | commit histories with emojis in them...
             | 
             | Don't get me wrong, i'm not going to get mad or lose my
             | mind or anything when I see an emoji somewhere, it's just, I
             | dunno, it looks wrong or something.
        
               | mainstreem wrote:
               | I've seen interesting commit strategies prepending a
               | different emoji for, e.g., feature/bug changes.
        
               | masklinn wrote:
               | Switch your system / default font to B&W if you don't
               | like the colorised emoji.
        
             | Freak_NL wrote:
             | Oddly enough nobody in my company uses emoji in commit
             | messages, even though we have no policy that prevents it.
             | It just doesn't make sense there.
             | 
             | I see it on public repositories sometimes, but it never
             | really seems to add anything useful.
        
               | jefftk wrote:
               | The amp project uses them a lot, and has a system where
               | different kinds of commits get different leading emoji:
               | https://github.com/ampproject/amphtml/commits/master
               | 
               | I'm used to them at this point, and it's kind of nice
               | when scanning commits to be able to see what type they
               | are.
        
           | whateveracct wrote:
           | Imagine how much more expressive my comment would've been if
           | HN didn't strip the emoji [1] I had at the end though
           | 
           | [1] https://emojipedia.org/pensive-face/
        
             | oauea wrote:
             | It would make me instantly disregard your comment as
             | immature
        
               | whateveracct wrote:
               | hm that feels more like a problem you have than one
               | inherent to my comment tho
        
       ___________________________________________________________________
       (page generated 2020-11-17 23:00 UTC)