[HN Gopher] Emojis paved the way for UTF-8 everywhere ___________________________________________________________________ Emojis paved the way for UTF-8 everywhere Author : velmu Score : 152 points Date : 2020-11-17 15:39 UTC (7 hours ago) (HTM) web link (developers.ibexa.co) (TXT) w3m dump (developers.ibexa.co) | ChrisArchitect wrote: | It didn't seem like it was emojis paving the path but web-based | email and internationalization of websites. Just the whole move | to web in various key areas like email meant it became just less | of a hair-pulling nightmare for developers to have to deal with | encoding between countries and platforms. Throw in the dawn of | smartphones (and emojis came along with that, yes) and that was | more problems on top of that, people moving between | desktop/mobile/web etc. UTF-8 took care of a lot of the headache. | Andrew_nenakhov wrote: | In the Russian segment of the internet, various Cyrillic encodings | (win, dos, mac, koi-8) were a huge problem, and only UTF-8 finally | solved it long before emojis became a thing. | | It is known that developers from English-speaking countries are | generally oblivious to encoding problems. Probably they could get | by on ASCII far longer than the rest of the world, so no wonder | that they might confuse cause and effect in this case. | hprotagonist wrote: | I am 100% convinced, tangentially, that mobile OS point releases | use emoji as user bait to have a stronger guarantee of regular | security updates. | | "update now to get access to :burrito: and :taco:! (also fix the | following 12 CVEs that 90% of our userbase doesn't know about or | read)" | xxpor wrote: | If this was ever a conscious decision, it was genuinely | brilliant. Much better to have a carrot rather than the stick | of "you'll get hacked (except probably not)" | black_puppydog wrote: | You mean much better to have a :carrot: ? :slight_smile: | | In any case, this must be the most ridiculous carrot yet.
:D | pen2l wrote: | As a millennial whose only chatting is confined to IRC | and who doesn't really get emojis -- could someone | please articulate this phenomenon in terms that would be | meaningful to me? | | I've seen sentences sometimes where words are actually | replaced with emojis... is this how some subset of people | actually communicate online, or is it just for ironic | effect? | eat_veggies wrote: | It can be a way of adding intonation and other affective, | out-of-band communication back into text. I never really | see people use them purely to _replace_ words | one-for-one. But interspersed throughout a message, emojis can, | as you say, add irony, but also communicate that a | message that could be taken as ironic _isn't_, or | communicate some other subtext. | | Text is a flattening of speech, and emojis can add some | of those missing dimensions back -- and, like our IRL | verbal cues, tics, and gestures, they can be hard to | decode if you're not "in" on the game. | zck wrote: | > I never really see people use them purely to replace | words one-for-one. | | This article does that. Here it might be more often than | the author would normally do, but I've seen things like | that non-ironically. | | > To stay relevant in the age of social media you had to | support emoji or you were in the . | | EDIT: HN stripped the emoji. The end of that sentence was | "or you were :skull: in the :droplet:", read as "or you | were dead in the water". | lovegoblin wrote: | > Here it might be more often than the author would | normally do, but I've seen things like that | non-ironically. | | Maybe not ironically, but doing that definitely adds | _informality_ - exactly the kind of out-of-band context | that GP is talking about. | ziml77 wrote: | I've only seen this on Twitter and assumed it was to fit | posts within the character limit.
| 542458 wrote: | IMHO it's a few things: | | It's an ingroup signal, indicating to the recipient (by | "correctly" using emoji) that you're part of a specific | subculture. "Praying_emoji flame_emoji YASS" indicates | to me that the writer is young and hip. "Do you want to | have a BBQ bbq_emoji ?" says to me that the writer is | older and less hip. | | It can be hard to communicate tone through writing. | Emojis allow one to instantly mark a piece of writing as | informal/non-serious with minimal effort. This includes | irony - "eggplant_emoji" is often a non-serious reply | indicating that "I am jokingly acting like this is | sexual or attractive" | | It's a proxy for longer writing. "Thumbsup_emoji" is a | substitute for some marginally harder-to-articulate | feelings of "looks good / I like it" | | Of course, there are many subcultures that use emoji in | different ways and as proxies for other things as well. | At a previous employer we'd often just send "taco_emoji?" | to ask who was buying lunch. It's the sort of thing that | can be used/abused in many different ways. | hprotagonist wrote: | ironic effect is, of course, communication all by itself. | | My use of emoji in messaging applications is primarily | limited to quick rebus-like reaction replies to meme | images, a quick and dirty reaction to a message in Slack | expressing some vague emotional response, or making | complicated fart-jokes with my partner that rely on a lot | of out-of-band information. | | (i also use IRC on the daily, and have been known to use | emoji there too, so these communication forms are not | disjoint) | lovegoblin wrote: | I really strongly recommend 'Because Internet: | Understanding the New Rules of Language' by Gretchen | McCulloch. It's a lighthearted linguistic look at how our | written communication has changed over the last couple of | decades since the advent of mainstream internet access.
| capableweb wrote: | :burrito: and :taco: are not emojis as such though; they are | more like good old "smileys" that turn characters into | images. Supporting emojis would mean supporting UTS #51 | https://www.unicode.org/reports/tr51/ natively. | [deleted] | hprotagonist wrote: | that's what i meant, but i was using the shortcode in my | message here. | | "add a glyph to the emoji keyboard" is more precise. | elFarto wrote: | I'm not sure HN will even allow emojis: | | Burrito: Taco: | | _edit_ No, it stripped those : '(. | madeofpalk wrote: | wait they remove emojis from comments? | airstrike wrote: | Not _every_ emoji [?] | deathanatos wrote: | I'm _guessing_ that that's b/c that's one of those that | is inside the BMP? | seba_dos1 wrote: | Just as they remove plenty of other Unicode characters | here. | [deleted] | [deleted] | gumby wrote: | What an insightful observation! | | It has a lot of follow-on implications too. | skocznymroczny wrote: | Can I have an update that brings back pistols instead of water | guns? | samatman wrote: | Easily+, for yourself. | | Emoji are just fonts, and with some search engine sleuthing, | you can find an OG pistol emoji out there, pop open your | system emoji font in an editor, and replace the water pistol | with a pistol pistol. | | Everyone else will still see the Nerfed version, of course. | | +For some value of easy | Andrew_nenakhov wrote: | But but water guns protect you from launching a mass-shooting | attack! Don't expose yourself to such hateful symbols! | Wowfunhappy wrote: | On my Mac, I spent an evening figuring out how Apple's emoji | picker works and backporting all the emoji instead. But I'm | not entirely sure what this says about me. | | https://forums.macrumors.com/threads/updating-maverickss-emo... | clon wrote: | With 10%, I believe you are overestimating the share of | users that know/care about CVEs. | Sargos wrote: | That was his point.
Users don't know/care about CVEs but | want the new emojis. Everyone wins. | CodesInChaos wrote: | Emoji probably contributed to widespread support of supplemental | planes (fixing systems which treated UTF-16 as UCS-2), but I | doubt they contributed much to UTF-8's popularity. | masklinn wrote: | > fixing systems which treated UTF-16 as UCS-2 | | Or treated UTF-8 as the nonsense that MySQL's utf8 is (it's | 3-byte UTF-8, i.e. only the BMP, and silently drops everything from | the first non-BMP codepoint onwards). | mfontani wrote: | Indeed. To ensure MySQL stores "real" UTF-8 one has to use | "utf8mb4" instead of "utf8", which just rolls off the tongue, | and backwards (in)compatibility seems to be the reason why | one can't just DWIM things backwards... "utf8mb4 or bust" it | is, then! | FullyFunctional wrote: | "And it's amusing to see Apple using new emojis as a carrot to | get people to install the latest security patches." | | OMG, that makes so much sense. I was the opposite, grumbling | about not caring about that silliness, not realizing the | psychology of things. | | Having suffered through the dark ages with Microsoft increasingly | ruining the world with an endless stream of proprietary crap (I | still hate them for making people think tab width is | configurable), it's amazing to step back and witness how much | things have improved (on this narrow slice). | DangerousPie wrote: | Can confirm, at least anecdotally. As someone who runs a website | but doesn't have nearly enough time to do everything he wants to | do, upgrading to UTF-8 (and specifically utf8mb4) was never a | priority - until my users started using emojis and breaking | things left, right and center. | kens wrote: | An entertaining article, but it's not historically accurate. If | you look at measured usage, UTF-8 took off around 2005 and was | the dominant web encoding by 2008.
Emojis weren't added to | Unicode until 2010, at which point UTF-8 usage continued to | increase at exactly the same rate as before. | | https://en.wikipedia.org/wiki/UTF-8#/media/File:Utf8webgrowt... | mpol wrote: | Hmm, in MySQL land you have utf8, which means utf8mb3, and | utf8mb4. Only the latter supports Emoji. It is only in the latest | releases (MySQL 8.0) that utf8mb4 has become the default | character set. | | I work with WordPress a lot, and up to 3 years ago it was quite | common for MySQL setups at shared hosting providers to only | support utf8mb3. And Emoji support really did help here to move | it forward. | redisman wrote: | Does anyone know the real reason? Did languages upgrade their string | types to default to UTF-8? Did browsers change their defaults? | crazygringo wrote: | Yes, very misleading clickbait title. | | I was expecting some actual insightful anecdote about a pivotal | choice by a major tech company that made all the difference. | But nothing at all. | Freak_NL wrote: | Even more specifically, emoji paved the way for proper support of | Unicode characters from beyond the Basic Multilingual Plane | (BMP). | | There are 16 of these planes beyond the BMP. That first block of 65,536 | characters is what you can encode with only two bytes (e.g. | UTF-16), and it includes most of what anyone alive needs to | encode their languages adequately enough. For a long while | anything encoded beyond this block had only limited support, and | plenty of bugs and limitations meant that using it was tricky | (well, it worked fine in LaTeX of course; via xelatex for | example). This was back in 2008/2009. | | Characters encoded beyond the BMP in planes 1 and 2 included | things like esoteric CJKV additions (East Asian ideographs) not | usually in daily use, but part of historic documents. | | Then came the emoji additions (a core set is part of the BMP and | came from Japanese telecom standards), and support is now | ubiquitous.
Using UTF-8 is a no-brainer for most applications, | and a good thing that is too! | chrisseaton wrote: | > you can encode with only two bytes (e.g. UTF-16) | | UTF-16 is _variable width_, not two bytes, and it can encode | any Unicode character. | toxik wrote: | OP probably meant UCS-2. | jesuscyborg wrote: | The historic planes beyond the Basic Multilingual Plane are | usually referred to as the "astral planes", which include | things like Gothic, runes, alchemy, Egyptian, and emoji | https://justine.storage.googleapis.com/astralplanes.txt | derefr wrote: | And the etymology of this is that Dungeons and Dragons has | a "Prime Material Plane" and an "Astral Plane", where the | Astral Plane connects the PMP to various "Outer Planes" made | of ridiculous not-oft-encountered stuff. | | But whoever came up with this cute analogy got the analogy | _wrong_ -- the higher Unicode planes are analogous to the | "outer planes" themselves; while the "astral plane" would be | some sort of glue allowing you to access these outer planes | from _within_ the BMP. Like... surrogate-pair characters! One | could nickname the reserved surrogate-pair range in the BMP | the "astral projection" range ;) | kens wrote: | "Astral plane" predates Dungeons and Dragons by centuries. | Looking at old discussions, I couldn't find any evidence | that Unicode's usage is connected with D&D. | | Early discussion of "astral character" or "astral plane" | for the Unicode supplementary planes at: | https://unicode.org/mail-arch/unicode-ml/Archives-Old/UML024... | Even earlier 1998 use: | https://www.unicode.org/L2/L1998/98354.pdf | Sniffnoy wrote: | The term "astral plane" is older than D&D, and I would | assume they took it from the more general usage, not the | specific usage in D&D. | https://en.wikipedia.org/wiki/Astral_plane | hinkley wrote: | UTF-8 is simple enough to implement and yet I've seen it done | improperly more than once.
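This is a minimal Python sketch of what implementing UTF-8 properly involves. It assumes the byte layout from RFC 3629:

- one byte up to U+007F,
- two bytes up to U+07FF,
- three bytes for the rest of the BMP,
- four bytes for the supplementary planes, where most emoji live.

The function names are hypothetical, and the encoder is checked against Python's built-in codec. It guards against two classic mistakes: stopping at three bytes (the MySQL utf8 problem described in this thread) and emitting bytes for surrogate code points.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 bytes (layout per RFC 3629)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:X} is outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogate code points are not scalar values; well-formed UTF-8 never encodes them.
        raise ValueError(f"U+{cp:04X} is a surrogate")
    if cp < 0x80:          # 0xxxxxxx: ASCII stays one byte
        return bytes([cp])
    if cp < 0x800:         # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:       # 1110xxxx 10xxxxxx 10xxxxxx: the rest of the BMP
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: supplementary planes (most emoji)
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def encode_utf8(text: str) -> bytes:
    """Encode a whole string one code point at a time."""
    return b"".join(encode_code_point(ord(ch)) for ch in text)


if __name__ == "__main__":
    # ASCII, Latin-1, Cyrillic, CJK, euro sign, and the taco emoji (U+1F32E)
    for ch in ["A", "\u00e9", "\u0416", "\u65e5", "\u20ac", "\U0001F32E"]:
        encoded = encode_utf8(ch)
        assert encoded == ch.encode("utf-8")  # agrees with Python's own codec
        print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")
```

The printed sizes line up with the density discussion further down the thread: Latin and Cyrillic letters take two bytes, CJK characters take three, and emoji take four.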
| | The problem with UTF-8 is that the density is really good for | North America and Western Europe but drops off quite a bit for | other languages, and you have to trade CPU for bandwidth (e.g. | gzip) to do much about it. | | Japan has several encodings (though Shift-JIS is the only one | that I can recall) that use escape characters to switch code | pages. As long as you don't switch too rapidly between kanji | and loanwords, it's more compact, but more complex to | implement (I would say less so than implementing gzip, but if | you aren't using zlib, one of the most portable libraries in | existence, you have much bigger issues than character | encoding). | | UTF-8 takes 3 bytes for most of the first block. Only the first | 2048 characters fit into 2 bytes or fewer, which is mostly European | languages. | [deleted] | Freak_NL wrote: | Outside of embedded software this really isn't that much of a | problem any more. | | Taking a random Wikipedia page as a sample, I get 46kB (UTF-8) | versus 35kB (Shift-JIS). A random Japanese text from Project | Gutenberg is, in Shift-JIS, roughly 2/3 of the size of the UTF-8 | text. | | Those are impressive enough numbers, but add just a single | photograph to the Wikipedia page and it doesn't matter at | all. Text is just pretty efficient, even if you use an | encoding that supports every language in the world. | crazygringo wrote: | First, that's because European languages have small | alphabets. It's not like Chinese or Japanese with their many | thousands of characters could have fit in those 2,048 spots | _anyways_. So it makes sense to allocate the small common | alphabets there. | | Second, text is so comparatively tiny relative to photos, | video, code, etc. that it really doesn't matter at all | anyways. | | Third, text is often zipped _as well_. It's often zipped | over HTTP. It's zipped when it sits inside of an EPUB. It's | zipped when it sits inside a Word document.
You can even | configure MySQL to zip text fields in a database. Basically, | whenever space _is_ an issue, you can fix it. | | So it's hard to see how this is any problem in practice at | all, when phones and computers mostly ship with 32 GB of SSD | minimum. | bawolff wrote: | You're mixing up UCS-2 and UTF-16. | Robin_Message wrote: | To expand on this comment, UCS-2 defines a fixed-length, | 2-byte encoding of Unicode. It can therefore only represent | the first 65536 characters in the Basic Multilingual Plane | (BMP). | | UTF-16 allows representing characters outside of the BMP by | using a reserved area to split a single codepoint into two | surrogates that form a pair. | | This makes UTF-16 complicated and in some ways worse than | UTF-8: the encoding is longer for many typical texts, but is | still not fixed-width. The bug you typically see is that | codepoints outside of the BMP are munged when clipping the | text to a certain length (or reversing it, but that doesn't | generally happen in real systems). | seba_dos1 wrote: | The reason why some older mobile phones struggle with SMS | containing emojis, instead of just displaying tofu in place | of unsupported characters, is that there's no way to send | emojis in accordance with the SMS standard - it defines the | encoding to be UCS-2. In order to put emojis in SMS, newer | phones send the messages as UTF-16 instead, technically | violating the standard, which can break some parsers that | only expect UCS-2 to be there. | [deleted] | a1369209993 wrote: | Nitpick: UCS-2 actually isn't fixed-length either, e.g. "ẍ̊" | (small x+umlaut+ring above) is two code units (1E8D 030A) | or possibly three (0078 0308 030A). | ygra wrote: | UCS-2 uses a fixed number of (16-bit) code units to | represent a Unicode scalar value (code point). Of course, | to represent a grapheme cluster, more than one code point | may be needed, but that's true of Unicode in general. | a1369209993 wrote: | > that's true of Unicode in general.
| | Yes, that was rather my point: if you're using a | Unicode-based character encoding, you're going to have | variable-width characters regardless, so you might as well use | UTF-8. | | > UCS-2 uses a fixed number of (16-bit) code units to | represent a Unicode scalar value (code point). | | Sure, but that's an implementation detail of the mapping | from characters (at the application level) to bytes (at | the physical(-ish) representation level). | ucarion wrote: | To that point, what are systems supposed to make of UTF-8 | strings encoding codepoints in the surrogate pair range? Is | that well-defined? | | In other words, to what extent are surrogate pairs a UTF-16 | thing, rather than a Unicode thing that exists to | accommodate the move from UCS-2 to UTF-16? | 1996 wrote: | Yes. We shouldn't dismiss the weight of non-technical people | voting with their dollars. | | They may not care about i18n, but they do care about cute | emoticons. | | So we get not just Unicode support everywhere, but also character | pickers inside the keyboard. | | And now we also get to benefit from Unicode for things we find | pretty - for example, the famous powerline | https://github.com/powerline/powerline | | Information density matters, and I can't wait for someone to | replace the "old" color coding of files (.dircolors in batch) with a single | emoticon: a music note for music files, etc. | jcims wrote: | Now we just need emojis for IP addresses so we can move to IPv6 | djxfade wrote: | There's no place like 127*0*0*1 | | Edit: I didn't realize HN doesn't support emojis | smrtinsert wrote: | Homesite, I miss that program. Those were the days. | kaetemi wrote: | It not only helped to improve support for UTF-8, but also for those | pesky characters that take multiple codepoints... | masklinn wrote: | Skin tone modifiers & other composites arrived later to force | supporting those properly.
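The Python sketch below makes the surrogate question concrete. It shows how UTF-16 splits a supplementary-plane code point into a high/low surrogate pair. It also shows why an emoji with a skin-tone modifier has a different "length" depending on what you count. The helper names are hypothetical.

On ucarion's question: surrogates are a UTF-16 mechanism. The code points U+D800 to U+DFFF are reserved and are not Unicode scalar values, so well-formed UTF-8 never encodes them, and Python's strict UTF-8 codec refuses to. Counting user-perceived characters (grapheme clusters) needs the UAX #29 segmentation rules, which are not shown here.

```python
from typing import Tuple


def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = cp - 0x10000               # 20 bits remain after removing the BMP
    high = 0xD800 + (offset >> 10)      # top 10 bits -> D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> DC00..DFFF
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Inverse of to_surrogate_pair; rejects anything that is not a valid pair."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf16_code_units(text: str) -> int:
    """Length as a UTF-16 (or UCS-2-minded) system would count it."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    # THUMBS UP SIGN followed by EMOJI MODIFIER FITZPATRICK TYPE-4 (a skin tone).
    thumbs = "\U0001F44D\U0001F3FD"
    print("code points: ", len(thumbs))                  # 2
    print("UTF-16 units:", utf16_code_units(thumbs))     # 4
    print("UTF-8 bytes: ", len(thumbs.encode("utf-8")))  # 8
    print("surrogates:  ", [hex(u) for u in to_surrogate_pair(0x1F44D)])

    # A lone surrogate is not a scalar value, so strict UTF-8 encoding fails.
    try:
        "\ud800".encode("utf-8")
    except UnicodeEncodeError as err:
        print("rejected:", err.reason)
```

Clipping that string to 3 UTF-16 units would cut the skin-tone modifier's surrogate pair in half. That is the munging bug described earlier in the thread.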
| disown wrote: | Pretty sure the dominance of ASCII on the internet and the | efficiency/compatibility of UTF-8 in relation to ASCII paved the | way for UTF-8 everywhere. It is the standard Unicode encoding of | the internet. | | If anything, I would say UTF-8 paved the way for emojis, not | the other way around, as the ubiquity of a Unicode encoding | allowed for the existence of emojis. Can't encode emojis with | ASCII. You have to have Unicode and its encoding first before you | can have emojis. | nradov wrote: | It's interesting to watch the evolution of written language in | action. I expect in 20 years we will routinely see emojis in | written English novels and news articles. In 50 years we'll see | them in textbooks and scientific journal articles. | AnIdiotOnTheNet wrote: | Yet another reason I'm glad for the inevitability of death. | | I can't be the only one who thinks emoji are a terrible idea. | Granted, I also don't think logographic characters are a good | idea but at least they have thousands of years of use and | agreed-upon semantic meaning behind them. | szhu wrote: | If everything you care to talk about can be easily described | using thousand-year-old ideas, then I can see why you are | against emojis. But this isn't true for many things people | want to talk about today. | | Language is just an encoding for ideas, and emojis are a new | compression algorithm. Using a single character, you can now | convey certain thoughts and sentiments that you previously | needed many more characters to reference or explain. | | "So then just explain it!" some might respond. "Why can't | people be bothered to spend even a little time to write down | what they think?" It's an accessibility issue. People have | limited time every day to get their ideas across, and they | deserve ways of conveying their ideas concisely. There is | precedent for this too -- this is why we have acronyms and | new words.
"lol" and "minivan" don't have thousands of years | of agreed-upon semantic meaning behind them. | | A final thought -- whether you think emojis are a terrible | idea might not be relevant to whether they should exist. | Letting people live their own lives to the fullest is much | more important than making sure you, I, a future historian, | or any other third party understands what they are saying. | But you don't have to worry about not being able to understand | conversations. Given what you prefer, if someone wants to | address you as a target audience, then they probably won't | use emojis. | reaperducer wrote: | _this isn't true for many things people want to talk about | today_ | | This makes me curious. What things can people talk about | with emojis that they can't talk about in a proper | language? | jhanschoo wrote: | What an overreaction. You'll find the proliferation of emojis | distributed appropriately according to the genre of writing. | For example, you'll still hardly see emoji in newswriting | where they don't have much to add to the semantic content, | but you already see them used liberally in places where | stickers and drawings are already expected: e.g. in edited | Instagram photos. | an_opabinia wrote: | The ascendancy of the CJK market, followed by Google Chrome, | paved the way for UTF-8 everywhere. | | The more interesting thing is why basically no one uses Eastern | ideograms in the West, except maybe the Korean ideogram for | crying (yuyu) and rarely, other kaomoji-like stuff. Some kanji | also tell visual stories, and most children learn them just fine, | so it's not as simple as accessibility. Borrowing kanji was also | anticipated by many sci-fi writers and yet it is not to be. | nneonneo wrote: | Out of curiosity, which Korean ideograph would that be? Korean | doesn't use ideograms (much) anymore; they use an alphabet | packed into syllabic blocks.
| | The character I can think of that kind of matches the | description is 囧 (jiong), which is a Chinese character. | kevin_thibedeau wrote: | It's a jamo component used to compose full Hangul characters. | | https://en.wikipedia.org/wiki/List_of_Hangul_jamo | masklinn wrote: | > Out of curiosity, which Korean ideograph would that be? | | Yu. Having two of them looks like a crying face. Although tha | (th) is also a common component of crying face (th_th). | They're talking about kaomoji which use various non-Latin or | fullwidth symbols (though you're right that they're largely | _not_ ideograms) to compose pretty extensive "smileys", e.g. | the look of disapproval uses Kannada, denko uses Greek and | katakana, ... | mattnewton wrote: | Shameless plug a( deg [?]? deg)a | | The Gboard keyboard on android has a tab for many of these | common "emoticon" faces / character sequences. If you open | the emoji picker on the keyboard and then tap the far right | bottom tab icon ":-)" | | They can get very elaborate though, these are just very | basic common faces. | masklinn wrote: | > The Gboard keyboard on android has a tab for many of | these common "emoticon" faces / character sequences. If | you open the emoji picker on the keyboard and then tap | the far right bottom tab icon ":-)" | | iOS also has that on the standard Japanese "Kana" | keyboard (and possibly others), under the "^_^" key. | reificator wrote: | Windows has this as well, just hit Windows + ; and go to | the ;-) tab. | jandrese wrote: | I'm a little sad that the cute Japanese [quote characters] | have not gotten traction. I'd love to be able to use those in | code. | jrochkind1 wrote: | Huh, I didn't know about those. I've been using euro-style << | and >> though, to be able to copy-paste things that already | include " and ', and still delimit what I am quoting.
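Back to the Hangul point above: Korean is an alphabet packed into syllable blocks, with jamo as the components. Unicode lays out the 11,172 precomposed syllables, U+AC00 to U+D7A3, arithmetically from jamo indices.

Below is a small Python sketch of that composition formula. It uses the constants from the Unicode Standard's section on conjoining jamo behavior and is cross-checked against NFC/NFD normalization in the standard `unicodedata` module. `compose_syllable` and `jamo_sequence` are hypothetical helper names.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable composition algorithm.
S_BASE = 0xAC00                          # first precomposed syllable (U+AC00)
L_BASE, V_BASE, T_BASE = 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # leading consonants, vowels, trailing slots
N_COUNT = V_COUNT * T_COUNT              # 588 syllables per leading consonant


def compose_syllable(lead: int, vowel: int, trail: int = 0) -> str:
    """Build one syllable block from jamo indices (trail 0 means no final consonant)."""
    if not (0 <= lead < L_COUNT and 0 <= vowel < V_COUNT and 0 <= trail < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + lead * N_COUNT + vowel * T_COUNT + trail)


def jamo_sequence(lead: int, vowel: int, trail: int = 0) -> str:
    """The same syllable spelled as separate conjoining jamo code points."""
    seq = chr(L_BASE + lead) + chr(V_BASE + vowel)
    # T_BASE itself is not a trailing consonant, so "no final" adds nothing.
    return seq + (chr(T_BASE + trail) if trail else "")


if __name__ == "__main__":
    # "han" = lead index 18 (hieuh) + vowel 0 (a) + trail 4 (nieun)
    han = compose_syllable(18, 0, 4)
    print(f"U+{ord(han):04X}")  # U+D55C
    # NFC composes the jamo spelling into the same precomposed block.
    assert unicodedata.normalize("NFC", jamo_sequence(18, 0, 4)) == han
    # NFD goes the other way: one code point back to three jamo.
    assert unicodedata.normalize("NFD", han) == jamo_sequence(18, 0, 4)
```

This is also why a crying-face jamo like the one discussed above is a component rather than an ideograph. On its own, it is one piece of the arithmetic, not a full syllable.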
| throw0101a wrote: | > _I've been using euro-style << and >> though, to be able | to copy-paste things that already include " and ', and | still delimit what I am quoting._ | | That would really be handy on the CLI instead of doing a | bunch of escaping with backslashes. | SahAssar wrote: | That's just pushing the problem one level down, no? | Freak_NL wrote: | Guillemets are used in many languages like <<this>>, but | 'euro-style' is a bit of a misnomer. They are used all over | the world, and in many European languages different pairs | are used, such as guillemets the >>other<< way around, and | the ,,this" matching pair. | skipnup wrote: | At least in Germany the closing quotation mark is the | other way around like ,,this" | [deleted] | microtherion wrote: | I believe it's ,,this", actually (U+201E to start, U+201C | to end), but the distinction between all those quotation | marks is hellishly difficult, and I bet native speakers | get it wrong all the time. | | Once upon a time, I wanted to rely on these distinctions | in a TTS frontend to distinguish between 5" floppy disk | and "Mambo No. 5" | | I soon realized that people use quotation marks and | dashes in such a random manner that insisting on treating | the semantics literally would create more confusion than | it would resolve. | bloak wrote: | See this rather nice map: | | https://jakubmarian.com/map-of-quotation-marks-in-european-l... | | I like to call <<this>> the Swiss system, because in | Switzerland they use it for four different official | languages. | andrewl-hn wrote: | First of all, they ARE getting traction. Many YouTubers and | Twitch streamers started to use them in stream / video | titles. I didn't see corner brackets at all ten years ago, | and these days I see them in use at least once a week. | | Some programming languages are starting to adopt them, too. | Raku is the one I know (it allows French and German quotes, | too). Maybe Julia, too?
I think some language communities | tend to be more open to widespread Unicode usage in source | code than others. | Isthatablackgsd wrote: | Those are called corner brackets. It took me a while to find | out that I have to have a CJK font installed on my computer to | use the corner brackets. And CJK font families | are huge! More than 100MB. | boogies wrote: | I see them and the only CJK font on my PC is GNU Unifont, | which is only ~12MB for the TTF version IIRC, and smaller | for other formats. | Isthatablackgsd wrote: | Oh, GNU Unifont is new to me, thanks for sharing that | information. I used another source for CJK (the one that | is 100MB) to ensure that I have every single possible | uncommon character/glyph installed without chasing | more fonts: one source that has it all in one file. I | discovered this because other sources don't always have a | full set. | josefx wrote: | > followed by Google Chrome | | I'll bite: wtf does Chrome have to do with UTF-8? As far as I can | find, the last browser to struggle with it was IE5, and IE6 was | released almost a decade before Chrome was a thing. | SpicyLemonZest wrote: | It's accessibility in the sense that computer input methods | popular in the West can't generate them. As far as I know, | there's no way to get my computer or phone keyboards to produce | Shui without switching to one of the CJK input modes. | nneonneo wrote: | On Mac, at least, the "Emoji keyboard" accessible through | Cmd+Ctrl+Space in all standard text controls makes it | possible to add basically any character in Unicode if you | know its name. For example, you can type "water" to get [?] | (along with other characters, like the water droplet emojis). | I use this often to type the Greek beta symbol, for example. | layoutIfNeeded wrote: | On Windows you can use Alt + numpad keys for entering the | character code.
https://en.m.wikipedia.org/wiki/Alt_code | whateveracct wrote: | And yet HN won't let me use them | [deleted] | grawprog wrote: | I'm glad about that. I find it weird reading through comment | threads or forums and seeing mobile phone emojis scattered | through. I find them distracting. | | I'm not really too sure why. I don't mind them in personal | messages or texts and stuff, but seeing them on public pages | just kind of annoys me for some reason. | masklinn wrote: | > I'm glad about that. | | I'm not, because it's completely arbitrary about it, e.g. you | can include , , [?], , box drawing, or Za[?][?][?]lg[?]o but | not trigrams, die faces, box elements, musical notes or | flags. They just whitelisted/blacklisted entire blocks and | called it a day. | | Which obviously is par for the course when it comes to HN's | comment box; the markup system is even more half-assed. | tzs wrote: | I wish they would add U+2009 (thin space) to that list. | That's the standard way under the SI system to separate | digit groups, e.g., 1 234 567. HN just treats it as a | regular space. | | (The SI standard for separating the integer part from the | fractional part is to use "." or ",", whichever is | customary in your location. Using thin space for grouping | removes the ambiguity that you get in places that use one | of "."/"," for grouping and the other for a decimal point.) | jrochkind1 wrote: | I wonder if that's putting things through Unicode | "canonical normalization", or rather custom rules. | | Let's see what it does with `U+00BC Vulgar Fraction One | Quarter Unicode Character`... 1/4 | | Nope, it allows it instead of turning it into `1/4`, so | that's not canonical normalization. I guess it's custom | rules? Or some other Unicode transformation we're not | thinking of, or other third-party re-usable | transformation. | masklinn wrote: | > I guess it's custom rules? Or some other Unicode | transformation we're not thinking of, or other | third-party re-usable transformation.
| | They just blacklisted (or whitelisted) blocks or | categories. | jrochkind1 wrote: | Converting a U+2009 THIN SPACE into an ordinary ASCII | space is not black/whitelisting. | masklinn wrote: | True, there's almost certainly a whitespace normalisation | pass at some point as well, likely during / around the | processing of what little markup HN has. | grawprog wrote: | Sounds like they blacklisted things likely to clutter up | the comment threads and left things unlikely to be used. | | Country flags seem like they could be used for political | trolling. | | Die faces could lead to weird rolling threads or other | things. | | Musical notes, you got me, can't really think of anything | too bad for those. | | The markup's not great, but too much formatting is | distracting. I personally prefer the limited options. You | focus more on the content of your comment than making it | look pretty. | | The only thing i really despise about hn's formatting is | the code blocks or whatever they are, the one on mobile | that vanishes off the side and you have to scroll | horizontally to read everything. I really can't stand when | people use those for quotes. | | Other than that though, hn's formatting makes everything | uniform and fairly easy to read through. There's no fancy | nonsense getting in the way of things. | | Actually, that's part of why those code block things piss | me off, they're probably the fanciest piece of formatting | you can do and all it does is obstruct information and make | me waste time while reading. | masklinn wrote: | > Sounds like they blacklisted things likely to clutter | up the comment threads and left things unlikely to be | used. | | That's not really believable given how arbitrary it is. | | > Die faces could lead to weird rolling threads or other | things. | | As if tiles or playing cards could not be used that way. | | > The markup's not great, but too much formatting is | distracting.
| | The problem is that despite having only two directives, | half of HN's markup is actively detrimental: because | there is no escaping, no inline literals, and the parsing | is sub-par, in my experience the "emphasis" directive | causes issues more often than it helps. HN's markup would | be significantly improved by removing it entirely. | | > I really can't stand when people use those for quotes | | Which would be way less likely if HN actually supported | quotes. | grawprog wrote: | >> I really can't stand when people use those for quotes | | >Which would be way less likely if HN actually supported | quotes. | | But look how well this works ;p. | | Sorry...couldn't resist. | | I dunno, I like the 'hackish' nature of it. | | You're right, I'm sure the tiles or playing cards could be | used like that too; it may be arbitrary, I don't know. | But those were just some reasons off the top of my head. | I'm sure when HN was being programmed a bit more thought | went into it, or maybe not, who knows? | | My main point is, I like the simplicity of it all. Sure, | it could be better, but better doesn't necessarily lead | to better quality content. | | There's a minimum of distractions; most users find | reasonable ways to communicate the context of the content | of their posts, and scrolling through most threads tends | to be a mostly uniform experience where, if users are | following a few established conventions, you can follow | the flow of things pretty well. | | It's not perfect, it's not the best, but I feel like it | fits the general vibe and nature of the site. It gives HN | an identity among all the other news aggregators and | forums. | hprotagonist wrote: | How about very carefully rebased commit histories? | grawprog wrote: | I have to admit, I've never actually read through any | commit histories with emojis in them...
| | Don't get me wrong, I'm not going to get mad or lose my | mind or anything when I see an emoji somewhere; it just, I | dunno, looks wrong or something. | mainstreem wrote: | I've seen interesting commit strategies prepending a | different emoji for, e.g., feature/bug changes. | masklinn wrote: | Switch your system / default font to B&W if you don't | like the colorised emoji. | Freak_NL wrote: | Oddly enough, nobody in my company uses emoji in commit | messages, even though we have no policy that prevents it. | It just doesn't make sense there. | | I see it on public repositories sometimes, but it never | really seems to add anything useful. | jefftk wrote: | The AMP project uses them a lot, and has a system where | different kinds of commits get different leading emoji: | https://github.com/ampproject/amphtml/commits/master | | I'm used to them at this point, and it's kind of nice | when scanning commits to be able to see what type they | are. | whateveracct wrote: | Imagine how much more expressive my comment would've been if | HN didn't strip the emoji [1] I had at the end though | | [1] https://emojipedia.org/pensive-face/ | oauea wrote: | It would make me instantly disregard your comment as | immature | whateveracct wrote: | hm that feels more like a problem you have than one | inherent to my comment tho ___________________________________________________________________ (page generated 2020-11-17 23:00 UTC)
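The thread debates whether HN's handling of U+2009 comes from Unicode normalization or from custom rules. A minimal Python sketch using the standard `unicodedata` module can test both normalization forms against the behaviour the commenters reported. Canonical normalization (NFC) leaves both U+2009 THIN SPACE and U+00BC VULGAR FRACTION ONE QUARTER unchanged, so it could never have produced `1/4`. Compatibility normalization (NFKC) maps the thin space to an ASCII space, but it also rewrites the fraction. `group_digits_si` and `normalize_whitespace` are hypothetical helpers written for illustration. They are not HN's code, and nothing here confirms how HN actually processes comments.

```python
import unicodedata

THIN_SPACE = "\u2009"   # U+2009 THIN SPACE (general category Zs)
ONE_QUARTER = "\u00bc"  # U+00BC VULGAR FRACTION ONE QUARTER (category No)


def group_digits_si(n: int) -> str:
    """Group an integer's digits in threes with thin spaces, SI style (1 234 567)."""
    # format(..., ",") inserts a comma every three digits; swap each for U+2009.
    return format(n, ",").replace(",", THIN_SPACE)


def normalize_whitespace(text: str) -> str:
    """Hypothetical whitespace pass: map every space separator (category Zs)
    to an ASCII space and leave all other characters untouched."""
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in text)


if __name__ == "__main__":
    sample = group_digits_si(1234567) + " " + ONE_QUARTER
    # ascii() escapes non-ASCII characters so the differences are visible.
    print("NFC: ", ascii(unicodedata.normalize("NFC", sample)))
    print("NFKC:", ascii(unicodedata.normalize("NFKC", sample)))
    print("Zs:  ", ascii(normalize_whitespace(sample)))
```

Run as a script, the printed lines show NFC keeping both the thin spaces and ¼. NFKC replaces the thin spaces with ASCII spaces and turns ¼ into `1⁄4`, spelled with U+2044 FRACTION SLASH rather than an ASCII slash. The category-based pass changes only the spaces. Only that last pattern matches what the thread observed (thin space flattened, ¼ kept). This fits masklinn's guess of a separate whitespace step, though it does not prove it.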