[HN Gopher] Unicode 15.0 Slide Show ___________________________________________________________________ Unicode 15.0 Slide Show Author : optimalsolver Score : 76 points Date : 2023-02-26 16:10 UTC (6 hours ago) (HTM) web link (www.babelstone.co.uk) (TXT) w3m dump (www.babelstone.co.uk) | paulirish wrote: | Perhaps a better match for some folks' expectations, the Unicode | Consortium's YouTube has plenty of talks on | https://youtube.com/@unicode. Low view counts but often quite | fascinating | raffy wrote: | I have a tool which lets you scroll through all characters in a | large document. It also shows names, scripts, and IDNA-like | status. | | https://adraffy.github.io/ens-normalize.js/test/chars.html | pavlov wrote: | Wikipedia has the "no original research" rule. Unicode really | should have had a similar "no original designs" rule. | | Too late now -- it's become an annually refreshed collection of | fun fashionable clip art instead of an impartial repository of | humankind's symbols. | chrisshroba wrote: | Why shouldn't emoji be an annually refreshed collection of fun | fashionable clip art? That's exactly how users use emoji - for | culturally relevant references. As a user, I get excited to see | all the new emoji each release! | csande17 wrote: | The reason why this is a bad idea is that it's hard to remove | emoji when they go _out_ of fashion. You can't just pull | characters out of the set without breaking the backwards | compatibility assumptions made by every programming language | that supports Unicode identifiers, and you can't change the | meaning of characters without a painful, politically | controversial collaboration between every large tech company | (see the pistol emoji). | | Even ZWJ sequences are forever: we're still stuck with the | "eye in speech bubble" emoji despite the fact that it's a | logo for a defunct anti-bullying campaign that doesn't even | have a working website anymore. | int_19h wrote: | Why does it matter, though? 
The symbols are already drawn, | and the code that handles the emoji range is already | written. As for "obsolete" emojis, regardless of the | original intent, they will be appropriated in no time if | some suitable use comes along; the official name in the | Unicode chart is merely a historical curiosity (and, | indeed, those names are wrong for many "legitimate" | characters, but can't be changed for the same back-compat | reasons). | Groxx wrote: | I think there's plenty of evidence that people use pictograms. | Just look at the millions of discord emoji and chat stickers. | In that respect, they're showing remarkable constraint in only | adding _twenty_. | | tbh my bigger worry is simply around implementation scope and | complexity - kinda like browsers, Unicode and text rendering is | perpetually getting harder to create new competitors / complete | fonts / etc. Emoji are rather small in number and simple to | implement though, if somewhat costly to include all those | images / complex svgs. | tokinonagare wrote: | Unicode needs a split to separate the serious work of encoding | existing writing systems and the politically correct meme | glyphs that become weirder by the day. | jurimasa wrote: | What makes a writing system such, and why emoji are not a | writing system? | Eisenstein wrote: | What makes a meme glyph politically correct and why are they | in unicode? | Name_Chawps wrote: | This split already exists. Emoji are kept in a separate block | from other characters. | Thing123456 wrote: | Here's the blog post for the Unicode 15 release: | https://blog.unicode.org/2022/09/announcing-unicode-standard... | | A tiny portion consists of "fun fashionable clip art." The vast | majority of the changes are new scripts and symbols that allow | people to correctly input existing texts. | avgcorrection wrote: | All threads need at least one irrelevant pet peeve-aside. | satvikpendem wrote: | So what? 
Unicode follows language and language is descriptivist | not prescriptivist. | legrande wrote: | > it's become an annually refreshed collection of fun | fashionable clip art instead of an impartial repository of | humankind's symbols | | Constraints are sometimes good. It's great to see how something | like the Eggplant & Peach emoji got hijacked and used as a | sexual reference. | photochemsyn wrote: | You can hire a ghostwriter to write a biography about yourself | and pay a publishing house for a limited edition print run and | voila, you can have your own wikipedia page based on an | authorized source. | | Wikipedia's view of what's 'original research' has always been a | bit murky, on top of that. Are the primary documents that | historians rely on acceptable as sources, or only if they've | been synthesized by an 'accredited historian' (whatever that | is) in book or research journal form? If publishing original | research is wrong, why publish a journalist's original research | which is incorporated into a newspaper article? What about | original research published as a blog post, is that now an | acceptable secondary source? | skissane wrote: | > You can hire a ghostwriter to write a biography about | yourself and pay a publishing house for a limited edition | print run and voila, you can have your own wikipedia page | based on an authorized source | | It isn't as easy as you make it sound - to establish | notability, they don't accept self-published / vanity press | sources - so you'd need to get your book published by a | publisher with an established track record. That's a lot | harder - it is either something that money can't buy, or at | least you'd need a lot lot more money to buy it than | self-publishing charges | | Furthermore, one high quality reliable source is generally | not considered enough for notability, they want multiple | sources. 
If you get your biography published, and then get | some journalists in established media outlets to publish | articles about it, you'll meet that hurdle too. But if you | manage that, you probably actually are notable, as | opposed to just some random nobody trying to buy their way | into Wikipedia | chungy wrote: | Unicode's stuck to that principle better than Wikipedia has | stuck to its principle. | kens wrote: | Unicode has completely separate rules for characters and | emojis. Characters and scripts generally need solid | documentation that they are real existing symbols. Emojis, on | the other hand, are accepted largely on how likely they are to | be used. My impression is that the Unicode committee would | prefer to deal with scripts and characters, but they got stuck | with emojis for historical reasons and that's what most people | care about. | | Refs: https://www.unicode.org/emoji/proposals.html | http://www.unicode.org/pending/proposals.html | avgcorrection wrote: | No, man. Digital communication should follow whatever history | for inclusion that text before 1850 did. Write a `:P` (you | know: HN and emojis...) as monastic marginalia for three | hundred years and then maybe we'll spare one code point for | it. | dhosek wrote: | I kind of expected this to be more an overview of the new stuff | in Unicode 15.0. As the author of a Rust Unicode crate | (finl_unicode), I always like to dig through the release notes to | see what sort of strange new stuff is on offer. | PostOnce wrote: | Tangent: | | I recognized the domain and tried to remember why, and now I | remember. | | I'm working on a game, and babelstone.co.uk has probably the | world's most comprehensive (and high quality) set of runic fonts: | | https://www.babelstone.co.uk/Fonts/ | | https://www.babelstone.co.uk/Fonts/Runic.html | | https://www.babelstone.co.uk/Fonts/AngloSaxon.html | [deleted] | willm wrote: | I found this more entertaining than the new Avatar movie. 
| TheRealPomax wrote: | Andrew's Babelmap [1] is one of those applications that, if you | do anything text or typography related, is basically required | owning. With a donation, of course. | | [1] https://www.babelstone.co.uk/Software/BabelMap.html | virtualritz wrote: | I'm usually ok with what macOS Character Viewer offers. I am | rarely on Windows and didn't know about BabelMap. It looks like | it fills the gap there. | | I work mostly on Linux so I hacked a Character Viewer clone in | Rust over a weekend recently[1]. | | It just does what I need but I'm planning to add features to it | if I find them useful. | | So I am curious: what functions does BabelMap offer that you | can't live without, especially as a typographer? | | [1] https://github.com/virtualritz/glyphana | arm wrote: | Since you mentioned macOS, it would be remiss of me to not | mention UnicodeChecker: | | https://earthlingsoft.net/UnicodeChecker/index.html | mycall wrote: | I would love to see someone make an image to unicode "curve | fitting" algorithm or converter, similar to ANSIDRAW. | hollasch wrote: | See https://shapecatcher.com/. | einpoklum wrote: | This is the most important part of Unicode for me: | | https://www.unicode.org/reports/tr9/tr9-46.html | | because I speak a right-to-left language. Whoever wants to write | an application involving text entry, and truly support | localization or internationalization, should take the time to | read at least section 3: | | https://www.unicode.org/reports/tr9/tr9-46.html#Basic_Displa... | phkahler wrote: | GNU unifont has the entire BMP but is a bitmap font. Is there an | equivalent monospaced scalable font we can use in GPL software? | politelemon wrote: | Noto Sans? | https://fonts.google.com/noto/specimen/Noto+Sans+Mono | troymc wrote: | Some random characters didn't render in my browser. Upon | inspection: | | font-family: Georgia, Serif; | | I don't think those fonts support all of Unicode. 
Google created | their Noto fonts [1] for this purpose; I wonder why those aren't | being used. | | [1] https://en.wikipedia.org/wiki/Noto_fonts | mistrial9 wrote: | gentium is a font with a very large number of glyphs also | | https://software.sil.org/gentium/ | jfk13 wrote: | But only for Latin/Greek/Cyrillic scripts; it makes no claim | to be a pan-Unicode font (family). | jfk13 wrote: | Browsers will generally do "fallback" to some other font, if | the font(s) named in the CSS don't support the characters | present in the text. But for some of the rarer characters, you | may not have any available font that supports them. | Someone wrote: | See https://en.wikipedia.org/wiki/Fallback_font. It typically | isn't a browser feature, but an OS one. | | Since 1998 MacOS has a "last resort" font that has glyphs | (not necessarily unique) for every Unicode code point. They | donated it to Unicode | (https://en.wikipedia.org/wiki/Fallback_font#Unicode_Last_Res...), | so I expect most OSes running | full-blown modern browsers to have it or something similar | (those running smaller browser engines may be too space | constrained to have room for it) | abudabi123 wrote: | http://www.chinaknowledge.de/Literature/Science/shuowenjiezi.html | | I have the noto fonts and ctext dot org's hana fonts but still | see tofu in the above page. Whatever font is used on the | iPhone's Pleco app the correctness depends on context where you | are in the app. | | These two examples often are confused: Ri Yue | nanis wrote: | And yet there is still no unambiguous lower case "I" or upper | case "i". | ClumsyPilot wrote: | that's the job of a font, not encoding | nanis wrote: | No, the fact that there is no such codepoint, which makes those | mappings ambiguous, is due to the way Unicode decided to save | two codepoints for seemingly no good reason. | | What should be _the_ value of `"I".lower()`? Or, "i".upper()? | | And please don't bring up locales. 
The whole point of | accepting the complexity of Unicode is to be able to take a | document which stands on its own without external references. | | > Early character encodings also conflicted with one another. | That is, two encodings could use the same number for two | different characters, or use different numbers for the same | character. | | > The Unicode Standard provides a unique number for every | character, no matter what platform, device, application or | language.[1] | | Those statements are outright lies: Unicode does not provide | a unique number for "upper case Turkish dotless i". Nor does | it provide one for "lower case Turkish dotted i". | | If it did, it would be possible to correctly map "i" to "I" | or "İ" and "I" to "i" or "ı" without having to know anything | other than the source codepoint. | | The font does not even come into play here. | | [1]: https://unicode.org/standard/WhatIsUnicode.html | Kwpolska wrote: | Unicode is for representing text, not allowing arbitrary | manipulation of it. It isn't the job of Unicode to encode | those relationships. Also, the Turkish `i` stuff is just | the tip of the iceberg. Should Unicode be able to | round-trip `'ß'.upper().lower()`? Keeping the existing | capitalization of ß → SS, you need to define an "uppercase | S that used to be ß" character. Then there's the Dutch | `ij`, in which both characters are either uppercase or | lowercase (`Ij` at the start of a word is incorrect). | There's a ligature in Unicode, but it's only for | compatibility with some legacy keymaps. But is there a | point in adding a new version of "S" that a lot of software | would not recognize as equivalent to the plain old ASCII | "S" (and one might end up far away from a ß due to | copy-pasting or stuff), bringing weird bugs and security issues? | Should the Dutch throw out all their keyboards just so they | get a new key for the special IJ ligature? 
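For readers who want to check the case-mapping behaviour argued about in this subthread (Turkish dotted and dotless i, the German sharp s, the Dutch ij ligature), here is a minimal sketch using Python's built-in `str.upper()` and `str.lower()`, which apply the locale-independent default Unicode mappings. The helper name `roundtrips` is made up for illustration. Other languages and locale-aware libraries may behave differently.

```python
# Illustration only: str.upper()/str.lower() use Unicode's default,
# locale-independent case mappings, not Turkish-specific tailoring.

def roundtrips(text: str) -> bool:
    """Hypothetical helper: True if upper- then lower-casing returns the input."""
    return text.upper().lower() == text


# Default mappings for the "i" letters discussed in the thread.
print("I".lower() == "i")             # plain ASCII pair
print("\u0131".upper() == "I")        # dotless i (U+0131) upper-cases to ASCII I
print("\u0130".lower() == "i\u0307")  # dotted I (U+0130) lower-cases to i + combining dot above

# Round trips that lose information under the default mappings.
print(roundtrips("\u0131"))           # U+0131 -> I -> i
print(roundtrips("\u00df"))           # sharp s (U+00DF) -> SS -> ss

# The Dutch ij ligature (U+0133) has a simple one-to-one uppercase (U+0132).
print("\u0133".upper() == "\u0132")
```

Under these default mappings both round trips fail, which is the ambiguity nanis complains about and the reason Kwpolska argues that case conversion needs information beyond the code point itself.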
| [deleted] ___________________________________________________________________ (page generated 2023-02-26 23:00 UTC)