[HN Gopher] Unicode character "ꙮ" (U+A66E) is being updated
___________________________________________________________________
Unicode character "ꙮ" (U+A66E) is being updated
Author : SerCe
Score  : 251 points
Date   : 2022-09-19 11:39 UTC (11 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| iLoveOncall wrote:
| I want to be that person who has so much time on their hands that they can afford to waste it on pointless things like this.
| shadowgovt wrote:
| There's a career path to get there. It involves becoming someone who cares deeply about the ways and means of digitizing data stored in analog media. Drill down deep enough, and you'll find yourself in a fascinating world of coding an error.
|
| There are things like the "ghost characters," which are code points in Japanese that map to characters that were basically transcription errors made when the team was putting together a full set of kanji. Some characters with an extra horizontal line snuck into the set; they were likely caused by a transcription error because the character got split onto two pieces of paper by lines of text being copy-pasted into a records book, and the shadow cast by the thin extra layer of paper was misinterpreted as another stroke.
|
| https://www.dampfkraft.com/ghost-characters.html
| hidudeurcool wrote:
| dmz73 wrote:
| And then people wonder why software developers don't care to support Unicode properly. The first 60,000+ characters made sense, then a few more were needed and Unicode suddenly got to play with 1,000,000+ and just went off the rails.
| lifthrasiir wrote:
| You can support Unicode without ever having to display all possible characters "correctly".
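As a rough illustration of the point above, software can store, encode, and identify U+A66E without any font that draws it. The sketch below uses only Python's standard library. The helper name `describe` is hypothetical, and the reported name assumes the interpreter's bundled Unicode database includes this code point.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Summarize one character without rendering it (hypothetical helper)."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        # Falls back to a placeholder if the bundled database lacks the name.
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        "utf8": ch.encode("utf-8"),
    }


# Written as an escape so the example does not depend on any font or editor.
multiocular_o = "\ua66e"
info = describe(multiocular_o)
print(info)

# Round-tripping through UTF-8 preserves the code point; no glyph is involved.
assert info["utf8"].decode("utf-8") == multiocular_o
```

How the character looks on screen is a separate concern, left to whatever font happens to be installed.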
| perihelions wrote:
| Related thread, about non-existent CJK characters ending up in Unicode through transcription mistakes ("ghost characters"):
|
| https://news.ycombinator.com/item?id=32095502 (_"A Spectre Is Haunting Unicode"_, 180 comments)
|
| edit to add: The top thread in the 2020 repost was about ,
|
| https://news.ycombinator.com/item?id=24955536
| sshine wrote:
| (+D)+Shan +-+
| SnooSux wrote:
| Be not afraid
| drewzero1 wrote:
| Bee not afraid?
| msla wrote:
| Bee Nut Afraid.
|
| (When an apiarist is terrified.)
| thechao wrote:
| +-+no( _ no)
| loudmax wrote:
| > ===
|
| The James Webb Space Telescope.
| throwaway98797 wrote:
| can't unsee
| Izkata wrote:
| Don't worry, you'll forget about this one when it gets six more eyes.
| rafaelturk wrote:
| Finally!
| hulitu wrote:
| > Unicode character "ꙮ" (U+A66E) is being updated
|
| I fear this will lead to a lot of "bug fixes and performance improvements" in Android. /s
| vintermann wrote:
| Biblically accurate O?
| Waterluvian wrote:
| I'm not sure how I feel about this. I'm not an expert by any means.
|
| But something just doesn't feel right when you've got Unicode with a character with one known use from forever ago.
|
| Doesn't this open up the floodgates to just a ridiculous amount of work or else biased gatekeeping?
|
| How much work would it be to implement your own font of the entire Unicode set? Or is that not actually a thing and fonts implement as-desired subsets?
| lifthrasiir wrote:
| > How much work would it be to implement your own font of the entire unicode set? Or is that not actually a thing and fonts implement as-desired subsets?
|
| You can't, and you are not expected to do so. You are limited by the OpenType limit (65,535 glyphs), various shaping rules that possibly increase the number of required glyphs, and a lack of local or historical typographic convention. Your best bet is either to recruit a large number of experts (e.g. Google Noto fonts) or to significantly sacrifice quality (e.g. GNU Unifont).
| poizan42 wrote:
| A single OpenType font file is limited to 65,535 glyphs. Nothing stops your font from being implemented as a series of .otf files (besides what people think of as a "font" when it comes to usage on computers).
|
| But yes, time constraints are the limiting factor. I don't think anyone is going to dedicate their entire life to making a single font.
| lifthrasiir wrote:
| While you are right that one logical font can consist of multiple font files (or possibly an OpenType collection), this constraint does affect most typical fonts. Wide-coverage CJK fonts already hit this limit. Fonts supporting only one of Chinese, Japanese and Korean don't need that many glyphs, and probably even two of them will be okay, but fonts with all three sets of glyphs won't. It is therefore common to provide three versions of fonts, all differently named.
| Waterluvian wrote:
| I wasn't aware of the 2^16 limitation. Thank you for the notes!
| aasasd wrote:
| I'll tell you more: there are Unicode glyphs without known usage.
| gumby wrote:
| There are quite a few such characters in Unicode because academic articles about things like cuneiform need to be digitized too. And because the historical record is so sparse, we often have vanishingly few examples of a character, or only one, and perhaps no way to know if it was a misprint or a real character.
|
| Actually this character seems like a scribe's joke, no different from the illustrated characters at the beginning of medieval paragraphs (all of which are represented in Unicode as A, B or whatever). But the point still holds.
|
| It even holds for modern languages -- consider the ghost characters needed for round-trip compatibility: https://weekly-geekly.imtqy.com/articles/418717/index.html
|
| (actually cuneiform is a poor example; perhaps Linear A would have been a better example)
| diimdeep wrote:
| Being stuck on macOS Catalina with Unicode 12, I think there is a way to upgrade to newer versions and get new emoji support [1][2]
|
| [1] https://apple.stackexchange.com/questions/278937/is-there-a-...
| [2] https://forums.macrumors.com/threads/updating-maverickss-emo...
| quickthrower2 wrote:
| Crazy that it renders in HN comments (which reject a lot of Unicode)
| etamponi wrote:
| By the same reasoning, the 7-eyed O has now been used more than once, so it deserves a glyph! So the right way to do this is to introduce a new character for the correct glyph, and also leave the current one (perhaps changing the title). Otherwise these tweets won't make sense when read by someone who has updated to Unicode 15.0
| echelon wrote:
| _This_ thread on HN won't make sense in the future if the Unicode body replaces ꙮ
|
| Make a new character!
| koboll wrote:
| Honestly it probably deserves the Pluto treatment: decertification as a character. One historical use in the 1400s doesn't merit a character and never did.
| Pinus wrote:
| Isn't there an entire Unicode block for the symbols on the Phaistos disc? Yes: https://en.wikipedia.org/wiki/Phaistos_Disc_(Unicode_block). I suppose those occur in quite a few documents _about_ the disc, even though the disc itself is the only known document written _in_ those symbols.
| colejohnson66 wrote:
| Unicode's mission is to make _every_ document "roundtrip-able". Even if a character is only used once, it should be possible to save a plaintext version of the containing document without losing any information.
| Roughly, I should be able to put a transcription of that one translation from the 1400s on Wikisource without using images.
|
| You may disagree with me, and that's fine, but it doesn't change Unicode's mission. Besides, there's room for 1,112,064 codepoints[a], and only 149,186 are in use. It's predicted we'll never use it up, so what harm is there in one codepoint no one will ever need?
|
| [a]: U+10'FFFF max; it used to be U+7FFF'FFFF, but UTF-16 and surrogates ruined that
| djur wrote:
| Unicode doesn't have a character for every illuminated initial, nor should it. I'm not clear on why this character should be considered any differently.
| skyyler wrote:
| http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3194.pdf
|
| It was introduced with other "ocular O"s which are seemingly more commonly used than this one.
|
| It's not quite an illuminated initial.
| akavel wrote:
| Wow, this is probably the most actually useful and interesting comment in this whole discussion, thanks! For anyone interested, the most relevant quotes from the document are in particular:
|
| _"This document requests the addition of a number of Cyrillic characters to be added to the UCS. It also requests clarification in the Unicode Standard of four existing characters. This is a large proposal. While all of the characters are either Cyrillic characters (plus a couple which are used with the Cyrillic script), they are used by different communities. Some are used for non-Slavic minority languages and others are used for early Slavic philology and linguistics, while others are used in more recent ecclesiastical contexts.
| We considered the possibility of dividing the proposal into several proposals, but since this proposal involves changes to glyphs in the main Cyrillic block, adds a character to the main Cyrillic block, adds 16 characters to the Cyrillic Supplement block, adds 10 characters to the new Cyrillic Extended-A block currently under ballot, creates two entirely new Cyrillic blocks with 55 and 26 characters respectively, as well as adding two characters to the Supplementary Punctuation block, it seemed best for reviewers to keep everything together in one document._
|
| _(...)_
|
| _MONOCULAR O, BINOCULAR O, DOUBLE MONOCULAR O, and MULTIOCULAR O are used in words which are based on the root for 'eye'. The first is used when the wordform is singular, as k; the second and third are used in the root for 'eye' when the wordform is dual, as chi, chi; and the last in the epithet 'many-eyed' as in serafimi mnogochityii 'many-eyed seraphim'. It has no upper-case form. See Figures 34, 41, 42, 55."_
| j-bos wrote:
| Because it's already been added to unicode. Now it's not a question of whether or not to add, rather to remove, and unicode almost by definition does not remove.
| thayne wrote:
| Unicode does have deprecated code points though. Not that I necessarily think making this character deprecated makes sense.
| lmm wrote:
| Meanwhile one still can't roundtrip regular Japanese without some kind of funky out-of-band signalling. By itself this kind of thing is harmless, but it speaks to poor prioritization from Unicode.
| bityard wrote:
| Today, I wrote a document by hand containing a new symbol that only looks like genitalia if you squint really hard. Where do I apply to have it included in unicode so that it can be digitized properly?
| lucumo wrote:
| Rule-lawyering wise-asses try to mess with many policies. It's rarely a sensible indictment of a policy, nor is it very effective.
| Anyone dealing with such people just ignores them.
| fluoridation wrote:
| What's the criterion that includes the document in the tweet, but excludes the document referenced by the GP?
| bzxcvbn wrote:
| https://www.unicode.org/pending/proposals.html
|
| https://www.unicode.org/emoji/proposals.html#selection_facto...
| fluoridation wrote:
| I don't see anything on the inclusion of symbols that are not icons, such as U+A66E, or the symbol proposed by bityard.
| koala_man wrote:
| Can you reuse or ?
| 0xbadcafebee wrote:
| And for years we've just been using eggplants!
| 411111111111111 wrote:
| ( no >= [?] <= ) no mi + - +
|
| ~ ( Jjut Jo Jjut ) ~
|
| ( // y . // y )
| [deleted]
| layer8 wrote:
| > Unicode's mission is to make every document "roundtrip-able".
|
| Only for characters from existing coded character sets.
| modzu wrote:
| why isn't the artist formerly known as Prince in unicode?
| koboll wrote:
| Okay, let's take a look at the context where the multiocular o was used: https://en.wikipedia.org/wiki/Multiocular_O
|
| I see that near it, there is an ef (F) with a very tall stem.
|
| Why should that not be included as a standard unicode character? Surely it is used more often than the multiocular o.
|
| You may say "it's a decorative flourish", which is of course true, but so is the multiocular o. Should we allow every conceivable decorative flourish into unicode? What is the standard for where flourishes become distinct characters?
| shp0ngle wrote:
| The thing is, this is just a decorative way to write "o". It's not a specific letter by any definition.
|
| I can't speak of other letters that were added in the same batch in 2007.
| Some of them seem meaningful, I donno, I don't speak Old Church Slavonic (although I am told it sounds like Croatian, which I understand a little)
|
| http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3194.pdf
| kadoban wrote:
| For as inclusive as that mission is, it seems weird to me how limited in certain areas unicode is. For instance, people use peach emoji since there isn't one for butt, eggplant since there's no penis, etc.
|
| This doesn't contradict the stated goal exactly, but it seems against the spirit of it at least.
| yakireev wrote:
| One could argue that emoji should have never been added to Unicode in the first place. Peaches and butts are images, pictures, illustrations, whatever - but they are not characters. There's no writing system which has a colored drawing of a peach as a character.
| eternityforest wrote:
| But that doesn't change the fact that most people use them and like them, and there is not much technical disruption. They just chose practicality over purity.
| yakireev wrote:
| Most people (me included) like funny cat videos and send funny cat videos. Shall we include some in Unicode?
|
| I mean, this ship has long sailed, but that was a mistake nevertheless. Not everything has to be a unicode character.
| PeterisP wrote:
| Yes there is - a widely used character set (when Unicode talks about "writing systems" it explicitly includes all the computer character sets used in practice pre-unicode) used by Japanese 'featurephones' had emoji characters, so in order to be able to include that character set in unicode, unicode had to add emoji.
| [deleted]
| bzxcvbn wrote:
| Yes there is. We're using it right now. Even linguists are studying the use of emoji today.
| vcxy wrote:
| vcxy wrote:
| I tried to reply with just a unicode penis but that got flagged immediately, so I'll be more substantial and leave out the actual penis.
| It appears in Egyptian hieroglyphs, so actually there is a penis included in unicode.
| bhk wrote:
| Cataloging every doodle ever drawn inline with text by anyone at any time in history would exhaust any finite set of code points.
| tzs wrote:
| If that was once its mission, it was clearly abandoned long ago. They rejected Klingon characters on the grounds that it has low usage for communication, and that many of the people who do communicate in Klingon use a latinized form.
|
| ꙮ seems to just be a fancy way of writing O. I haven't seen anything that says it has a different meaning. The arguments for excluding Klingon seem to apply even more so to ꙮ.
| mananaysiempre wrote:
| If you look through the old mailing list postings, the oft-left-implicit problem with Klingon (as well as Tengwar, Everson's pet project) is that it may get people into legal trouble (even though in a reasonable world it shouldn't be able to). So in the unofficial CSUR / UCSUR they remain.
|
| A weird solitary character from the 1400s isn't subject to that, and even if it's a mistake it's probably not worth breaking compatibility at this point (I think the last such break with code points genuinely changing meanings was to repair a mistaken CJK unification some time in the 00s, and the Consortium may even have tied its own hands in that regard with the ever-more-strict stability policies).
|
| Similarly, for example, old ISO keyboard symbols (the [?] for erase backwards, but also a ton of virtually unused ones) were thrown in indiscriminately at the beginning of the project when attempting to cover every existing encoding, but when the ISO decided to extend the repertoire they were told to kindly provide examples of running-text (not iconic) usage in a non-member-body-controlled publication. (Crickets.
| The ISO keyboard input model itself only vaguely corresponds to how input methods for QWERTY-adjacent keyboards work in existing systems--as an attempt at rationalization, it seems to mostly be a failed one.)
| bobsmooth wrote:
| Unless it's legitimately someone's native tongue, conlangs shouldn't be in unicode. If there are kids out there that are native Klingon speakers, then you can make the argument it should be included.
| reaperducer wrote:
| _One historical use in the 1400s doesn't merit a character and never did_
|
| One _known and surviving_ use. It is possible that it exists in other places, since the vast majority of the planet's written work has not been digitized. It may also have been used in other places that have not survived.
|
| Just because it's not important to you does not mean it is not important.
|
| The fact that it survived for 600 years makes it interesting and worth saving. It is infinitely unlikely that anything you do, write, or say will last that long.
| bhaney wrote:
| > It is infinitely unlikely that anything you do, write, or say will last that long
|
| Ouch
| koboll wrote:
| Sure it's possible, but there should be a higher bar than "it's possible it's used more than once" for meriting inclusion in the standard keyboard of billions of devices worldwide.
| tsimionescu wrote:
| The thing is, looking at the page, there are many other characters that were not added - the large red S-looking characters, for example. But for some "bizarre" reason, those were not included in Unicode...
|
| Of course, the simple answer is that Unicode actually includes any character that someone cares enough to ask to be added, with rare exceptions.
| runarberg wrote:
| idk. when the word Planet was redefined such that Pluto was no longer a planet, it kind of ruined the word Planet. It suddenly wasn't nearly as useful a word as it used to be (even though now it has a precise meaning).
| For most people that use the word, it won't matter (and is actually rather exciting) that they keep discovering new planets in our solar system.
|
| If they'd treat the word characters the same way, it would only serve to confuse and do no favors to the remaining glyphs.
| BlueTemplar wrote:
| This is temporary though, soon people will look at you funny if you say that Pluto is a planet - and/or they might not even have heard of it (though of course that is still worth learning about in a History of Science context).
|
| We do NOT keep discovering new planets, rather minor planets (I agree that the term is confusing), more than a million of them discovered in the Solar System now, like the 9007 James Bond.
| runarberg wrote:
| It could go either way, it is not always that the scientific meaning wins out, especially not when even scientists don't find the new definition useful.
|
| When I think of a planet, I think of a world that has active geology that isn't a moon (I know excluding moons is arbitrary, and perhaps I shouldn't do that; but hey, that's language for you). I honestly don't care about the orbit, and I bet that when most people think about planets they aren't thinking about the orbit either, let alone whether the planet has cleared the orbit or not. I doubt that will change.
| gerikson wrote:
| > When I think of a planet, I think of a world that has active geology
|
| Wouldn't that definition rule out gas giants?
| runarberg wrote:
| Yeah, probably strictly... But I'm not a planetary scientist. I'm merely a user of language, and I don't need to be rigorous in my definitions. And to me the weather patterns on Jupiter are an interesting enough feature to count as geology (even though it is probably not strictly geology).
| jameshart wrote:
| Not just that, but whether or not Mars is still geologically active is still an open question.
| If you admit planets on the basis that they have a history of geological activity, then Ceres is a planet too.
|
| I don't think anybody considers geological activity as particularly useful for classifying things as 'planet' or 'not planet'.
| runarberg wrote:
| Why shouldn't Ceres be a planet? If Pluto gets to be a planet then Ceres is definitely a planet.
|
| But there is still active geology on Mars. There are still moisture, winds and glaciers that are shaping the environment. I consider that to be geologically active.
| [deleted]
| PeterisP wrote:
| At the moment this character is used in many documents and databases - including comments in this thread, the article mentioned there, etc.
|
| There could have been a good case not to include it back in 2007, but once it has been included, excluding it would break stuff.
| BlueTemplar wrote:
| And updating it rather than adding a new, correct one, might make the current uses confusing?
|
| Speaking of which, do we have any similar hexagonal symbol?
| jotato wrote:
| My thought as well
| baybal2 wrote:
| Unicode's basic rule is that character definitions never ever change, even when enumerated erroneously.
| Arnt wrote:
| Yes, but this is a change either way, because that codepoint's definition referred to that character. Either the reference or the description of the appearance has to change.
| echelon wrote:
| Make a new character. Updating the existing character ruins the meaning of all previous usages.
|
| It's like trying to change an API. Don't disrespect your existing users. Make a new version.
|
| ( [?]?)
|
| Think of all the ASCII art this botches. That has to have some historical importance to the Unicode standards body.
|
| ([?]_)
|
| For scholarly digital (unprinted) documents where the correct character rendering matters, erroneous past usages can be trivially found with grep or a date search, and easily corrected.
| The domain experts will familiarize themselves with this issue and fix the problem. Don't take a shotgun to it!
|
| This message wꙮn't have the ꙮriginally intended meaning if the characters are updated from underneath.
| nerfhammer wrote:
| why not make an additional eye a diacritic mark so you can just add an arbitrary number of eyes
| martin_a wrote:
| Uff.
|
| I'm not sure we have space for another glyph in Unicode. Looks pretty packed in here...
| BlueTemplar wrote:
| UTF-8 is still more than 80% empty, and can be potentially extended...
| colejohnson66 wrote:
| _Theoretically_, UTF-8 can encode up to 31 bits (U+7FFF'FFFF)[0], but for compatibility with UTF-16's surrogates, it's officially capped to 21 bits with the max being U+10'FFFF[1]. That decision was made in November 2003, so there are two decades of software written with hard caps of U+10'FFFF.
|
| [0]: https://www.rfc-editor.org/rfc/rfc2279
|
| [1]: https://www.rfc-editor.org/rfc/rfc3629#section-3
| RcouF1uZ4gsC wrote:
| I think the big issue with Unicode is that it is centralized and there are politics about what characters get included (see Klingon)
|
| I think I have a solution to decentralize Unicode:
|
| 1. Extend Unicode to 128-bits. We can still use UTF-8 variable length encoding which will limit the real size.
|
| 2. Use a blockchain to coordinate the characters. That way whoever wants to add a character can do it without gatekeeping.
|
| These simple suggestions will go a long way in making Unicode less centralized.
| dhosek wrote:
| This is not exactly a correct description. Unicode does _not_ specify the appearance of characters, only their meaning. It seems what's changed is the reference presentation of the character in the Unicode tables, not the character itself. Unicode goes to great lengths to preserve backwards compatibility so changing the meaning of a code point would violate that principle.
| Your OS or application providing Unicode 15.0.0 support will not change the appearance of U+A66E. The appearance is dependent on the font.
| idlewords wrote:
| They should put in a few additional eyes as hot spares.
| xanathar wrote:
| So it's a Unicode character that represents a... blob with 10 eyes?
|
| _Hordes of Wizards of the Coast lawyers getting ready for the big fight_
| gedy wrote:
| Name checks out:
| https://forgottenrealms.fandom.com/wiki/Xanathar_(original)
| supernewton wrote:
| Nah, Beholders have 11 eyes, so we're good here.
| tsimionescu wrote:
| I feel like the spelling should be updated to Behlders, or better yet, Behlders, to reflect that (of course, this would only make sense once the glyph update actually hits).
| ElfinTrousers wrote:
| Am I alone in thinking that this is not so much a separate character, as a doodle a bored monk made to relieve a tiny bit of the tedium of copying manuscripts?
| BearOso wrote:
| And its new official name shall be the Trypophobigon.
| xashor wrote:
| Too bad I have to adjust my business cards for .world
| Traubenfuchs wrote:
| remedan wrote:
| We do have a Unicode character for a gun: U+1F52B PISTOL. Most fonts that have it choose to style it as a water gun, though.
| dafoex wrote:
| There's an emoji for handgun, but Apple and other big tech decided it needed to be a water gun. There is also a rifle character intended to represent the sport of shooting in a pentathlon, but again Apple threw its weight around and, while the character became codified in Unicode, it never became an emoji and no font from big tech supports it.
| jrockway wrote:
| I guess because the goal of Unicode is to be able to represent every character that's appeared in language. This one is in a published book, while guns and a sexual intercourse symbol aren't.
|
| Emoji was a weird value add that Japanese mobile providers added to their phones before Unicode.
| To get them to move to Unicode, they had to keep them. That's why there's a Tokyo Tower emoji, but not an Eiffel Tower. That's why the post office has a @ on it. That people get any use out of emoji outside of Japan is really pure luck.
| ElfinTrousers wrote:
| That seems actually logical when you consider that kanji presumably began as simple depictions of objects that could be drawn quickly. Perhaps the only difference between emoji and kanji is time.
| shadowgovt wrote:
| I've even heard emoji referred to as "the carrot that keeps the implementations current." Every time a new version of Unicode is published, a few more emoji are tacked on. It acts as incentive for all the cellphone carriers and such to put the money into updating their implementations, because nobody wants to be the one on the block with the one phone that can't render "Mirror Ball".
|
| (ETA: LOL, Hacker News drops "Mirror Ball" https://emojipedia.org/mirror-ball/ from the comment when you post)
| Traubenfuchs wrote:
| I believe the majority of emoji do not work on Hacker News.
| jrockway wrote:
| Incidentally, Windows doesn't have the mirror ball. I guess it is a carrot to get me to upgrade to Windows 11, which I am skipping. (The key with Windows is to only use the good versions: XP, 7, 10, ???. Hoping ??? arrives soon ;)
| int_19h wrote:
| It's not in Win11 yet.
| Dwedit wrote:
| There are hieroglyph dicks in unicode, see U+130B8.
| Traubenfuchs wrote:
| I even posted phallus with emission in my comment above.
|
| I can see it on latest iOS, but not on Windows 10 + Chrome.
| rizoma_dev wrote:
| I'm always happy to see some esoteric unicode updates
| diimdeep wrote:
| Here[1][2] is the scan of the manuscript from 1429, image #251
|
| [1] https://lib-fond.ru/lib-rgb/304-i/f-304i-308/#image-251
| [2] https://web.archive.org/web/20110927102700/https://www.stsl....
| aasasd wrote:
| So the text at that point literally talks about 'many-eyed seraphim'.
| The eyes symbol is a pure gag--seems to be spliced in place of the letter 'o' in the word 'eye' just a little down the line. (However, Old Slavonic is a tough read due to no spaces, so I'm not sure about that word. But at least it's not the Glagolitic script, which was just ridiculous and actually had multi-circle letters.)
| klyrs wrote:
| It's curious that the red ink blobs behind the "eyes" aren't included in the unicode glyph either...
| msoad wrote:
| This is similar to the "man in business suit levitating" emoji.
|
| How does this stuff make it into Unicode?!
| shp0ngle wrote:
| Levitating man is just a Unicode encoding of a character from an old Webdings (or Wingdings?) font.
|
| There was an accepted proposal to add many Wingdings and Webdings characters as Unicode code points. Thus, levitating man in a suit.
| octoberfranklin wrote:
| I miss the good old days when character sets didn't feel the need for _annual updates_.
| baltimore wrote:
| Is there any end to this? E.g., why not include Galileo's pictograms of Saturn as seen here:
| https://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=...
| no-reply wrote:
| I'll check back in the future.
| xenonite wrote:
| Sadly, ꙮ is not eligible for engraving by Apple on AirPods.
| colejohnson66 wrote:
| As of right now, it's available for "adoption":
| https://www.unicode.org/consortium/adopt-a-character.html
| adhesive_wombat wrote:
| Meanwhile I check back every now and again on MUFI (Medieval Unicode Font Initiative) [1] and it's still not in.
|
| [1]: https://mufi.info
| politelemon wrote:
| Here's the original tweet where the discrepancy was noticed in 2020, and a photograph of a page inside the book where it's used:
|
| https://twitter.com/etiennefd/status/1322673792452354048
| pippy wrote:
| Unicode can be ridiculous at times. It contains a character used once in a single manuscript in an extinct language, but not a standardized glyph for an external URL link.
| lordnacho wrote:
| Wait a minute, how will we refer to the old glyph in the future? Once this is updated the articles such as this one will have the new shape.
| martin_a wrote:
| "The character formerly known as U+A66E"
| lifthrasiir wrote:
| There was a joke that U+A66E should retain seven eyes and further eyes should be added with a ZWJ sequence [1]. If that character had somehow become _very_ popular in modern texts, updating its glyph might have resulted in an interoperability problem, so such a solution would have been needed. But that didn't happen so the glyph itself has been updated instead.
|
| [1] https://twitter.com/BabelStone/status/1323440365429542919
| memorable wrote:
| Alternative frontend version:
| https://nitter.net/jonty/status/1571615998335123457
| jerf wrote:
| When my kids were young, I accidentally flubbed the pronunciation of "Santa Claus" once and said something that sounded a lot like "Centiclops", which I decided to roll with. Centiclops is a lot like a cyclops with one eye, except that, as a reading of the roots clearly indicates, this is a creature with 100 eyes.
|
| Today I learn that Centiclops effectively has a Unicode character. As Centiclops' representative in the world of the non-imaginary, we accept that a Unicode character with a hundred eyes is not practical and we accept the representation with just a few eyes, but generally agree that upgrading from 7 to 10 is a nice improvement, as 7 does not evenly divide into 100 but 10 does. This is important, because... reasons.
| sshine wrote:
| From "The House of Asterion" by Jorge Luis Borges:
|
| "It is true that I never leave my house, but it is also true that its doors (whose numbers are infinite) (footnote: The original says fourteen, but there is ample reason to infer that, as used by Asterion, this numeral stands for infinite.) are open day and night to men and to animals as well."
|
| https://klasrum.weebly.com/uploads/9/0/9/1/9091667/the_house...
| thaumasiotes wrote: | > Centiclops is a lot like a cyclops with one eye, except | th[at] as a reading of the roots clearly indicates, this is a | creature with 100 eyes. | | Not in any normal sense of "roots". _Cent_ is a Latin root | meaning 100. _ops_ is a Greek form meaning eye. The -i- | indicates that the word is being formed in Latin, and the -cl- | is entirely spurious. The original Greek word divides as cycl- | ops, not cy-clops. | inopinatus wrote: | In any case, there is already an ancient, general, and | perfectly serviceable epithet _Panoptes_. | martyvis wrote: | A bit like the heli-copter | helico-pter thing. | layer8 wrote: | There should be a combining "eye" character so that you can | have as many or few eyes as you like. | | Though to be honest, that Unicode character looks more like a | bunch of cells forming a tissue to me than eyes. | doodpants wrote: | Or perhaps this character is an accurate representation of a | Dekaclops. | jerf wrote: | My client finds your proposal offensive and an appropriation | of his culture, and also that Dekaclops guy is mean and | smells bad and hasn't returned the lawnmower my client lent | him even though my client has clearly referred to the need to | mow his lawn several times now so he totally doesn't deserve | a Unicode character. | tzot wrote: | It'd be dekaops, because the -cl- is part of "cy _cl_ | e"+"ops" (one round eye, with the one dropped because it's | inferred). So "cycle" out, "deka" in. | tempodox wrote: | "Santa Clause" would translate to "holy clause". There might be | such a thing but I think you meant Santa Claus :) | dtparr wrote: | Maybe just a big fan of the Tim Allen movie? | jerf wrote: | My fingers love adding the e's on the end of any worde that | can conceivably take them. Also have that problem with any | word that can take an "ly" even if I don't meanly it. | | Fixed, thanks. | JohnFen wrote: | I thought "santa" meant "saint"? 
| 0xbadcafebee wrote: | It does; the character originates from Saint Nicholas (or | Odin, depending who you ask) | felix318 wrote: | "Santa" means "female saint" in Italian and Spanish. | Perhaps the English "santa" came from another language but | I always found the name "Santa Claus" just horrible. | Archelaos wrote: | The first mention of this version of Saint Nicholas's | name has the form "St. A Claus" and appeared in the | New-York Gazette of 20 Dec 1773.[1] The same issue also first | reported some incident regarding tea in Boston harbour. | Nice coincidence. | | [1] Source: https://boston1775.blogspot.com/2016/12/st-claus-was-celebra... | nohuck13 wrote: | The name Santa Claus evolved from Nick's Dutch nickname, | Sinter Klaas, a shortened form of Sint Nikolaas (Dutch | for Saint Nicholas) | | https://www.history.com/.amp/topics/christmas/santa-claus#si... | thaumasiotes wrote: | > I thought "santa" meant "saint"? | | Well, _santa_ is a Spanish word meaning "holy" and _saint_ | is a cognate French word meaning the same thing. They | descend from Latin _sanctus_; compare _sanctify_. | | When the prayer goes "holy Mary, mother of god", "holy | Mary" is an exact equivalent of "santa Maria". | robocat wrote: | Might as well mention "Sancta Maria" in Latin, for | example from the Christian Hail Mary[1], a recorded Latin | version[2], written Latin next to English and Spanish[3] | and of course translated into _thousands_ of languages[4] | although unfortunately mostly written using _/A-Z/i_; I | am an atheist interested in languages. | | [1] https://en.m.wikipedia.org/wiki/Hail_Mary | | [2] https://glaemscrafu.jrrvf.com/english/avemaria.html | | [3] https://hymnary.org/text/hail_mary_full_of_grace_the_lord_is... | | [4] http://www.marysrosaries.com/Rosary_prayers_in_different_lan... | fortran77 wrote: | I thought it was a misspelling of Satan, but maybe that's | because I'm Jewish.
| wongarsu wrote: | Saint is more or less the same as holy, just used as a | title. It comes from Old French saint, seinte "holy, pious, | devout," from Latin sanctus "holy, consecrated" | kratom_sandwich wrote: | I love this character and I love the fact that it is being updated. | Just to get this right: at some point some person chose to doodle | the letter instead of writing it the correct way and now we have | a corresponding Unicode character? Sort of amazing and it also | makes you think ... | lmkg wrote: | There was a... "tradition" is a strong word, perhaps "trend" is | better. Among authors making copies of the Bible or related works in | Cyrillic, the letter O (equivalent to Roman O) at the | beginning of the word for "eye" would be stylized to look like | an eye. There are a variety of glyphs along these lines: ꙩ, ꙫ, ꙭ. | All of them, including ꙮ, were added to Unicode as a single | group. | | The glyph "ꙮ" was used to refer to an Angel with a whole buncha | eyeballs, as one does. In terms of texts that survive today, | this specific glyph has exactly one use in a single manuscript | from the 1400s. It might have been used more, in texts which | don't survive. But it is part of a larger trend, and I bet that | its inclusion in Unicode depends strongly on that. | | But yeah, in itself the character exists solely so that modern | computers are capable of a more-faithful rendition of the | transcription of a single handwritten copy of the Book of | Psalms. | happytoexplain wrote: | Thank you for describing the missing context. I couldn't | understand why this stylized letter deserved a code point | more than the uncountable others. I don't necessarily agree | still, but the fact that this character was only unique | _within a larger trend_ makes it much more reasonable. | henriquecm8 wrote: | So you are saying that the glyph is now more biblically | accurate? | int_19h wrote: | The Bible doesn't specify how many eyes seraphim have.
| | "In the center, around the throne, were four living | creatures, and they were covered with eyes, in front and in | back. ... Each of the four living creatures had six wings | and was covered with eyes all around, even under its | wings." | vintermann wrote: | Hah, and here I thought I was making a joke when I called it | a biblically accurate O! | cyral wrote: | > modern computers are capable of a more-faithful rendition | of the transcription of a single handwritten copy of the Book | of Psalms. | | I wonder if there is even a copy of the book transcribed to | actual characters or if it only exists as scanned PDF copies? | If anyone did transcribe it, would they have any knowledge | that the character even exists on computers? | cillian64 wrote: | It does raise interesting questions about what counts as | decoration/formatting and what counts as part of the actual | text. You could view these ocular O characters as purely | decorative (like the fancy first character in a paragraph) but | they could also be seem as a quirk of spelling which should be | represented in unicode. | | But the multiocular O really does seem like one monk got bored | one time and did some doodling. | Arnt wrote: | I attended a Unicode meeting (or maybe two? not sure?) and came | away with the impression that Unicode is like those open source | projects that are used by half of the world and maintained by a | handful of skilled and benevolent people. | | In Unicode's case I think most of them are paid, at least. | shp0ngle wrote: | That is what I understood too. It doesn't seem particularly | hard to add new letters to Unicode too if you try a bit. | | However that is a bit harder with emojis, that have their own | subcommittee, which seem to be more bureaucratic and also | more popular than the rest of Unicode. Everyone wants to make | a new emoji. 
| Stamp01 wrote: | I don't understand why this character needs to exist given that, | at least according to the author, it has only been seen once in | the wild, and it's semantically identical to another more widely | used character. | | I'm glad I'm not responsible for Unicode. Clearly I have the | wrong mindset for it. | 1-6 wrote: | I agree with your mindset. It's time for a Unicode replacement. | lifthrasiir wrote: | A surprising number of characters in Unicode were recorded only a few | times, if not once, before being assigned. Chinese characters, for | example, include a lot of them, because before modern times it was relatively common | to create a new character for a newborn, and | some of them have survived in literature but otherwise | seen no use (e.g. U+21E2B appears only once in the _Records | of the Three Kingdoms_ (San Guo Zhi)). But they have still | received code points because they are considered essential for | the digitization of historical works, and multiocular O is no | different. | bogwog wrote: | Imagine you're a historian from the future studying some old | document, and you spot a weird character that you've never seen | before. Wouldn't it be useful to be able to search for that | character to see if it shows up in any other document? A simple | OCR scan will bring up all the information you could ever need | for that one weird symbol. | PeterisP wrote: | Perhaps it's relevant to look at how it was introduced - as a | "package deal" with many, many characters from medieval | Cyrillic literature, as described in this proposal | https://www.unicode.org/L2/L2007/07003r-n3194r-cyrillic.pdf | | It certainly made sense to include this package in Unicode, and | the vast majority of those characters certainly should be in | this proposal.
You do have to draw the line somewhere, and | obviously those close to the line will be debatable, no matter | where you choose to draw it, like this particular symbol - but | once you've decided that you will include the one-eyed O (small | and capital) and the two-eyed O (small and capital), then | putting in the many-eyed O as well to complete the set doesn't | seem so far-fetched. | shadowgovt wrote: | It's been seen once in the in-print wild. | | There's no way to know how many since-written documents will | break if a whole codepoint is dropped. | wheybags wrote: | This kind of stupid thing is my problem with Unicode. We have all | this baggage for stuff that _nobody uses_, and we need to deal | with it forever. The worst for me is that there is no | way to encode a grapheme cluster at a constant size, so using | Unicode makes it impossible to have simple character access like | an old-style C string, no matter how big you make your char, even | though it's totally possible with damn near every language that | people actually use. | | So then we all end up paying this massive complexity tax | everywhere to pay for support for some Mongolian script that died | out 200 years ago (or multi-codepoint encodings of simple things | like é - just why, it was so avoidable). | JohnFen wrote: | I hear you. I loathe working with Unicode for this exact | reason. It's a bit of a nightmare due to its complexity. | | That said, what it's trying to do is enormously complex. | svat wrote: | > _encode a grapheme cluster as a constant size [...] totally | possible with damn near every language that people actually | use_ | | This is not true. For a concrete example: the languages Hindi | and Marathi, with ~500 million speakers, use the Devanagari | script (also used by Nepali and Sanskrit), in which a grapheme | cluster is (usually) a sequence of consonants followed by a | vowel.
For instance, something like "bhuktva" (भुक्त्वा) would | be two grapheme clusters, one (भु) for "bhu" and one (क्त्वा) | for "ktva". In Unicode each vowel and consonant (here, bh, u, | k, t, v, a) is separately encoded, which is the only reasonable | thing to do, and inevitably means that grapheme clusters can | have different lengths (number of code points). The alternative | would have been to encode every possible (sequence of | consonants + vowel) as a single codepoint, which gets | ridiculous quickly: these sequences can be up to 5 consonants | long, so you'd end up having to encode (33^5 * 13 ≈ 500M) | codepoints for Devanagari alone (or completely prevent certain | sequences of consonants from being expressed, which makes no | sense either), not to mention that most of the scripts of the | Indian subcontinent and south-east Asia follow the same | principle and have similar issues (e.g. Bengali with 250M | speakers, Telugu, Javanese, Punjabi, Kannada, Gujarati, Thai | with over 50M speakers each, etc). | | (See chapters 12-17 of the Unicode standard, currently version | 15: https://www.unicode.org/versions/Unicode15.0.0/ch12.pdf) | gnulinux wrote: | Have you ever written software before Unicode? We had N | different encodings for each language, each culture, each | country. There were all kinds of bugs cropping up, and software | that worked perfectly well could be buggy for one random | language. Unicode abstracted all of this away from the | programmer in a pretty simple fashion. I simply do not see how | we're paying the "complexity tax" by using Unicode: unless | you're writing a _library_ that handles Unicode (which you | shouldn't do; you should use existing libraries), you don't | need to know anything about Unicode. | mkipper wrote: | Before Unicode, everyone who came up with a character encoding | scheme probably thought their system was good enough for any | reasonable use-case.
But they all had limitations that made | them inadequate for things less obscure than representing some | dead Mongolian language. | | It would be nice if we could come up with some magical system | that optimally encodes all the text that "matters" and ignores | everything else, but history has shown that to be very hard. So | we're left with Unicode, which takes the approach of giving us | (effectively) infinite code points to represent characters, | with (effectively) infinite ways to visually represent them. | That does lead to a bunch of "unnecessary" baggage and | headaches, but it also solves a bunch of real problems that you | probably don't know exist. | | Unicode is a pain in the ass, but it's a solution to a very | hard problem. You can feel free to design your own solution, | but you'll probably run head-first into all the problems | Unicode was trying to solve 40 years ago. | lifthrasiir wrote: | Your notion of character doesn't necessarily match others', and | there are many cases where the number of possible "characters" | under some notion is unbounded. Unicode provides a very | well-defined superset of those notions _for you_. Collecting | characters is only a minor portion of their job. | BlueTemplar wrote: | I'm getting the impression that this is only "obvious" from a | Latin-Cyrillic-Greek alphabet point of view? | | P.S.: Also, even for those, it would seem that one of the big | reasons things like combining characters were added to | Unicode was to be backwards compatible even with mutually | incompatible encodings? ___________________________________________________________________ (page generated 2022-09-19 23:00 UTC)
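A minimal Python sketch of the code-point versus grapheme-cluster point argued in the thread above, using only the standard library. Everything in it is illustrative: the Devanagari string is an assumed spelling of svat's "bhuktva" example, the split into two clusters is done by hand following svat's description rather than by any segmentation algorithm, and the variable names are made up for this sketch.

```python
import unicodedata

# "é" can be stored as one code point (precomposed) or as two
# (base letter followed by a combining accent).
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT

# The two spellings compare unequal until both are normalized to the same form.
same_raw = composed == decomposed
same_nfc = unicodedata.normalize("NFC", decomposed) == composed

# Assumed spelling of the example word:
# bha, u-sign, ka, virama, ta, virama, va, aa-sign.
word = "\u092d\u0941\u0915\u094d\u0924\u094d\u0935\u093e"

# Hand-made split into the two clusters described in the thread ("bhu" + "ktva").
# This is NOT the output of a segmentation library.
clusters = [word[:2], word[2:]]
cluster_lengths = [len(c) for c in clusters]  # code points per cluster

# svat's estimate of how many precomposed code points Devanagari would need:
# 33 consonants in runs of 5, times 13 vowels.
precomposed_estimate = 33 ** 5 * 13

print(len(composed), len(decomposed), same_raw, same_nfc)
print(len(word), cluster_lengths)
print(f"{precomposed_estimate:,}")
```

Python's len() counts code points, so the two hand-split clusters come out at different lengths, which is the reason fixed-size "character" indexing does not work for this script. Segmenting arbitrary text into grapheme clusters needs an implementation of the Unicode text segmentation rules; the unicodedata module used here only covers normalization and character properties.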