hngopher.com

       [HN Gopher] Unicode character "ꙮ" (U+A66E) is being updated
       ___________________________________________________________________
        
       Unicode character "ꙮ" (U+A66E) is being updated
        
       Author : SerCe
       Score  : 251 points
       Date   : 2022-09-19 11:39 UTC (11 hours ago)
        
 (HTM) web link (twitter.com)
 (TXT) w3m dump (twitter.com)
        
       | iLoveOncall wrote:
       | I want to be that person that has so much time on their hands
       | they can afford to waste it on pointless things like this.
        
         | shadowgovt wrote:
         | There's a career path to get there. It involves becoming
         | someone who cares deeply about the ways and means of digitizing
         | data stored in analog media. Drill down deep enough, and you'll
         | find yourself in a fascinating world of coding and error.
         | 
         | There are things like the "ghost characters," which are
         | codepoints in Japanese that map to characters that were
         | basically transcription errors when the team was putting
         | together a full set of Kanji. Some characters with an extra
         | horizontal line snuck into the set; they were likely caused by
         | a transcription error because the character got split onto two
         | pieces of paper by lines of text being copy-pasted into a
         | records book, and the shadow cast by the thin extra layer of
         | paper was misinterpreted as another stroke.
         | 
         | https://www.dampfkraft.com/ghost-characters.html
        
       | dmz73 wrote:
       | And then people wonder why software developers don't care to
       | support Unicode properly. The first 60,000+ characters made
       | sense, then a few more were needed and Unicode suddenly got to
       | play with 1,000,000+ and just went off the rails.
        
         | lifthrasiir wrote:
         | You can support Unicode without ever having to display all
         | possible characters "correctly".
        
       | perihelions wrote:
       | Related thread, about non-existent CJK characters ending up in
       | Unicode through transcription mistakes ("ghost characters"):
       | 
       | https://news.ycombinator.com/item?id=32095502 ( _" A Spectre Is
       | Haunting Unicode"_, 180 comments)
       | 
       | edit to add: The top thread in the 2020 repost was about ꙮ,
       | 
       | https://news.ycombinator.com/item?id=24955536
        
       | sshine wrote:
       | (+D)+Shan +-+
        
         | SnooSux wrote:
         | Be not afraid
        
           | drewzero1 wrote:
           | Bee not afraid?
        
             | msla wrote:
             | Bee Nut Afraid.
             | 
             | (When an apiarist is terrified.)
        
         | thechao wrote:
         | +-+no(  _ no)
        
         | loudmax wrote:
         | >      ===
         | 
         | The James Webb Space Telescope.
        
         | throwaway98797 wrote:
         | cant unsee
        
           | Izkata wrote:
           | Don't worry, you'll forget about this one when it gets six
           | more eyes.
        
       | rafaelturk wrote:
       | Finally!
        
       | hulitu wrote:
       | > Unicode character "ꙮ" (U+A66E) is being updated
       | 
       | I fear this will lead to a lot of "bug fixes and performance
       | improvements" in Android. /s
        
       | vintermann wrote:
       | Biblically accurate O?
        
       | Waterluvian wrote:
       | I'm not sure how I feel about this. I'm not an expert by any
       | means.
       | 
       | But something just doesn't feel right when you've got unicode
       | with a character with one known use from forever ago.
       | 
       | Doesn't this open up the floodgates to just a ridiculous amount
       | of work or else biased gatekeeping?
       | 
       | How much work would it be to implement your own font of the
       | entire unicode set? Or is that not actually a thing and fonts
       | implement as-desired subsets?
        
         | lifthrasiir wrote:
         | > How much work would it be to implement your own font of the
         | entire unicode set? Or is that not actually a thing and fonts
         | implement as-desired subsets?
         | 
         | You can't, and you are not expected to do so. You are limited
         | by the OpenType limit (65,535 glyphs), various shaping rules
         | that possibly increase the number of required glyphs, and lack
         | of local or historical typographic convention. Your best bet is
         | either to recruit a large number of experts (e.g. Google Noto
         | fonts) or to significantly sacrifice quality (e.g. GNU
         | Unifont).
        
           | poizan42 wrote:
           | A single OpenType font file is limited to 65,535 glyphs.
           | Nothing stops your font from being implemented as a series of
           | .otf files (besides what people think of as a "font" when it
           | comes to usage on computers).
           | 
           | But yes, time constraints are the limiting factor. I don't
           | think anyone is going to dedicate their entire life to making
           | a single font.
        
             | lifthrasiir wrote:
             | While you are right that one logical font can consist of
             | multiple font files (or possibly an OpenType collection),
             | this constraint does affect most typical fonts.
             | Wide-coverage CJK fonts already hit this limit. Fonts
             | supporting only one of Chinese, Japanese and Korean don't
             | need that many glyphs, and probably even two of them will
             | be okay, but fonts with all three sets of glyphs won't. It
             | is therefore common to provide three versions of fonts, all
             | differently named.
        
           | Waterluvian wrote:
           | I wasn't aware of the 2^16 limitation. Thank you for the
           | notes!
        
         | aasasd wrote:
         | I'll tell you more: there are Unicode glyphs without known
         | usage.
        
         | gumby wrote:
         | There are quite a few such characters in Unicode because
         | academic articles about things like cuneiform need to be
         | digitized too. And because the historical record is so sparse,
         | we often have vanishingly few, or only one example of a
         | character, and perhaps no way to know if it was a misprint or a
         | real character.
         | 
         | Actually this character seems like a scribe's joke, no
         | different from the illustrated characters at the beginning of
         | medieval paragraphs (all of which are represented in Unicode as
         | A, B or whatever). But the point still holds.
         | 
         | It even holds for modern languages -- consider the ghost
         | characters needed for round trip compatibility: https://weekly-
         | geekly.imtqy.com/articles/418717/index.html
         | 
         | (actually cuneiform is a poor example; perhaps Linear A would
         | have been a better example)
        
       | diimdeep wrote:
       | Being stuck on macOS Catalina with Unicode 12, I think there is a
       | way to upgrade to newer versions and get new emoji support [1][2]
       | 
       | [1] https://apple.stackexchange.com/questions/278937/is-
       | there-a-... [2] https://forums.macrumors.com/threads/updating-
       | maverickss-emo...
        
       | quickthrower2 wrote:
       | Crazy that it renders in HN comments (which reject a lot of
       | Unicode)
        
       | etamponi wrote:
       | By the same reasoning, the 7-eyed O has now been used more than
       | once, so it deserves a glyph! So the right way to do this is to
       | introduce a new character for the correct glyph, and also leave
       | the current one (perhaps changing the title). Otherwise these
       | tweets won't make sense when read by someone that updated to
       | Unicode 15.0
        
         | echelon wrote:
         | _This_ thread on HN won't make sense in the future if the
         | Unicode body replaces ꙮ
         | 
         | Make a new character!
        
         | koboll wrote:
         | Honestly it probably deserves the Pluto treatment:
         | decertification as a character. One historical use in the 1400s
         | doesn't merit a character and never did.
        
           | Pinus wrote:
           | Isn't there an entire Unicode block for the symbols on the
           | Phaistos disc? Yes:
           | https://en.wikipedia.org/wiki/Phaistos_Disc_(Unicode_block) .
           | I suppose those occur in quite a few documents _about_ the
           | disc, even though the disc itself is the only known document
           | written _in_ those symbols.
        
           | colejohnson66 wrote:
           | Unicode's mission is to make _every_ document
           | "roundtrip-able". Even if a character is only used once, it
           | should be possible to save a plaintext version of the
           | containing document without losing any information. Roughly,
           | I should be able to put a transcription of that one
           | translation from the 1400s on Wikisource without using
           | images.
           | 
           | You may disagree with me, and that's fine, but it doesn't
           | change Unicode's mission. Besides, there's room for 1,112,064
           | codepoints[a], and only 149,186 are in use. It's predicted
           | we'll never use it up, so what harm is there in one codepoint
           | no one will ever need?
           | 
           | [a]: U+10'FFFF max; it used to be U+7FFF'FFFF, but UTF-16 and
           | surrogates ruined that
        
             | djur wrote:
             | Unicode doesn't have a character for every illuminated
             | initial, nor should it. I'm not clear on why this character
             | should be considered any differently.
        
               | skyyler wrote:
               | http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3194.pdf
               | 
               | It was introduced with other "ocular O"s which are
               | seemingly more commonly used than this one.
               | 
               | It's not quite an illuminated initial.
        
               | akavel wrote:
               | Wow, this is probably the most actually useful and
               | interesting comment in this whole discussion, thanks! For
               | anyone interested, the most relevant quotes from the
               | document are in particular:
               | 
               |  _" This document requests the addition of a number of
               | Cyrillic characters to be added to the UCS. It also
               | requests clarification in the Unicode Standard of four
               | existing characters. This is a large proposal. While all
               | of the characters are either Cyrillic characters (plus a
               | couple which are used with the Cyrillic script), they are
               | used by different communities. Some are used for non-
               | Slavic minority languages and others are used for early
               | Slavic philology and linguistics, while others are used
               | in more recent ecclesiastical contexts. We considered the
               | possibility of dividing the proposal into several
               | proposals, but since this proposal involves changes to
               | glyphs in the main Cyrillic block, adds a character to
               | the main Cyrillic block, adds 16 characters to the
               | Cyrillic Supplement block, adds 10 characters to the new
               | Cyrillic Extended-A block currently under ballot, creates
               | two entirely new Cyrillic blocks with 55 and 26
               | characters respectively, as well as adding two characters
               | to the Supplementary Punctuation block, it seemed best
               | for reviewers to keep everything together in one
               | document._
               | 
               |  _(...)_
               | 
               |  _MONOCULAR O , BINOCULAR O , DOUBLE MONOCULAR O , and
               | MULTIOCULAR O  are used in words which are based on the
               | root for 'eye'. The first is used when the wordform is
               | singular, as k; the second and third are used in the root
               | for 'eye' when the wordform is dual, as chi, chi; and the
               | last in the epithet 'many-eyed' as in serafimi
               | mnogochityii 'many-eyed seraphim'. It has no upper-case
               | form. See Figures 34, 41, 42, 55. "_
        
               | j-bos wrote:
               | Because it's already been added to unicode. Now it's not
               | a question of whether or not to add, rather to remove,
               | and unicode almost by definition does not remove.
        
               | thayne wrote:
               | Unicode does have deprecated code points though. Not that
               | I necessarily think making this character deprecated
               | makes sense.
        
             | lmm wrote:
             | Meanwhile one still can't roundtrip regular Japanese
             | without some kind of funky out-of-band signalling. By
             | itself this kind of thing is harmless, but it speaks to
             | poor prioritization from Unicode.
        
             | bityard wrote:
             | Today, I wrote a document by hand containing a new symbol
             | that only looks like genitalia if you squint really hard.
             | Where do I apply to have it included in unicode so that it
             | can be digitized properly?
        
               | lucumo wrote:
               | Rule-lawyering wise-asses try to mess with many policies.
               | It's rarely a sensible indictment of a policy, nor is it
               | very effective. Anyone dealing with such people just
               | ignores them.
        
               | fluoridation wrote:
               | What's the criterion that includes the document in the
               | tweet, but excludes the document referenced by the GP?
        
               | bzxcvbn wrote:
               | https://www.unicode.org/pending/proposals.html
               | 
               | https://www.unicode.org/emoji/proposals.html#selection_fa
               | cto...
        
               | fluoridation wrote:
               | I don't see anything on the inclusion of symbols that
               | are not icons, such as U+A66E, or the symbol proposed by
               | bityard.
        
               | koala_man wrote:
               | Can you reuse  or ?
        
               | 0xbadcafebee wrote:
               | And for years we've just been using eggplants!
        
               | 411111111111111 wrote:
               | ( no >= [?] <= ) no mi + - +
               | 
               | ~  ( Jjut Jo Jjut ) ~
               | 
               | (  // y  .   // y )
        
               | [deleted]
        
             | layer8 wrote:
             | > Unicode's mission is to make every document
             | "roundtrip-able".
             | 
             | Only for characters from existing coded character sets.
        
             | modzu wrote:
             | why isnt the artist formerly known as prince in unicode?
        
             | koboll wrote:
             | Okay, let's take a look at the context where the
             | multiocular o was used:
             | https://en.wikipedia.org/wiki/Multiocular_O
             | 
             | I see that near it, there is an ef (F) with a very tall
             | stem.
             | 
             | Why should that not be included as a standard unicode
             | character? Surely it is used more often than the
             | multiocular o.
             | 
             | You may say "it's a decorative flourish", which is of
             | course true, but so is the multiocular o. Should we allow
             | every conceivable decorative flourish into unicode? What is
             | the standard for where flourishes become distinct
             | characters?
        
             | shp0ngle wrote:
             | The thing is, this is just a decorative way to write "o".
             | It's not a specific letter by any definition.
             | 
             | I can't speak of other letters that were added in the same
             | batch in 2007. Some of them seem meaningful, I dunno, I
             | don't speak Old Church Slavonic (although I am told it
             | sounds like Croatian, which I understand a little)
             | 
             | http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3194.pdf
        
             | kadoban wrote:
             | For as inclusive as that mission is, it seems weird to me
             | how limited in certain areas unicode is. For instance,
             | people use peach emoji since there isn't one for butt,
             | eggplant since there's no penis, etc.
             | 
             | This doesn't contradict the stated goal exactly, but it
             | seems against the spirit of it at least.
        
               | yakireev wrote:
               | One could argue that emoji should have never been added
               | to Unicode in the first place. Peaches and butts are
               | images, pictures, illustrations, whatever - but they are
               | not characters. There's no writing system which has a
               | colored drawing of a peach as a character.
        
               | eternityforest wrote:
               | But that doesn't change the fact that most people use
               | them and like them, and there is not much technical
               | disruption. They just chose practicality over purity.
        
               | yakireev wrote:
               | Most people (me included) like funny cat videos and send
               | funny cat videos. Shall we include some in Unicode?
               | 
               | I mean, this ship has long sailed, but that was a mistake
               | nevertheless. Not everything has to be a unicode
               | character.
        
               | PeterisP wrote:
               | Yes there is - a widely used character set (when Unicode
               | talks about "writing systems" it explicitly includes all
               | the computer character sets used in practice pre-unicode)
               | used by Japanese 'featurephones' had emoji characters, so
               | in order to be able to include that character set in
               | unicode, unicode had to add emoji.
        
               | [deleted]
        
               | bzxcvbn wrote:
               | Yes there is. We're using it right now. Even linguists
               | are studying the use of emoji today.
        
               | vcxy wrote:
        
               | vcxy wrote:
               | I tried to reply with just a unicode penis but that got
               | flagged immediately, so I'll be more substantial and
               | leave out the actual penis. It appears in Egyptian
               | hieroglyphs, so actually there is a penis included in
               | unicode.
        
             | bhk wrote:
             | Cataloging every doodle ever drawn inline with text by
             | anyone at any time in history would exhaust any finite set
             | of code points.
        
             | tzs wrote:
             | If that was once its mission, it was clearly abandoned long
             | ago. They rejected Klingon characters on the grounds that
             | it has low usage for communication, and that many of the
             | people who do communicate in Klingon use a latinized form.
             | 
             | ꙮ seems to just be a fancy way of writing O. I haven't seen
             | anything that says it has a different meaning. The
             | arguments for excluding Klingon seem to apply even more so
             | to ꙮ.
        
               | mananaysiempre wrote:
               | If you look through the old mailing list postings, the
               | oft-left-implicit problem with Klingon (as well as
               | Tengwar, Emerson's pet project) is that it may get people
               | into legal trouble (even though in a reasonable world it
               | shouldn't be able to). So in the unofficial CSUR / UCSUR
               | they remain.
               | 
               | A weird solitary character from the 1400s isn't subject
               | to that, and even if it's a mistake it's probably not
               | worth breaking compatibility at this point (I think the
               | last such break with code points genuinely changing
               | meanings was to repair a mistaken CJK unification some
               | time in the 00s, and the Consortium may even have tied
               | its own hands in that regard with the ever-more-strict
               | stability policies).
               | 
               | Similarly, for example, old ISO keyboard symbols (the ⌫
               | for erase backwards, but also a ton of virtually unused
               | ones) were thrown in indiscriminately at the beginning of
               | the project when attempting to cover every existing
               | encoding, but when the ISO decided to extend the
               | repertoire they were told to kindly provide examples of
               | running-text (not iconic) usage in a non-member-body-
               | controlled publication. (Crickets. The ISO keyboard input
               | model itself only vaguely corresponds to how input
               | methods for QWERTY-adjacent keyboards work in existing
               | systems--as an attempt at rationalization, it seems to
               | mostly be a failed one.)
        
               | bobsmooth wrote:
               | Unless it's legitimately someone's native tongue,
               | conlangs shouldn't be in unicode. If there are kids out
               | there that are native Klingon speakers, then you can make
               | the argument it should be included.
        
           | reaperducer wrote:
           | _One historical use in the 1400s doesn't merit a character
           | and never did_
           | 
           | One _known and surviving_ use. It is possible that it exists
           | in other places, since the vast majority of the planet's
           | written work has not been digitized. It may also have been
           | used in other places that have not survived.
           | 
           | Just because it's not important to you does not mean it is
           | not important.
           | 
           | The fact that it survived for 600 years makes it interesting
           | and worth saving. It is infinitely unlikely that anything you
           | do, write, or say will last that long.
        
             | bhaney wrote:
             | > It is infinitely unlikely that anything you do, write, or
             | say will last that long
             | 
             | Ouch
        
             | koboll wrote:
             | Sure it's possible, but there should be a higher bar than
             | "it's possible it's used more than once" for meriting
             | inclusion in the standard keyboard of billions of devices
             | worldwide.
        
             | tsimionescu wrote:
             | The thing is, looking at the page, there are many other
             | characters that were not added - the large red S-looking
             | characters, for example. But for some "bizarre" reason,
             | those were not included in Unicode...
             | 
             | Of course, the simple answer is that Unicode actually
             | includes any character that someone cares enough to ask to
             | be added, with rare exceptions.
        
           | runarberg wrote:
           | idk. when the word Planet was redefined such that Pluto was
           | no longer a planet, it kind of ruined the word Planet. It
           | suddenly wasn't nearly as useful a word as it used to be
           | (even though now it has a precise meaning). For most people
           | that use the word, it won't matter (and is actually rather
           | exciting) that they keep discovering new planets in our solar
           | system.
           | 
           | If they treated the word "character" the same way, it would
           | only serve to confuse and do no favors to the remaining
           | glyphs.
        
             | BlueTemplar wrote:
             | This is temporary though, soon people will look at you
             | funny if you say that Pluto is a planet - and/or they might
             | not even have heard of it (though of course that is still
             | worth learning about in a History of Science context).
             | 
             | We do NOT keep discovering new planets, rather minor
             | planets (I agree that the term is confusing), more than a
             | million of them discovered in the Solar System now, like
             | the 9007 James Bond.
        
               | runarberg wrote:
               | It could go either way, it is not always that the
               | scientific meaning wins out, especially not when even
               | scientists don't find the new definition useful.
               | 
               | When I think of a planet, I think of a world that has
               | active geology that isn't a moon (I know excluding moons
               | is arbitrary, and perhaps I shouldn't do that; but hey,
               | that's language for you). I honestly don't care about the
               | orbit, and I bet that when most people think about
               | planets they aren't thinking about the orbit either, let
               | alone whether the planet has cleared the orbit or not. I
               | doubt that will change.
        
               | gerikson wrote:
               | > When I think of a planet, I think of a world that has
               | active geology
               | 
               | Wouldn't that definition rule out gas giants?
        
               | runarberg wrote:
               | Yeah, probably strictly... But I'm not a planetary
               | scientist. I'm merely a user of language, and I don't
               | need to be rigorous in my definitions. And to me the
               | weather patterns on Jupiter are an interesting enough
               | feature to count as geology (even though it is probably
               | not strictly geology).
        
               | jameshart wrote:
               | Not just that, but whether or not Mars is still
               | geologically active is still an open question. If you
               | admit planets on the basis that they have a history of
               | geological activity, then Ceres is a planet too.
               | 
               | I don't think anybody considers geological activity as
               | particularly useful for classifying things as 'planet' or
               | 'not planet'.
        
               | runarberg wrote:
               | Why shouldn't Ceres be a planet? If Pluto gets to be a
               | planet then Ceres is definitely a planet.
               | 
               | But there is still active geology on Mars. There are
               | still moisture, winds and glaciers that are shaping the
               | environment. I consider that to be geologically active.
        
             | [deleted]
        
           | PeterisP wrote:
           | At the moment this character is used in many documents and
           | databases - including comments in this thread, the article
           | mentioned there, etc.
           | 
           | There could have been a good case not to include it back in
           | 2007, but once it has been included, excluding it would break
           | stuff.
        
             | BlueTemplar wrote:
             | And updating it rather than adding a new, correct one,
             | might make the current uses confusing?
             | 
             | Speaking of which, do we have any similar hexagonal symbol?
        
         | jotato wrote:
         | My thought as well
        
         | baybal2 wrote:
         | Unicode's basic rule is that character definitions never ever
         | change, even when enumerated erroneously.
        
           | Arnt wrote:
           | Yes, but this is a change either way, because that
           | codepoint's definition referred to that character. Either the
           | reference or the description of the appearance has to change.
        
             | echelon wrote:
             | Make a new character. Updating the existing character ruins
             | the meaning of all previous usages.
             | 
             | It's like trying to change an API. Don't disrespect your
             | existing users. Make a new version.
             | 
             | ( [?]?)
             | 
             | Think of all the ASCII art this botches. That has to have
             | some historical importance to the Unicode standards body.
             | 
             | ([?]_)
             | 
             | For scholarly digital (unprinted) documents where the
             | correct character rendering matters, erroneous past usages
             | can be trivially found with grep, a date search, and easily
             | corrected. The domain experts will familiarize themselves
             | with this issue and fix the problem. Don't take a shotgun
             | to it!
             | 
             | This message wꙮn't have the ꙮriginally intended meaning if
             | the characters are updated from underneath.
        
         | nerfhammer wrote:
         | why not make an additional eye a diacritic mark so you can just
         | add an arbitrary number of eyes
        
         | martin_a wrote:
         | Uff.
         | 
         | I'm not sure we have space for another glyph in Unicode. Looks
         | pretty packed in here...
        
           | BlueTemplar wrote:
           | UTF-8 is still more than 80% empty, and can be potentially
           | extended...
        
             | colejohnson66 wrote:
             | _Theoretically_, UTF-8 can encode up to 31 bits
             | (U+7FFF'FFFF)[0], but for compatibility with UTF-16's
             | surrogates, it's officially capped to 21 bits with the max
             | being U+10'FFFF[1]. That decision was made November 2003,
             | so there's two decades of software written with hard caps
             | of U+10'FFFF.
             | 
             | [0]: https://www.rfc-editor.org/rfc/rfc2279
             | 
             | [1]: https://www.rfc-editor.org/rfc/rfc3629#section-3
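
To illustrate the 31-bit versus 21-bit point above, here is a hedged Python sketch. `utf8_length_rfc2279` and `allowed_today` are hypothetical helpers written for this note: the first follows the byte-length table of the original RFC 2279 scheme, the second applies only the U+10FFFF cap from RFC 3629 and ignores the surrogate gap.

```python
# Largest value that fits in 1, 2, ... 6 bytes under the original
# RFC 2279 UTF-8 scheme (6 bytes carry 31 payload bits).
RFC2279_LIMITS = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
MAX_CODE_POINT = 0x10FFFF  # cap introduced by RFC 3629


def utf8_length_rfc2279(cp: int) -> int:
    """Bytes the pre-2003 scheme would need for the value cp."""
    if cp < 0:
        raise ValueError("code point must be non-negative")
    for n_bytes, limit in enumerate(RFC2279_LIMITS, start=1):
        if cp <= limit:
            return n_bytes
    raise ValueError("value does not fit in 31 bits")


def allowed_today(cp: int) -> bool:
    """True if cp is within the capped range (surrogates not checked)."""
    return 0 <= cp <= MAX_CODE_POINT


for cp in (0xA66E, 0x10FFFF, 0x110000, 0x7FFFFFFF):
    print(f"U+{cp:X}: {utf8_length_rfc2279(cp)} bytes, "
          f"allowed today: {allowed_today(cp)}")
```

Python itself enforces the cap: `chr(0x110000)` raises `ValueError`, so the 5- and 6-byte forms can only be discussed here, not produced by the built-in codec.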
        
       | RcouF1uZ4gsC wrote:
       | I think the big issue with Unicode is that it is centralized and
       | there are politics about what characters get included (see
       | Klingon)
       | 
       | I think I have a solution to decentralize Unicode:
       | 
       | 1. Extend Unicode to 128-bits. We can still use UTF-8 variable
       | length encoding which will limit the real size.
       | 
       | 2. Use a blockchain to coordinate the characters. That way
       | whoever wants to add a character can do it without gatekeeping.
       | 
       | These simple suggestions will go a long way in making Unicode
       | less centralized.
        
       | dhosek wrote:
       | This is not exactly a correct description. Unicode does _not_
       | specify the appearance of characters, only their meaning. It
       | seems what's changed is the reference presentation of the
       | character in the Unicode tables, not the character itself.
       | Unicode goes to great lengths to preserve backwards compatibility
       | so changing the meaning of a code point would violate that
       | principle. Your OS or application providing Unicode 15.0.0
       | support will not change the appearance of U+A66E. The appearance
       | is dependent on the font.
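       | 
       | A rough way to see this from code, assuming Python's bundled
       | `unicodedata` module: the character database exposes a name and
       | a general category for U+A66E, and nothing about how the glyph is
       | drawn. `describe` is a hypothetical helper written for this
       | sketch.

```python
import unicodedata

MULTIOCULAR_O = "\uA66E"


def describe(ch: str) -> str:
    """Return the code point, formal name and general category of `ch`."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} [{unicodedata.category(ch)}]"


if __name__ == "__main__":
    print(describe(MULTIOCULAR_O))
    # Version of the Unicode database this Python build ships with.
    print(unicodedata.unidata_version)
```

       | The number of eyes you actually see is decided by whichever font
       | ends up rendering that code point.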
        
       | idlewords wrote:
       | They should put in a few additional eyes as hot spares.
        
       | xanathar wrote:
       | So it's a Unicode character that represents a... blob with 10
       | eyes?
       | 
       |  _Hordes of Wizards of the Coast lawyers getting ready for the
       | big fight_
        
         | gedy wrote:
         | Name checks out:
         | https://forgottenrealms.fandom.com/wiki/Xanathar_(original)
        
         | supernewton wrote:
         | Nah, Beholders have 11 eyes, so we're good here.
        
           | tsimionescu wrote:
           | I feel like the spelling should be updated to Behlders, or
           | better yet, Behlders, to reflect that (of course, this would
           | only make sense once the glyph update actually hits).
        
       | ElfinTrousers wrote:
       | Am I alone in thinking that this is not so much a separate
       | character, as a doodle a bored monk made to relieve a tiny bit of
       | the tedium of copying manuscripts?
        
       | BearOso wrote:
       | And its new official name shall be the Trypophobigon.
        
       | xashor wrote:
       | Too bad I have to adjust my business cards for .world
        
       | Traubenfuchs wrote:
        
         | remedan wrote:
         | We do have a Unicode character for a gun: U+1F52B PISTOL. Most
         | fonts that have it choose to style it as a water gun, though.
        
         | dafoex wrote:
         | There's an emoji for handgun, but Apple and other big tech
         | decided it needed to be a water gun. There is also a rifle
         | character intended to represent the sport of shooting in a
         | pentathlon, but again Apple threw its weight around and, while
         | the character became codified in Unicode, it never became an
         | emoji and no font from big tech supports it.
        
         | jrockway wrote:
         | I guess because the goal of Unicode is to be able to represent
         | every character that's appeared in language. This one is in a
         | published book, while guns and a sexual intercourse symbol
         | aren't.
         | 
         | Emoji was a weird value add that Japanese mobile providers
         | added to their phones before Unicode. To get them to move to
         | Unicode, they had to keep them. That's why there's a Tokyo
         | Tower emoji, but not an Eiffel Tower. That's why the post
         | office has a @ on it. That people get any use out of emoji
         | outside of Japan is really pure luck.
        
           | ElfinTrousers wrote:
           | That seems actually logical when you consider that kanji
           | presumably began as simple depictions of objects that could
           | be drawn quickly. Perhaps the only difference between emoji
           | and kanji is time.
        
           | shadowgovt wrote:
           | I've even heard emoji referred to as "the carrot that keeps
           | the implementations current." Every time a new version of
           | Unicode is published, a few more emoji are tacked on. It acts
           | as incentive for all the cellphone carriers and such to put
           | the money into updating their implementations, because nobody
           | wants to be the one on the block with the one phone that
           | can't render "Mirror Ball" .
           | 
           | (ETA: LOL, Hacker News drops "Mirror Ball"
           | https://emojipedia.org/mirror-ball/ from the comment when you
           | post)
        
             | Traubenfuchs wrote:
             | I believe the majority of emoji do not work on hacker news.
        
             | jrockway wrote:
             | Incidentally, Windows doesn't have the mirror ball. I guess
             | it is a carrot to get me to upgrade to Windows 11, which I
             | am skipping. (The key with Windows is to only use the good
             | versions; XP, 7, 10, ???. Hoping ??? arrives soon ;)
        
               | int_19h wrote:
               | It's not in Win11 yet.
        
         | Dwedit wrote:
         | There are hieroglyph dicks in Unicode, see U+130B8.
        
           | Traubenfuchs wrote:
           | I even posted phallus with emission in my comment above.
           | 
           | I can see it on latest iOS, but not on Windows 10 + Chrome.
        
       | rizoma_dev wrote:
       | I'm always happy to see some esoteric unicode updates
        
       | diimdeep wrote:
       | Here[1][2] is the scan of manuscript from 1429, image #251
       | 
       | [1] https://lib-fond.ru/lib-rgb/304-i/f-304i-308/#image-251 [2]
       | https://web.archive.org/web/20110927102700/https://www.stsl....
        
         | aasasd wrote:
         | So the text at that point literally talks about 'many-eyed
         | seraphim'. The eyes symbol is a pure gag--seems to be spliced
         | in place of the letter 'o' in the word 'eye' just a little down
         | the line. (However, Old Slavonic is a tough read due to no
         | spaces, so I'm not sure about that word. But at least it's not
         | the Glagolitic script, which was just ridiculous and actually
         | had multi-circle letters.)
        
         | klyrs wrote:
         | It's curious that the red ink blobs behind the "eyes" aren't
         | included in the unicode glyph either...
        
       | msoad wrote:
       | This is similar to "man in business suit levitating" emoji.
       | 
       | How does this stuff make it into Unicode?!
        
         | shp0ngle wrote:
         | Levitating man is just a Unicode encoding of a glyph from the
         | old Webdings (or Wingdings?) font.
         | 
         | There was an accepted proposal to add many Wingdings and
         | Webdings symbols as Unicode code points. Thus, levitating man
         | in a suit.
        
         | octoberfranklin wrote:
         | I miss the good old days when character sets didn't feel the
         | need for _annual updates_.
        
       | baltimore wrote:
       | Is there any end to this? E.g., why not include Galileo's
       | pictograms of Saturn as seen here:
       | https://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=...
        
       | no-reply wrote:
       | I'll check back in the future.
        
       | xenonite wrote:
       | Sadly, ꙮ is not eligible for engraving by Apple on AirPods.
        
         | colejohnson66 wrote:
         | As of right now, it's available for "adoption":
         | https://www.unicode.org/consortium/adopt-a-character.html
        
       | adhesive_wombat wrote:
       | Meanwhile I check back every now and again on MUFI (Medieval
       | Unicode Font Initiative) [1] and it's still not in.
       | 
       | [1]: https://mufi.info
        
       | politelemon wrote:
       | Here's the original tweet where the discrepancy was noticed in
       | 2020, and a photograph of a page inside the book where it's used:
       | 
       | https://twitter.com/etiennefd/status/1322673792452354048
        
       | pippy wrote:
       | Unicode can be ridiculous at times. It contains a character
       | used once in a single manuscript in an extinct language, but not a
       | standardized glyph for an external URL link.
        
       | lordnacho wrote:
       | Wait a minute, how will we refer to the old glyph in the future?
       | Once this is updated, articles such as this one will have the
       | new shape.
        
         | martin_a wrote:
         | "The character formerly known as U+A66E"
        
         | lifthrasiir wrote:
         | There was a joke that U+A66E should retain seven eyes and
         | further eyes should be added with a ZWJ sequence [1]. If that
         | character had somehow become _very_ popular in modern texts,
         | updating its glyph might have caused an interoperability problem,
         | so such a solution would have been needed. But that didn't
         | happen, so the glyph itself has been updated instead.
         | 
         | [1] https://twitter.com/BabelStone/status/1323440365429542919
        
       | memorable wrote:
       | Alternative frontend version:
       | https://nitter.net/jonty/status/1571615998335123457
        
       | jerf wrote:
       | When my kids were young, I accidentally flubbed the pronunciation
       | of "Santa Claus" once and said something that sounded a lot like
       | "Centiclops", which I decided to roll with. Centiclops is a lot
       | like a cyclops with one eye, except that, as a reading of the
       | roots clearly indicates, this is a creature with 100 eyes.
       | 
       | Today I learn that Centiclops effectively has a Unicode
       | character. As Centiclops' representative in the world of the non-
       | imaginary, we accept that a Unicode character with a hundred eyes
       | is not practical and we accept the representation with just a few
       | eyes, but generally agree that upgrading from 7 to 10 is a nice
       | improvement, as 7 does not evenly divide into 100 but 10 does.
       | This is important, because... reasons.
        
         | sshine wrote:
         | From "The House of Asterion" by Jorge Luis Borges:
         | 
         | "It is true that I never leave my house, but it is also true
         | that its doors (whose numbers are infinite) (footnote: The
         | original says fourteen, but there is ample reason to infer
         | that, as used by Asterion, this numeral stands for infinite.)
         | are open day and night to men and to animals as well."
         | 
         | https://klasrum.weebly.com/uploads/9/0/9/1/9091667/the_house...
        
         | thaumasiotes wrote:
         | > Centiclops is a lot like a cyclops with one eye, except
         | th[at] as a reading of the roots clearly indicates, this is a
         | creature with 100 eyes.
         | 
         | Not in any normal sense of "roots". _Cent_ is a Latin root
         | meaning 100. _ops_ is a Greek form meaning eye. The -i-
         | indicates that the word is being formed in Latin, and the -cl-
         | is entirely spurious. The original Greek word divides as cycl-
         | ops, not cy-clops.
        
           | inopinatus wrote:
           | In any case, there is already an ancient, general, and
           | perfectly serviceable epithet _Panoptes_.
        
           | martyvis wrote:
           | A bit like the heli-copter | helico-pter thing.
        
         | layer8 wrote:
         | There should be a combining "eye" character so that you can
         | have as many or few eyes as you like.
         | 
         | Though to be honest, that Unicode character looks more like a
         | bunch of cells forming a tissue to me than eyes.
        
         | doodpants wrote:
         | Or perhaps this character is an accurate representation of a
         | Dekaclops.
        
           | jerf wrote:
           | My client finds your proposal offensive and an appropriation
           | of his culture, and also that Dekaclops guy is mean and
           | smells bad and hasn't returned the lawnmower my client lent
           | him even though my client has clearly referred to the need to
           | mow his lawn several times now so he totally doesn't deserve
           | a Unicode character.
        
           | tzot wrote:
           | It'd be dekaops, because the -cl- is part of "cy_cl_e"+"ops"
           | (one round eye, with the one dropped because it's inferred).
           | So "cycle" out, "deka" in.
        
         | tempodox wrote:
         | "Santa Clause" would translate to "holy clause". There might be
         | such a thing but I think you meant Santa Claus :)
        
           | dtparr wrote:
           | Maybe just a big fan of the Tim Allen movie?
        
           | jerf wrote:
           | My fingers love adding the e's on the end of any worde that
           | can conceivably take them. Also have that problem with any
           | word that can take an "ly" even if I don't meanly it.
           | 
           | Fixed, thanks.
        
           | JohnFen wrote:
           | I thought "santa" meant "saint"?
        
             | 0xbadcafebee wrote:
             | It does; the character originates from Saint Nicholas (or
             | Odin, depending who you ask)
        
             | felix318 wrote:
             | "Santa" means "female saint" in Italian and Spanish.
             | Perhaps the English "santa" came from another language but
             | I always found the name "Santa Claus" just horrible.
        
               | Archelaos wrote:
               | The first mention of this version of Saint Nicholas's
               | name has the form "St. A Claus" and appeared in the New-
               | York Gazette of 20 Dec 1773.[1] The same issue also first
               | reported some incident regarding tea in Boston harbour.
               | Nice coincidence.
               | 
               | [1] Source: https://boston1775.blogspot.com/2016/12/st-
               | claus-was-celebra...
        
               | nohuck13 wrote:
               | The name Santa Claus evolved from Nick's Dutch nickname,
               | Sinter Klaas, a shortened form of Sint Nikolaas (Dutch
               | for Saint Nicholas)
               | 
               | https://www.history.com/.amp/topics/christmas/santa-
               | claus#si...
        
             | thaumasiotes wrote:
             | > I thought "santa" meant "saint"?
             | 
             | Well, _santa_ is a Spanish word meaning "holy" and _saint_
             | is a cognate French word meaning the same thing. They
             | descend from Latin _sanctus_; compare _sanctify_.
             | 
             | When the prayer goes "holy Mary, mother of god", "holy
             | Mary" is an exact equivalent of "santa Maria".
        
               | robocat wrote:
               | Might as well mention "Sancta Maria" in Latin, for
               | example from the Christian Hail Mary[1], a recorded Latin
               | version[2], written Latin next to English and Spanish[3]
               | and of course translated into _thousands_ of languages[4]
               | although unfortunately mostly written using _/ A-Z/i_; I
               | am an atheist interested in languages.
               | 
               | [1] https://en.m.wikipedia.org/wiki/Hail_Mary
               | 
               | [2] https://glaemscrafu.jrrvf.com/english/avemaria.html
               | 
               | [3] https://hymnary.org/text/hail_mary_full_of_grace_the_
               | lord_is...
               | 
               | [4] http://www.marysrosaries.com/Rosary_prayers_in_differ
               | ent_lan...
        
             | fortran77 wrote:
             | I thought it was a misspelling of Satan, but maybe that's
             | because I'm Jewish.
        
             | wongarsu wrote:
             | Saint is more or less the same as holy, just used as a
             | title. It comes from Old French saint, seinte "holy, pious,
             | devout," from Latin sanctus "holy, consecrated"
        
       | kratom_sandwich wrote:
       | I love this character and I love the fact that it is being updated.
       | Just to get this right: at some point some person chose to doodle
       | the letter instead of writing it the correct way and now we have
       | a corresponding Unicode character? Sort of amazing and it also
       | makes you think ...
        
         | lmkg wrote:
         | There was a... "tradition" is a strong word, perhaps "trend" is
         | better. Among authors making copies of the Bible or related
         | works in Cyrillic, the letter O (equivalent to Roman O) at the
         | beginning of the word for "eye" would be stylized to look like
         | an eye. There are a variety of glyphs along these lines: , , .
         | All of them, including ꙮ, were added to Unicode as a single
         | group.
         | 
         | The glyph "ꙮ" was used to refer to an Angel with a whole buncha
         | eyeballs, as one does. In terms of texts that survive today,
         | this specific glyph has exactly one use in a single manuscript
         | from the 1400's. It might have been used more, in texts which
         | don't survive. But it is part of a larger trend, and I bet that
         | its inclusion in Unicode depends strongly on that.
         | 
         | But yeah, in itself the ꙮ character exists solely so that modern
         | computers are capable of a more-faithful rendition of the
         | transcription of a single handwritten copy of the Book of
         | Psalms.
        
           | happytoexplain wrote:
           | Thank you for describing the missing context. I couldn't
           | understand why this stylized letter deserved a code point
           | more than the uncountable others. I don't necessarily agree
           | still, but the fact that this character was only unique
           | _within a larger trend_ makes it much more reasonable.
        
           | henriquecm8 wrote:
           | So you are saying that the glyph is now more biblically
           | accurate?
        
             | int_19h wrote:
             | The Bible doesn't specify how many eyes seraphim have.
             | 
             | "In the center, around the throne, were four living
             | creatures, and they were covered with eyes, in front and in
             | back. ... Each of the four living creatures had six wings
             | and was covered with eyes all around, even under its
             | wings."
        
           | vintermann wrote:
           | Hah, and here I thought I was making a joke when I called it
           | a biblically accurate O!
        
           | cyral wrote:
           | > modern computers are capable of a more-faithful rendition
           | of the transcription of a single handwritten copy of the Book
           | of Psalms.
           | 
           | I wonder if there is even a copy of the book transcribed to
           | actual characters or if it only exists as scanned PDF copies?
           | If anyone did transcribe it, would they have any knowledge
           | that the ꙮ character even exists on computers?
        
         | cillian64 wrote:
         | It does raise interesting questions about what counts as
         | decoration/formatting and what counts as part of the actual
         | text. You could view these ocular O characters as purely
         | decorative (like the fancy first character in a paragraph) but
         | they could also be seen as a quirk of spelling which should be
         | represented in unicode.
         | 
         | But the multiocular O really does seem like one monk got bored
         | one time and did some doodling.
        
         | Arnt wrote:
         | I attended a Unicode meeting (or maybe two? not sure?) and came
         | away with the impression that Unicode is like those open source
         | projects that are used by half of the world and maintained by a
         | handful of skilled and benevolent people.
         | 
         | In Unicode's case I think most of them are paid, at least.
        
           | shp0ngle wrote:
           | That is what I understood too. It doesn't seem particularly
           | hard to add new letters to Unicode too if you try a bit.
           | 
           | However that is a bit harder with emoji, which have their own
           | subcommittee that seems to be more bureaucratic and also
           | more popular than the rest of Unicode. Everyone wants to make
           | a new emoji.
        
       | Stamp01 wrote:
       | I don't understand why this character needs to exist given that,
       | at least according to the author, it has only been seen once in
       | the wild, and it's semantically identical to another more widely
       | used character.
       | 
       | I'm glad I'm not responsible for unicode. Clearly I have the
       | wrong mindset for it.
        
         | 1-6 wrote:
         | I agree with your mindset. It's time for a unicode replacement.
        
         | lifthrasiir wrote:
         | Surprisingly many characters in Unicode were recorded only a few
         | times, if not once, before their assignment. Chinese characters,
         | for example, have a lot of them, because it was relatively
         | common to make a new character for a newborn before modern
         | times, and some of them have survived in literature but
         | otherwise seen no use (e.g. U+21E2B appears only once in the
         | _Records of the Three Kingdoms_ (San Guo Zhi)). But they have
         | still received code points because they are considered
         | essential for digitization of historical works, and multiocular
         | O is no different.
        
         | bogwog wrote:
         | Imagine you're a historian from the future studying some old
         | document, and you spot a weird character that you've never seen
         | before. Wouldn't it be useful to be able to search for that
         | character to see if it shows up in any other document? A simple
         | OCR scan will bring up all the information you could ever need
         | for that one weird symbol.
        
         | PeterisP wrote:
         | Perhaps it's relevant to look at how it was introduced - as a
         | "package deal" with many, many characters from medieval
         | cyrillic literature, as described in this proposal
         | https://www.unicode.org/L2/L2007/07003r-n3194r-cyrillic.pdf
         | 
         | It certainly made sense to include this package in Unicode, and
         | the vast majority of those characters certainly should be in
         | this proposal. You do have to draw the line somewhere, and
         | obviously those close to the line will be debatable, no matter
         | where you choose to draw it, like this particular symbol - but
         | once you've decided that you will include the one-eyed O (small
         | and capital) and the two-eyed O (small and capital), then
         | putting in the many-eyed O as well to complete the set doesn't
         | seem so far-fetched.
        
         | shadowgovt wrote:
         | It's been seen once in the in-print wild.
         | 
         | There's no way to know how many since-written documents will
         | break if a whole codepoint is dropped.
        
       | wheybags wrote:
       | This kind of stupid thing is my problem with Unicode. We have all
       | this baggage for stuff that _nobody uses_, and we need to deal
       | with it forever. The worst for me is the way there is no possible
       | way to encode a grapheme cluster as a constant size, so using
       | Unicode makes it impossible to have simple character access like
       | an old style c string, no matter how big you make your char, even
       | though it's totally possible with damn near every language that
       | people actually use.
       | 
       | So then we all end up paying this massive complexity tax
       | everywhere to pay for support for some Mongolian script that died
       | out 200 years ago (or multi codepoint encodings of simple things
       | like é - just why, it was so avoidable).
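       | 
       | For anyone who hasn't hit this: a small Python sketch of the
       | multi-codepoint case, assuming the "simple thing" meant here is
       | an accented letter such as e with an acute accent (my
       | assumption). `same_text` is a hypothetical helper; the
       | normalization calls are from the standard `unicodedata` module.

```python
import unicodedata

composed = "\u00E9"     # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # two code points: "e" + COMBINING ACUTE ACCENT


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    # Both render as the same letter, but differ in length and raw equality.
    print(len(composed), len(decomposed))
    print(composed == decomposed)
    print(same_text(composed, decomposed))
```

       | So even for Latin text, indexing by code point does not give you
       | "characters" unless the input has been normalized first.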
        
         | JohnFen wrote:
         | I hear you. I loathe working with Unicode for this exact
         | reason. It's a bit of a nightmare due to its complexity.
         | 
         | That said, what it's trying to do is enormously complex.
        
         | svat wrote:
         | > _encode a grapheme cluster as a constant size [...] totally
         | possible with damn near every language that people actually
         | use_
         | 
         | This is not true. For a concrete example: the languages Hindi
         | and Marathi, with ~500 million speakers, use the Devanagari
         | script (also used by Nepali and Sanskrit), in which a grapheme
         | cluster is (usually) a sequence of consonants followed by a
         | vowel. For instance, something like "bhuktvā" (भुक्त्वा) would
         | be two grapheme clusters, one (भु) for "bhu" and one (क्त्वा)
         | for "ktvā". In Unicode each vowel and consonant (here, bh, u,
         | k, t, v, a) is separately encoded, which is the only reasonable
         | thing to do, and inevitably means that grapheme clusters can
         | have different lengths (number of code points). The alternative
         | would have been to encode every possible (sequence of
         | consonants + vowel) as a single codepoint, which gets
         | ridiculous quickly: these sequences can be up to 5 consonants
         | long, so you'd end up having to encode (33^5 * 13 ≈ 500M)
         | codepoints for Devanagari alone (or completely prevent certain
         | sequences of consonants from being expressed, which makes no
         | sense either), not to mention that most of the scripts of the
         | Indian subcontinent and south-east Asia follow the same
         | principle and have similar issues (e.g. Bengali with 250M
         | speakers, Telugu, Javanese, Punjabi, Kannada, Gujarati, Thai
         | with over 50M speakers each, etc).
         | 
         | (See chapters 12-17 of the Unicode standard, currently version
         | 15: https://www.unicode.org/versions/Unicode15.0.0/ch12.pdf)
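         | 
         | A rough Python sketch of the example above. The code points are
         | my reading of the word (an assumption), including the virama
         | (U+094D) that joins consonants into a conjunct, and the split
         | into two clusters is hard-coded because Python's standard
         | library has no grapheme segmenter.

```python
import unicodedata

# The example word in Devanagari, written with escapes:
# BHA, VOWEL SIGN U, KA, VIRAMA, TA, VIRAMA, VA, VOWEL SIGN AA
WORD = "\u092D\u0941\u0915\u094D\u0924\u094D\u0935\u093E"

# The two grapheme clusters described above ("bhu" and "ktva"), split by hand.
CLUSTERS = [WORD[:2], WORD[2:]]

# Estimate from the comment: up to 5 consonants out of 33, then 1 of 13 vowels.
COMBINATIONS = 33**5 * 13

if __name__ == "__main__":
    for cluster in CLUSTERS:
        names = [unicodedata.name(ch) for ch in cluster]
        print(len(cluster), names)
    print(COMBINATIONS)
```

         | The two clusters come out at different lengths in code points,
         | which is the point: no fixed-width unit can hold "one cluster".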
        
         | gnulinux wrote:
         | Have you ever written software before Unicode? We had N
         | different encodings for each language, each culture, each
         | country. There were all kinds of bugs creeping up, and software
         | that works perfectly well could be buggy for one random
         | language. Unicode abstracted all of this away from the
         | programmer in a pretty simple fashion. I simply do not see how
         | we're paying the "complexity tax" by using Unicode, unless
         | you're writing a _library_ that handles Unicode (which you
         | shouldn't do, you should use existing libraries) you don't
         | need to know anything about Unicode.
        
         | mkipper wrote:
         | Before Unicode, everyone who came up with a character encoding
         | scheme probably thought their system was good enough for any
         | reasonable use-case. But they all had limitations that made
         | them inadequate for things less obscure than representing some
         | dead Mongolian language.
         | 
         | It would be nice if we could come up with some magical system
         | that optimally encodes all the text that "matters" and ignores
         | everything else, but history has shown that to be very hard. So
         | we're left with Unicode, which takes the approach of giving us
         | (effectively) infinite code points to represent characters,
         | with (effectively) infinite ways to visually represent them.
         | That does lead to a bunch of "unnecessary" baggage and
         | headaches, but it also solves a bunch of real problems that you
         | probably don't know exist.
         | 
         | Unicode is a pain in the ass, but it's a solution to a very
         | hard problem. You can feel free to design your own solution,
         | but you'll probably run head-first into all the problems
         | Unicode was trying to solve from 40 years ago.
        
         | lifthrasiir wrote:
         | Your notion of character doesn't necessarily match others, and
         | there are many cases where the number of possible "characters"
         | in some notion is unbounded. Unicode provides a very well-
         | defined superset of those notions _for you_. Collecting
         | characters is only a minor portion of their jobs.
        
         | BlueTemplar wrote:
         | I'm getting the impression that this is only "obvious" from a
         | latin-cyrillic-greek alphabet point of view ?
         | 
         | P.S.: Also, even for those, it would seem that one of the big
         | reasons things like combining characters were added to Unicode
         | was to be backwards compatible even with mutually incompatible
         | encodings ?
        
       ___________________________________________________________________
       (page generated 2022-09-19 23:00 UTC)