[HN Gopher] DuckDuckGo \u202E
       ___________________________________________________________________
        
       DuckDuckGo \u202E
        
       Author : zeepzeep
       Score  : 146 points
       Date   : 2022-02-15 20:11 UTC (2 hours ago)
        
 (HTM) web link (duckduckgo.com)
 (TXT) w3m dump (duckduckgo.com)
        
       | kroltan wrote:
       | It's intentional, if you inspect the `innerText` you'll see it's
       | reversed there too:
       | zero_click_wrapper.innerText.codePointAt(0)
       | 
       | Evaluates to 32. And if you think 32 = 0x20 could mean the next
       | one would be 0x2E, then no, codePointAt(1) is 0x55.
        
         | nneonneo wrote:
         | `innerText` doesn't include the RTL marker, probably due to the
         | fact that it is supposed to reflect the "rendered" appearance
         | of the element (i.e. deleting certain invisible characters).
         | However, `textContent` shows the RTL marker as expected.
         | 
         | I'm on the side of this being an unintentional effect.
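         | 
         | A minimal sketch of the inspection described above, runnable
         | outside the browser. `leadingCodePoints` and the sample string
         | are hypothetical stand-ins (the sample is made up to match the
         | values quoted in this thread); in a page you would pass the
         | element's `textContent` instead.

```javascript
// Hypothetical helper: label the first `count` code points of a string as U+XXXX.
function leadingCodePoints(text, count = 3) {
  return Array.from(text) // iterate by code point, not by UTF-16 unit
    .slice(0, count)
    .map((ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

// Assumed sample standing in for textContent: an override followed by
// text stored in normal (logical) order.
const sample = "\u202E U+202E RIGHT-TO-LEFT OVERRIDE";

console.log(leadingCodePoints(sample)); // [ 'U+202E', 'U+0020', 'U+0055' ]
```

         | If the first label is U+202E and the rest is in normal order,
         | the text is stored unreversed and only its display is flipped.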
        
       | benbristow wrote:
       | Reversed: U+202E RIGHT-TO-LEFT OVERRIDE, decimal: 8238, HTML: No
       | visual representation, UTF-8: 0xE2 0x80 0xAE, block: General
       | Punctuation
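       | 
       | The numbers quoted above can be checked in a few lines. A sketch,
       | assuming a runtime with a global `TextEncoder` (modern browsers,
       | Node.js); the variable names are illustrative only.

```javascript
const RLO = "\u202E"; // RIGHT-TO-LEFT OVERRIDE

// Decimal value of the code point.
const decimal = RLO.codePointAt(0);

// UTF-8 bytes, formatted the same way as in the comment above.
const utf8 = Array.from(
  new TextEncoder().encode(RLO),
  (byte) => "0x" + byte.toString(16).toUpperCase()
);

console.log(decimal);        // 8238
console.log(utf8.join(" ")); // 0xE2 0x80 0xAE
```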
        
       | gambler wrote:
       | Extremely bad design. This kind of complexity should have been
       | moved to some kind of post-processing spec rather than core
       | Unicode. It's already causing issues and will cause more. The
       | more universal something is, the more effort should be applied to
       | keeping it simple.
        
         | [deleted]
        
         | the_mitsuhiko wrote:
         | I strongly disagree. This is a necessary part to shared content
         | text and pushing this type of functionality into another layer
         | makes a lot of content non accessible in basic text format.
         | This is precisely the type of control character that makes
         | Unicode such a powerful and successful system.
        
         | [deleted]
        
         | mananaysiempre wrote:
         | ... It's not clear how? Except by telling every speaker of
         | Arabic and Hebrew who wants some of that delicious
         | "plain text" action to go screw themselves (there are _no_
         | purely-RTL texts, only bidirectional ones, not least because of
         | the Indic numerals). AFAIU (at least from the full-length
         | horror novel that is the CDRA) IBM tried presentation-order
         | (and no-complex-shaping) RTL text for decades and gave up, so
         | Unicode bidi is essentially the result of said giving up (and
         | the "Arabic Presentation Forms" block the foul-smelling corpse
         | of the idea).
         | 
         | Specify the dominant direction of your user-input-containing
         | elements, people, and/or enclose the input in U+2068 FSI ...
         | U+2069 PDI (after balancing outstanding bidi controls inside).
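         | 
         | One way the FSI ... PDI advice could look in code. This is a
         | sketch under assumptions, not a full implementation of the bidi
         | algorithm: `balanceBidiControls` and `isolateUserText` are
         | hypothetical names, stray closers are simply dropped, and
         | unclosed openers get their closer appended at the end.

```javascript
const FSI = "\u2068", PDI = "\u2069", PDF = "\u202C";
const EMBEDDING_OPENERS = new Set(["\u202A", "\u202B", "\u202D", "\u202E"]); // LRE, RLE, LRO, RLO
const ISOLATE_OPENERS = new Set(["\u2066", "\u2067", "\u2068"]);             // LRI, RLI, FSI

// Drop unmatched PDF/PDI and append closers for anything left open.
function balanceBidiControls(input) {
  const open = []; // closers owed for currently open controls, innermost last
  let out = "";
  for (const ch of input) {
    if (EMBEDDING_OPENERS.has(ch)) {
      open.push(PDF);
    } else if (ISOLATE_OPENERS.has(ch)) {
      open.push(PDI);
    } else if (ch === PDF) {
      if (open[open.length - 1] !== PDF) continue; // stray PDF: drop it
      open.pop();
    } else if (ch === PDI) {
      const idx = open.lastIndexOf(PDI);
      if (idx === -1) continue; // stray PDI: drop it
      open.length = idx; // a PDI also ends embeddings opened inside the isolate
    }
    out += ch;
  }
  return out + open.reverse().join("");
}

// Wrap balanced user input so its direction cannot leak into surrounding text.
function isolateUserText(input) {
  return FSI + balanceBidiControls(input) + PDI;
}
```

         | With this, isolateUserText("\u202Eabc") yields FSI, RLO, "abc",
         | PDF, PDI, so the override ends with the user's text.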
        
       | soheil wrote:
       | What's next, searching for the word death causes you to die?
        
       | soheil wrote:
       | Where does DDG get its search result? Do they scrape Google? If
       | so how do they not get banned both technically and legally?
        
         | thesuitonym wrote:
         | They have their own web crawlers, as well as a deal with Bing
         | (And perhaps others)
        
         | sp332 wrote:
         | https://help.duckduckgo.com/duckduckgo-help-pages/results/so...
        
       | echelon wrote:
       | You still have to be mindful of \u202e in anything new that
       | you're writing, but browsers do a much better job of not having
       | it bleed across elements like they did back in the 2000s.
       | 
       | Back in the era of forums that didn't support unicode correctly
       | (2005ish?), it was trollish fun to post messages containing
       | \u202E and watch the UI and all subsequent messages and elements
       | get messed up. (One stray \u202E would flip the entire page
       | contents following it.) I never took it to a level of abuse since
       | it was easy to remove and then ban offenders, but it was fun in a
       | one-off thread, and it always had great reactions.
       | 
       | I patched my own software to handle it, but I don't recall anyone
       | really abusing it in a widespread manner. (Contrast this with the
       | era of prolific and widely abused AOL/AIM exploits that would
       | kill your IM client with malformed messages.)
       | 
       | IIRC, a bunch of messaging clients also didn't (or still don't)
       | handle \u202e termination and it sometimes bled into new messages
       | and even the text input box. That was pretty horrible and
       | unfixable without restarting.
       | 
       | Obligatory XKCD: https://xkcd.com/1137/
       | 
       | Some shenanigans in the wild:
       | 
       | https://www.reddit.com/r/Unicode/comments/hc1rxi/i_put_a_rig...
       | 
       | https://twitter.com/mkolsek/status/1237123571341803522
       | 
       | (These are way tamer than the effects used to be.)
       | 
       | (Also, HN filters it out. I tried to have some fun. :P)
        
       | splch wrote:
       | Oh that's cute! Translation for anyone curious / lazy:
       | 
       | Punctuation General :block ,0xAE 0x80 0xE2 :8-UTF ,representation
       | visual No :HTML ,8238 :decimal ,OVERRIDE LEFT-TO-RIGHT 202E+U
       | 
       | Love the demos :)
        
       | heartbeats wrote:
       | Why can't I just disable RTL on my system?
       | 
       | I do not speak a word of Arabic. There is no circumstance in
       | which my life will be materially improved by correct RTL text
       | rendering. I might want proper display of individual characters
       | so I can copy-paste them, but I have no use for RTL text.
       | 
       | On the other hand, RTL causes a lot of unpleasant problems like
       | this. Why can't I simply coerce all foreign languages into LTR?
        
       | hnlmorg wrote:
       | If there was ever a clear signal that working with Unicode is
       | incredibly hard, it would be the fact that no one on HN can
       | decide if this is accidental or intentional.
        
         | [deleted]
        
         | tedunangst wrote:
         | A significant portion of the problem seems to be that some
         | people can't even identify what's going on because the tools
         | they're using to inspect the page are also showing it reversed.
        
         | divbzero wrote:
         | Let me take a stab at a definitive answer:
         | 
         | - It is unintentional for DuckDuckGo. The code for DuckDuckGo
         | works correctly but no one who wrote that code thought about
         | whether a reversal would happen.
         | 
         | - It is intentional for the browser. The code for the browser
         | works correctly and someone who wrote that code actively
         | thought through how to make a reversal happen.
         | 
         | I don't think 'accidental' is the right word to use in either
         | case because the outcome is what you would want.
        
         | shockeychap wrote:
         | This! Also, https://news.ycombinator.com/item?id=21105625
        
         | tshaddox wrote:
         | It certainly looks like a simple template that DDG applies
         | consistently to all queries for a UTF-8 byte literal. It's the
         | exact same template for a query for a more straightforward
         | literal, like u0041.
         | 
         | So I think it's fair to say that it's not intentional in the
         | sense of being a deliberately added easter egg. Of course, they
         | might be aware of the behavior and decided to leave it that
         | way.
        
         | barbazoo wrote:
         | And some of us don't even get what this is about. Should I be
         | seeing DDG doing something particular here?
        
           | dtech wrote:
           | The "answer" tab is right to left
        
             | barbazoo wrote:
             | I had that turned off. Thanks for explaining it.
        
           | [deleted]
        
         | iqanq wrote:
         | It's accidental, because other characters are also displayed:
         | https://duckduckgo.com/?q=u20aa
        
           | Retr0id wrote:
           | It's intentional, because there is no RTL override in the
           | HTML source, the string is merely reversed.
        
             | dzaima wrote:
             | but there is, see:
             | 
             |     document.querySelector(".zci__body").textContent.charCodeAt(0)
             |     document.querySelector(".zci__body").textContent.substring(1)
        
             | progval wrote:
             | > no RTL override in the HTML source, the string is merely
             | reversed
             | 
             | What? After opening the source, ctrl-f "representation"
             | selects the reversed word. The source view just happens to
             | interpret the RTL override.
        
       | Jerrrry wrote:
       | Stacking combining diacritics[1] is also fun, to make extremely
       | tall text.
       | 
       | Also fun is enumerating all the characters in the Private
       | Character section[2] to see what UI symbols are able to be
       | inserted into unintended places.
       | 
       | [1] https://www.unicode.org/charts/PDF/U0300.pdf
       | 
       | [2] http://www.unicode.org/faq/private_use.html
       | https://www.unicode.org/charts/PDF/UE000.pdf
        
       | amelius wrote:
       | > This is often abused by hackers to disguise file extensions:
       | when using it in the file name my-text.'U+202E'cod.exe, the file
       | name is actually displayed as my-text.exe.doc
       | 
       | So every programmer has to know about and support U+202E, but not
       | filesystem programmers?
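       | 
       | A rough sketch of why the quoted trick works. The stored name
       | still ends in ".exe"; only the display changes.
       | `approximateDisplay` is a hypothetical, crude stand-in for real
       | bidi rendering (it handles a single unterminated U+202E only),
       | and `hasBidiControl` checks just the embedding, override and
       | isolate controls.

```javascript
const RLO = "\u202E";
const fileName = "my-text." + RLO + "cod.exe";

// Extension according to the stored (logical) order of the characters.
function logicalExtension(name) {
  const dot = name.lastIndexOf(".");
  return dot === -1 ? "" : name.slice(dot + 1);
}

// Crude approximation: everything after the override is shown reversed.
function approximateDisplay(name) {
  const at = name.indexOf(RLO);
  if (at === -1) return name;
  const tail = Array.from(name.slice(at + 1)).reverse().join("");
  return name.slice(0, at) + tail;
}

// Flag names that contain bidi embedding, override or isolate controls.
function hasBidiControl(name) {
  return /[\u202A-\u202E\u2066-\u2069]/.test(name);
}

console.log(logicalExtension(fileName));   // exe
console.log(approximateDisplay(fileName)); // my-text.exe.doc
console.log(hasBidiControl(fileName));     // true
```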
        
         | mananaysiempre wrote:
         | More like UI programmers? It seems that almost everyone has
         | agreed that text-processing smarts inside a filesystem are a
         | bad idea (see: the NTFS collation table, the APFS transition
         | away from ancient-version-NFD-but-not-quite), although there is
         | that island of (admittedly very smart) -insensitive but
         | -preserving holdouts (casing on Windows, normalization on ZFS).
         | Linus rants on the topic[1] passionately, if not very
         | informatively.
         | 
         | Note that U+202E is a _control code_ that has effect on
         | _display_ , not the logical order of the text (much like, say,
         | a bare CR), so I can't say what the filesystem is doing wrong
         | here (except maybe for not rejecting this outright, but see re
         | smarts above, this probably needs to be done on a higher
         | level). You don't blame the filesystem for believing the
         | filename "A\rB.txt" starts with A and not B, do you? Even
         | though ls will say otherwise.
         | 
         | Bidi IRIs (which _are_ at that higher level) are kind of
         | horrendous, though.
         | 
         | [1] https://yarchive.net/comp/linux/utf8.html
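         | 
         | A sketch of handling this at the display layer instead of in
         | the filesystem: escape control characters before showing a
         | name. `escapeInvisible` is a hypothetical helper, and the
         | character set it covers (C0 controls, DEL, bidi embedding,
         | override and isolate controls) is an assumption, not a
         | complete list.

```javascript
// Replace control characters with a visible \u{...} escape.
function escapeInvisible(name) {
  return name.replace(
    /[\u0000-\u001F\u007F\u202A-\u202E\u2066-\u2069]/g,
    (ch) => "\\u{" + ch.codePointAt(0).toString(16).toUpperCase() + "}"
  );
}

console.log("A\rB.txt".startsWith("A"));               // true
console.log(escapeInvisible("A\rB.txt"));              // A\u{D}B.txt
console.log(escapeInvisible("my-text.\u202Ecod.exe")); // my-text.\u{202E}cod.exe
```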
        
         | tedunangst wrote:
         | What do you want the filesystem programmer to do?
        
           | foxfluff wrote:
           | if (!isascii(c)) panic("stupid user");
        
         | tyingq wrote:
         | That's pretty much correct. Most of the filesystems I'm aware
         | of just treat filenames as a "string of bytes" with some list
         | of characters that aren't allowed, and perhaps a few other
         | rules. Other than that, it's a free-for-all on names.
        
         | jamescodesthing wrote:
         | Same works for urls.
        
       | TadeusTaD wrote:
       | Instantly reminded me of a relevant xkcd: https://xkcd.com/1137/
        
         | zeepzeep wrote:
         | Hey that's new to me, I'll use this, thanks.
        
       | tobz1000 wrote:
       | Easter egg or bug?
        
         | Waterluvian wrote:
         | Poe's Law applied to coding easter eggs? :D
        
         | rackjack wrote:
         | Easter bug?
        
         | zeepzeep wrote:
         | That's the question!
         | 
         | (I think it's unintended though)
        
         | oneplane wrote:
         | bug egg? it's also an instant answer from the community (the
         | little info icon on the right hand side) so perhaps just
         | presented that way due to how it was delivered by that specific
         | community member.
        
       | jfk13 wrote:
       | Similarly, if I try https://www.google.com/search?q=u202e, the
       | second result I currently get (YMMV) is from
       | https://unicode-table.com/, and almost the entire snippet shows
       | up backwards in the search results.
        
       | Sebb767 wrote:
       | I'm not sure whether this is a bug or a feature^Weaster egg
        
         | BitwiseFool wrote:
         | I'm out of the loop, what kind of Easter Egg is it?
        
           | brimble wrote:
           | The text in the instant-answer bar is reversed for this
           | result. Which could plausibly either be on purpose, or a
           | result of the character itself being inserted and not
           | escaped, so having its intended effect.
        
         | pwdisswordfish9 wrote:
         | Oversight, probably. By default, the code point is displayed
         | next to that description, and they don't turn that off for
         | bidirectional control characters.
         | 
         | https://duckduckgo.com/?q=u1f4a9
         | 
         | (Yes, I have that one memorized)
        
       | [deleted]
        
       | gunapologist99 wrote:
       | Are there any lists of unicode characters (like the OWASP one)
       | that should be blacklisted from most apps (not just for XSS, but
       | even for desktop apps)?
       | 
       | Are there any good security guides/best practices for unicode
       | sanitation?
        
         | wongarsu wrote:
         | How are users supposed to write "`bvr l duckduckgo.com kdy
         | lkhpsh byntrnt" without \u202E? It's perfectly normal for RTL
         | languages to switch text direction in the middle of a sentence.
        
         | harambae wrote:
         | Not a full security guide, but if you haven't seen this before
         | it's useful to have...
         | 
         | https://github.com/danielmiessler/SecLists/blob/master/Fuzzi...
        
           | adamrezich wrote:
           | I've seen this before but either this is new since last time
           | or I missed it, either way: lol
           | 
           |     # Human injection
           |     #
           |     # Strings which may cause human to reinterpret worldview
           | 
           |     If you're reading this, you've been in a coma for almost 20
           |     years now. We're trying a new technique. We don't know where
           |     this message will end up in your dream, but we hope it
           |     works. Please wake up, we miss you.
        
         | sterlind wrote:
         | please don't blacklist U+202D and U+202E or the Private Use
         | Area. my conlang has a right-to-left cursive script, and it's
         | not in Unicode. the characters live in the PUA and my font
         | renders them as a fallback. there's no mechanism for fonts to
         | ask for RTL, so I have to use bidi override.
        
         | sp332 wrote:
         | I don't think this is a good place for a blacklist. Text
         | effects should be encapsulated and reset at the end of the text
         | block, the way bold or italic effects are.
        
       | thecosmicfrog wrote:
       | Reminds me of searching for the terms "do a barrel roll",
       | "recursion" or "askew" on Google. I'm sure there's plenty of
       | others.
        
       | ryukoposting wrote:
       | And somehow, the "external link" icon is outside the scope of
       | Unicode.
        
       | joelbondurant4 wrote:
        
       | lucideer wrote:
       | Everyone here is asking if this is an "intentional easter-egg" or
       | an "accidental bug"
       | 
       | But what about accidentally working-as-intended?
       | 
       | Sure it's a little trickier to read, but it's certainly not a
       | "bug" that will cause any damage / danger / instability / etc.
        
         | gambler wrote:
         | Problem is, this behavior is so outside of the range of common
         | expectations, it's really hard to say if it's harmless or not
         | and what are the worst cases for (ab)using it.
        
         | thrdbndndn wrote:
         | I don't get your take.
         | 
         | Even the most strict definition of bug doesn't imply it has to
         | "cause any damage / danger / instability / etc." to be one.
         | 
         | And I won't call it "working as intended" when the purpose of
         | this feature is to provide an answer for a human to read, and
         | it failed at that.
        
           | evolve2k wrote:
           | I'd warmly beg to differ, I personally think it's
           | illustrating how it is supposed to work, most eloquently.
        
       ___________________________________________________________________
       (page generated 2022-02-15 23:00 UTC)