[HN Gopher] DuckDuckGo \u202E ___________________________________________________________________ DuckDuckGo \u202E Author : zeepzeep Score : 146 points Date : 2022-02-15 20:11 UTC (2 hours ago) (HTM) web link (duckduckgo.com) (TXT) w3m dump (duckduckgo.com) | kroltan wrote: | It's intentional: if you inspect the `innerText`, you'll see it's | reversed there too: | zero_click_wrapper.innerText.codePointAt(0) | | Evaluates to 32. And if you think 32 = 0x20 could mean the next | one would be 0x2E, then no, codePointAt(1) is 0x55. | nneonneo wrote: | `innerText` doesn't include the RTL marker, probably because | it is supposed to reflect the "rendered" appearance | of the element (i.e. deleting certain invisible characters). | However, `textContent` shows the RTL marker as expected. | | I'm on the side of this being an unintentional effect. | benbristow wrote: | Reversed: U+202E RIGHT-TO-LEFT OVERRIDE, decimal: 8238, HTML: No | visual representation, UTF-8: 0xE2 0x80 0xAE, block: General | Punctuation | gambler wrote: | Extremely bad design. This kind of complexity should have been | moved to some kind of post-processing spec rather than core | Unicode. It's already causing issues and will cause more. The | more universal something is, the more effort should be applied to | keeping it simple. | [deleted] | the_mitsuhiko wrote: | I strongly disagree. This is a necessary part of shared text | content, and pushing this type of functionality into another layer | makes a lot of content inaccessible in basic text format. | This is precisely the type of control character that makes | Unicode such a powerful and successful system. | [deleted] | mananaysiempre wrote: | ... It's not clear how? Except by telling every speaker of | Arabic and Hebrew who wants some of that delicious | "plain text" action to go screw themselves (there are _no_ | purely-RTL texts, only bidirectional ones, not least because of | the Indic numerals).
AFAIU (at least from the full-length | horror novel that is the CDRA) IBM tried presentation-order | (and no-complex-shaping) RTL text for decades and gave up, so | Unicode bidi is essentially the result of said giving up (and | the "Arabic Presentation Forms" block is the foul-smelling corpse | of the idea). | | Specify the dominant direction of your user-input-containing | elements, people, and/or enclose the input in U+2068 FSI ... | U+2069 PDI (after balancing outstanding bidi controls inside). | soheil wrote: | What's next, searching for the word death causes you to die? | soheil wrote: | Where does DDG get its search results? Do they scrape Google? If | so, how do they not get banned, both technically and legally? | thesuitonym wrote: | They have their own web crawlers, as well as a deal with Bing | (and perhaps others). | sp332 wrote: | https://help.duckduckgo.com/duckduckgo-help-pages/results/so... | echelon wrote: | You still have to be mindful of \u202e in anything new that | you're writing, but browsers do a much better job of not having | it bleed across elements than they did back in the 2000s. | | Back in the era of forums that didn't support Unicode correctly | (2005ish?), it was trollish fun to post messages containing | \u202E and watch the UI and all subsequent messages and elements | get messed up. (One stray \u202E would flip the entire page | contents following it.) I never took it to a level of abuse since | it was easy to remove and then ban offenders, but it was fun in a | one-off thread, and it always had great reactions. | | I patched my own software to handle it, but I don't recall anyone | really abusing it in a widespread manner. (Contrast this with the | era of prolific and widely abused AOL/AIM exploits that would | kill your IM client with malformed messages.) | | IIRC, a bunch of messaging clients also didn't (or still don't) | handle \u202e termination and it sometimes bled into new messages | and even the text input box.
That was pretty horrible and | unfixable without restarting. | | Obligatory XKCD: https://xkcd.com/1137/ | | Some shenanigans in the wild: | | https://www.reddit.com/r/Unicode/comments/hc1rxi/i_put_a_rig... | | https://twitter.com/mkolsek/status/1237123571341803522 | | (These are way tamer than the effects used to be.) | | (Also, HN filters it out. I tried to have some fun. :P) | splch wrote: | Oh that's cute! Translation for anyone curious / lazy: | | Punctuation General :block ,0xAE 0x80 0xE2 :8-UTF ,representation | visual No :HTML ,8238 :decimal ,OVERRIDE LEFT-TO-RIGHT 202E+U | | Love the demos :) | heartbeats wrote: | Why can't I just disable RTL on my system? | | I do not speak a word of Arabic. There is no circumstance in | which my life will be materially improved by correct RTL text | rendering. I might want proper display of individual characters | so I can copy-paste them, but I have no use for RTL text. | | On the other hand, RTL causes a lot of unpleasant problems like | this. Why can't I simply coerce all foreign languages into LTR? | hnlmorg wrote: | If there were ever a clear signal that working with Unicode is | incredibly hard, it would be the fact that no one on HN can | decide if this is accidental or intentional. | [deleted] | tedunangst wrote: | A significant portion of the problem seems to be that some | people can't even identify what's going on because the tools | they're using to inspect the page are also showing it reversed. | divbzero wrote: | Let me take a stab at a definitive answer: | | - It is unintentional for DuckDuckGo. The code for DuckDuckGo | works correctly but no one who wrote that code thought about | whether a reversal would happen. | | - It is intentional for the browser. The code for the browser | works correctly and someone who wrote that code actively | thought through how to make a reversal happen. | | I don't think 'accidental' is the right word to use in either | case because the outcome is what you would want.
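mananaysiempre's advice above (enclose user input in U+2068 FSI ... U+2069 PDI after balancing the bidi controls inside it) is also what stops the bleed echelon describes. The balancing step matters because the outer PDI would otherwise be consumed by an unclosed isolate inside the input, and a stray PDI in the input would close the outer FSI early. What follows is a minimal sketch, assuming plain strings with no surrounding markup. `balanceBidi` and `isolateUserText` are hypothetical helper names, and the sketch ignores UAX #9's maximum nesting depth.

```javascript
// Bidi formatting characters from Unicode UAX #9 (Unicode Bidirectional Algorithm).
const PDF = "\u202C"; // POP DIRECTIONAL FORMATTING: closes LRE/RLE/LRO/RLO
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE: closes LRI/RLI/FSI
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const EMBED_OPENERS = new Set(["\u202A", "\u202B", "\u202D", "\u202E"]); // LRE, RLE, LRO, RLO
const ISOLATE_OPENERS = new Set(["\u2066", "\u2067", "\u2068"]); // LRI, RLI, FSI

// Hypothetical helper: close every embedding, override or isolate the input
// leaves open, and drop terminators that have nothing to close (a stray PDI
// would otherwise end the outer FSI early and let later controls leak out).
function balanceBidi(text) {
  const stack = []; // "E" = embedding/override, "I" = isolate
  let out = "";
  for (const ch of text) {
    if (EMBED_OPENERS.has(ch)) {
      stack.push("E");
    } else if (ISOLATE_OPENERS.has(ch)) {
      stack.push("I");
    } else if (ch === PDF) {
      // A PDF only closes an embedding opened inside the current isolate.
      if (stack[stack.length - 1] !== "E") continue;
      stack.pop();
    } else if (ch === PDI) {
      const i = stack.lastIndexOf("I");
      if (i === -1) continue; // unmatched PDI: drop it
      // A PDI also ends any embeddings opened after its isolate initiator.
      stack.length = i;
    }
    out += ch;
  }
  // Close whatever is still open, innermost first.
  for (let i = stack.length - 1; i >= 0; i--) {
    out += stack[i] === "I" ? PDI : PDF;
  }
  return out;
}

// Hypothetical helper: the wrapped text takes its direction from its first
// strong character and cannot affect the text around it.
function isolateUserText(text) {
  return FSI + balanceBidi(text) + PDI;
}

// Usage: list the code points so the invisible controls become visible.
const wrapped = isolateUserText("abc\u202E");
console.log([...wrapped].map((c) => "U+" + c.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")).join(" "));
// U+2068 U+0061 U+0062 U+0063 U+202E U+202C U+2069
```

Note that this closes unterminated controls instead of deleting them, so a message that deliberately uses RLO for a run of text still renders as intended, but only within its own isolate.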
| shockeychap wrote: | This! Also, https://news.ycombinator.com/item?id=21105625 | tshaddox wrote: | It certainly looks like a simple template that DDG applies | consistently to all queries for a Unicode code point literal. It's the | exact same template for a query for a more straightforward | literal, like u0041. | | So I think it's fair to say that it's not intentional in the | sense of being a deliberately added easter egg. Of course, they | might be aware of the behavior and have decided to leave it that | way. | barbazoo wrote: | And some of us don't even get what this is about. Should I be | seeing DDG doing something particular here? | dtech wrote: | The "answer" tab is right to left. | barbazoo wrote: | I had that turned off. Thanks for explaining it. | [deleted] | iqanq wrote: | It's accidental, because other characters are also displayed: | https://duckduckgo.com/?q=u20aa | Retr0id wrote: | It's intentional, because there is no RTL override in the | HTML source; the string is merely reversed. | dzaima wrote: | But there is, see: | | document.querySelector(".zci__body").textContent.charCodeAt(0) | document.querySelector(".zci__body").textContent.substring(1) | progval wrote: | > no RTL override in the HTML source, the string is merely | reversed | | What? After opening the source, ctrl-f "representation" | selects the reversed word. The source view just happens to | interpret the RTL override. | Jerrrry wrote: | Stacking combining diacritics[1] is also fun, to make extremely | tall text. | | Also fun is enumerating all the characters in the Private | Use Area[2] to see what UI symbols are able to be | inserted into unintended places.
| | [1] https://www.unicode.org/charts/PDF/U0300.pdf | | [2] http://www.unicode.org/faq/private_use.html | https://www.unicode.org/charts/PDF/UE000.pdf | amelius wrote: | > This is often abused by hackers to disguise file extensions: | when using it in the file name my-text.'U+202E'cod.exe, the file | name is actually displayed as my-text.exe.doc | | So every programmer has to know about and support U+202E, but not | filesystem programmers? | mananaysiempre wrote: | More like UI programmers? It seems that almost everyone has | agreed that text-processing smarts inside a filesystem are a | bad idea (see: the NTFS collation table, the APFS transition | away from ancient-version-NFD-but-not-quite), although there is | that island of (admittedly very smart) -insensitive but | -preserving holdouts (casing on Windows, normalization on ZFS). | Linus rants on the topic[1] passionately, if not very | informatively. | | Note that U+202E is a _control code_ that has an effect on | _display_, not the logical order of the text (much like, say, | a bare CR), so I can't say what the filesystem is doing wrong | here (except maybe for not rejecting this outright, but see the | remark on smarts above; this probably needs to be done on a higher | level). You don't blame the filesystem for believing the | filename "A\rB.txt" starts with A and not B, do you? Even | though ls will say otherwise. | | Bidi IRIs (which _are_ at that higher level) are kind of | horrendous, though. | | [1] https://yarchive.net/comp/linux/utf8.html | tedunangst wrote: | What do you want the filesystem programmer to do? | foxfluff wrote: | if (!isascii(c)) panic("stupid user"); | tyingq wrote: | That's pretty much correct. Most of the filesystems I'm aware | of just treat filenames as a "string of bytes" with some list | of characters that aren't allowed, and perhaps a few other | rules. Other than that, it's a free-for-all on names. | jamescodesthing wrote: | The same works for URLs.
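The spoof amelius quotes works because the filesystem stores the name in logical order while the UI renders it visually, which is mananaysiempre's point about where the fix belongs. Below is a minimal sketch of a display-layer mitigation, assuming names arrive as already-decoded JavaScript strings (tyingq notes that many filesystems store raw bytes). `escapeBidiForDisplay` and `logicalExtension` are hypothetical names, not any file manager's API.

```javascript
// Bidi formatting characters (UAX #9): the implicit marks LRM, RLM and ALM,
// plus the explicit embeddings/overrides (U+202A..U+202E) and isolates
// (U+2066..U+2069).
const BIDI_CONTROLS = /[\u200E\u200F\u061C\u202A-\u202E\u2066-\u2069]/gu;

// Hypothetical helper: replace each bidi control with a visible <U+XXXX> tag,
// so what the user sees follows the logical order the filesystem stores.
function escapeBidiForDisplay(name) {
  return name.replace(BIDI_CONTROLS, (ch) =>
    "<U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0") + ">"
  );
}

// Hypothetical helper: the extension that actually matters is whatever
// follows the last "." in logical (stored) order, regardless of rendering.
function logicalExtension(name) {
  const dot = name.lastIndexOf(".");
  return dot === -1 ? "" : name.slice(dot + 1);
}

const spoofed = "my-text.\u202Ecod.exe"; // renders as "my-text.exe.doc"
console.log(logicalExtension(spoofed));     // exe
console.log(escapeBidiForDisplay(spoofed)); // my-text.<U+202E>cod.exe
```

Escaping only at display time leaves the stored name untouched, which fits the thread's view that filesystems should not grow text-processing smarts.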
| TadeusTaD wrote: | Instantly reminded me of a relevant xkcd: https://xkcd.com/1137/ | zeepzeep wrote: | Hey, that's new to me. I'll use this, thanks. | tobz1000 wrote: | Easter egg or bug? | Waterluvian wrote: | Poe's Law applied to coding easter eggs? :D | rackjack wrote: | Easter bug? | zeepzeep wrote: | That's the question! | | (I think it's unintended though) | oneplane wrote: | Bug egg? It's also an instant answer from the community (the | little info icon on the right-hand side), so perhaps it's just | presented that way due to how it was delivered by that specific | community member. | jfk13 wrote: | Similarly, if I try https://www.google.com/search?q=u202e, the | second result I currently get (YMMV) is from https://unicode-table.com/, and almost the entire snippet shows up backwards in | the search results. | Sebb767 wrote: | I'm not sure whether this is a bug or a feature^Weaster egg | BitwiseFool wrote: | I'm out of the loop; what kind of Easter Egg is it? | brimble wrote: | The text in the instant-answer bar is reversed for this | result. This could plausibly be either on purpose or a | result of the character itself being inserted unescaped, | and so having its intended effect. | pwdisswordfish9 wrote: | Oversight, probably. By default, the code point is displayed | next to that description, and they don't turn that off for | bidirectional control characters. | | https://duckduckgo.com/?q=u1f4a9 | | (Yes, I have that one memorized) | [deleted] | gunapologist99 wrote: | Are there any lists of Unicode characters (like the OWASP one) | that should be blacklisted from most apps (not just for XSS, but | even for desktop apps)? | | Are there any good security guides/best practices for Unicode | sanitization? | wongarsu wrote: | How are users supposed to write "עבור ל duckduckgo.com כדי | לחפש באינטרנט" (Hebrew: "go to duckduckgo.com to search the internet") without \u202E? It's perfectly normal for RTL | languages to switch text direction in the middle of a sentence.
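benbristow's reversed line is the instant answer's usual field list (code point, decimal, HTML, UTF-8 bytes, block), and pwdisswordfish9's explanation suggests the template also echoes the character itself. Here is a minimal sketch of a template that reports the same fields but never emits a raw format character. It assumes only standard JavaScript (`TextEncoder`, Unicode property escapes); `describeCodePoint` is a hypothetical name, and this is not DuckDuckGo's code. It also leaves user text alone, in line with wongarsu's point that RTL writers need these characters.

```javascript
// General category Cf ("format") covers all the bidi controls, plus e.g.
// zero-width joiners; emitting one raw would rearrange the surrounding text.
const FORMAT_CHAR = /^\p{Cf}$/u;

// Hypothetical helper modelled loosely on the instant answer's fields
// (code point, decimal, UTF-8 bytes); not DuckDuckGo's actual code.
function describeCodePoint(cp) {
  const ch = String.fromCodePoint(cp);
  const utf8 = Array.from(new TextEncoder().encode(ch), (b) =>
    "0x" + b.toString(16).toUpperCase().padStart(2, "0")
  );
  return {
    label: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
    decimal: cp,
    utf8: utf8.join(" "),
    // Show a placeholder rather than the character itself for format chars.
    preview: FORMAT_CHAR.test(ch) ? "(no visual representation)" : ch,
  };
}

const rlo = describeCodePoint(0x202e);
console.log(rlo.label, rlo.decimal, rlo.utf8); // U+202E 8238 0xE2 0x80 0xAE
```

The same function handles the examples from the thread, such as u1f4a9 (four UTF-8 bytes, previewed normally) and u0041 (one byte, "A").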
| harambae wrote: | Not a full security guide, but if you haven't seen this before | it's useful to have... | | https://github.com/danielmiessler/SecLists/blob/master/Fuzzi... | adamrezich wrote: | I've seen this before, but either this is new since last time | or I missed it. Either way, lol: | | # Human injection | # | # Strings which may cause human to reinterpret worldview | | If you're reading this, you've been in a coma for almost 20 years now. We're | trying a new technique. We don't know where this message will | end up in your dream, but we hope it works. Please wake up, | we miss you. | sterlind wrote: | Please don't blacklist U+202D and U+202E or the Private Use | Area. My conlang has a right-to-left cursive script, and it's | not in Unicode. The characters live in the PUA and my font | renders them as a fallback. There's no mechanism for fonts to | ask for RTL, so I have to use bidi override. | sp332 wrote: | I don't think this is a good place for a blacklist. Text | effects should be encapsulated and reset at the end of the text | block, the way bold or italic effects are. | thecosmicfrog wrote: | Reminds me of searching for the terms "do a barrel roll", | "recursion" or "askew" on Google. I'm sure there are plenty of | others. | ryukoposting wrote: | And somehow, the "external link" icon is outside the scope of | Unicode. | joelbondurant4 wrote: | lucideer wrote: | Everyone here is asking if this is an "intentional easter-egg" or | an "accidental bug". | | But what about accidentally working-as-intended? | | Sure, it's a little trickier to read, but it's certainly not a | "bug" that will cause any damage / danger / instability / etc. | gambler wrote: | Problem is, this behavior is so far outside the range of common | expectations that it's really hard to say whether it's harmless | and what the worst cases for (ab)using it are. | thrdbndndn wrote: | I don't get your take.
| | Even the strictest definition of bug doesn't imply it has to | "cause any damage / danger / instability / etc." to be one. | | And I wouldn't call it "work as intended" when the purpose of this | feature is to provide an answer for humans to read, and it | failed at that. | evolve2k wrote: | I'd warmly beg to differ; I personally think it's | illustrating how it is supposed to work, most eloquently. ___________________________________________________________________ (page generated 2022-02-15 23:00 UTC)