[HN Gopher] The Big List of Naughty Strings
___________________________________________________________________
The Big List of Naughty Strings
Author : polm23
Score : 221 points
Date : 2020-05-24 13:44 UTC (9 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| 13415 wrote:
| I don't quite understand the purpose of this list. It contains potentially malicious input, but also emoticons based on Unicode characters that are completely harmless and used in every second post on Reddit.
| kube-system wrote:
| I think the purpose is to run these strings through your inputs and make sure your application doesn't behave in unexpected ways.
| MauranKilom wrote:
| It's essentially a test suite for character encoding all throughout your application. If you input all those strings (e.g. send chat message) and they arrive incorrectly at some other end (e.g. other user receiving chat message) then there's a problem somewhere.
| 13415 wrote:
| That makes sense. Thanks a lot! Of course, it's very useful for testing. I erroneously assumed it was for input validation.
| minimaxir wrote:
| I made the list while I was a Software QA Engineer at Apple, since there were a bunch of fun Unicode strings causing particular issues there, which gave me the idea.
| Dylan16807 wrote:
| There are a lot of ways to mishandle Unicode. Checking that non-BMP characters work, that emojis in various sections work, and that emojis with modifiers work are all good tests.
| toomanybeersies wrote:
| It's useful for testing a variety of things that take text/string inputs, such as forms in web applications. It's a handy tool for testing a site (preferably one you have permission to test) for XSS or SQL injection, character encoding problems, or even just form length problems.
| harunurhan wrote:
| OK, seeing "﷽" [1] was unexpected :).
| For those who do not know, it's very important for Muslims and it's all over the Quran.
|
| [1] https://github.com/minimaxir/big-list-of-naughty-strings/blo...
| cheez wrote:
| what does it mean?
| gnulinux wrote:
| "In the name of God, the Most Gracious, the Most Merciful."
|
| From Wikipedia: https://en.wikipedia.org/wiki/Basmala
|
| Disclaimer: I'm not a Muslim, I don't know Arabic.
| ctdonath wrote:
| https://www.urbandictionary.com/define.php?term=%EF%B7%BD
|
| Fun fact: it's a single Unicode character.
| harunurhan wrote:
| Yeah, I didn't know that until I tried to copy-paste it to post here :)
| atomwaffel wrote:
| Yup, you can put 280 of them into a single tweet.
| robinhouston wrote:
| I don't _think_ that's right. I looked into the way Twitter counts characters when I was trying to work out the largest prime number that could be written out in full, in base ten, in a single tweet[1]; the rules are more complicated than you might expect, and have changed several times.
|
| The current rule seems to be that all Unicode characters count as two, except for the ranges 0-4351, 8192-8205, 8208-8223 and 8242-8247 which count as one.
|
| [1] In case you're wondering, I think it's, arguably: https://twitter.com/robinhouston/status/1197294154738544641
| atomwaffel wrote:
| Good point! Still, I could swear I saw someone (@FakeUnicode?) do exactly this once, but of course I can't find that tweet any more, partly because it turns out that search engines don't handle ﷽ well at all, and I don't feel like testing it on my own followers somehow.
|
| Edit: it looks like it might count it as two characters, so that's only 140 per tweet.
| mmastrac wrote:
| https://charbase.com/fdfd-unicode-arabic-ligature-bismillah-...
|
| Also fun is ﷺ (https://charbase.com/fdfa-unicode-arabic-ligature-sallallaho...), which has the longest Unicode decomposition IIRC.
| beobab wrote:
| I had to zoom in to 400% to be able to see the detail there.
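The counting rule robinhouston quotes above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in that comment, not Twitter's actual implementation; the real rules are, as the comment says, more complicated. The names `SINGLE_WEIGHT_RANGES`, `char_weight`, `weighted_length` and `max_repeats` are made up for this sketch, and the limit of 280 is assumed from the figure mentioned upthread.

```python
# Code point ranges that count as one, copied from the comment above.
# Everything outside these ranges is assumed to count as two.
SINGLE_WEIGHT_RANGES = [(0, 4351), (8192, 8205), (8208, 8223), (8242, 8247)]

# Assumed tweet limit, taken from the "280" mentioned in the thread.
TWEET_LIMIT = 280


def char_weight(ch: str) -> int:
    """Return 1 if the code point falls in a single-weight range, else 2."""
    cp = ord(ch)
    for lo, hi in SINGLE_WEIGHT_RANGES:
        if lo <= cp <= hi:
            return 1
    return 2


def weighted_length(text: str) -> int:
    """Sum the per-code-point weights of a string."""
    return sum(char_weight(ch) for ch in text)


def max_repeats(ch: str, limit: int = TWEET_LIMIT) -> int:
    """How many copies of one character fit under the limit."""
    return limit // char_weight(ch)


if __name__ == "__main__":
    bismillah = "\uFDFD"  # the single-character ligature discussed above
    print(char_weight(bismillah), max_repeats(bismillah))
```

Under this rule U+FDFD (code point 65021) lies outside every listed range, so it counts as two, which agrees with atomwaffel's edit that only 140 fit in a tweet.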
| lopmotr wrote:
| Can't it be made up of individual characters, or is it stylized in a unique way?
| toolslive wrote:
| https://github.com/minimaxir/big-list-of-naughty-strings/blo...
|
| just lovely ;)
| bryanrasmussen wrote:
| I did feel sort of let down that they didn't have man-hole cover.
|
| on edit: yeah, I'm not gonna send a pull request on that one.
| duggable wrote:
| This one got me:
|
| > "If you're reading this, you've been in a coma for almost 20 years now. We're trying a new technique. We don't know where this message will end up in your dream, but we hope it works. Please wake up, we miss you."
|
| Strangely terrifying....
| ball_of_lint wrote:
| Eh, it's just a meme:
|
| https://www.reddit.com/r/copypasta/comments/5we0ny/if_youre_...
|
| I guess it is intriguing, in a Roko's Basilisk sort of way.
| willismichael wrote:
| The question is, which one of us is the message meant for?
| naniwaduni wrote:
| Why would there be more than one of you?
| myself248 wrote:
| Why are there so many of me posting in this thread?
| was_boring wrote:
| Who says it's only meant for one?
| bryanrasmussen wrote:
| the real question is: Inception. Can it be done?
| montroser wrote:
| Almost related: https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and...
| jzl wrote:
| Hilarious, but also important!
| dorgo wrote:
| What? Only 151 Russian words? Russians have their own dedicated sub-language which consists solely of bad words. No idea or concept is too complicated to be expressed in bad words alone. They switch from normal Russian to bad-words Russian as soon as the situation allows it.
| Udik wrote:
| .
|
| .
|
| .
|
| .
|
| .
|
| .
|
| d d d
|
| Wow, what's this? :)
| EvanAnderson wrote:
| It reminds me a little bit of Feynman diagrams.
| majewsky wrote:
| Layers upon layers of combining diacritics.
| folkhack wrote:
| Solid list for a quick SQL injection and XSS reference with lots of examples.
| Even Unicode/accents/two-byte characters etc. are super useful for checking handling all the way from the front-end to the persistent storage solution (DB, etc.).
|
| Lost it laughing at "Human Injection" section:
|
| > # Strings which may cause human to reinterpret worldview
|
| > If you're reading this, you've been in a coma for almost 20 years now. We're trying a new technique. We don't know where this message will end up in your dream, but we hope it works. Please wake up, we miss you.
| yosito wrote:
| I would wake up if I could, but I opened this string in vim and I can't remember how to exit.
| foresto wrote:
| > Please wake up, we miss you.
|
| I think that sentence gives itself away as modern. Were comma splices in common use 20 years ago?
| frank2 wrote:
| Yes, they were. IIRC at least one of the major manuals of style endorsed them, at least in some situations.
| maxfan8 wrote:
| That's interesting. Maybe it's considered hypercorrect?
|
| Which style manual are you referring to?
| gerdesj wrote:
| Jimmy Clitheroe - the Clitheroe Kid. That brings back some memories. It's also nice to see that England is suitably represented in the place names, obviously Scunthorpe is the classic. I'll tender Somerset for first amongst equals for daft and downright odd place names.
| jzl wrote:
| Also tangentially related: the big list of usernames that should be disallowed in any online system: https://github.com/forwardemail/reserved-email-addresses-lis...
| DominikPeters wrote:
| Ugh, that list might be why my email address mail@[personal domain] is forbidden more and more often.
| chris_wot wrote:
| Strongly advise not using cat on the list, you will get beeped at.
| fareesh wrote:
| would that be considered animal abuse :D
| afandian wrote:
| This is deliciously ironic:
|
| > Also, do not send a null character (U+0000) string, as it changes the file format on GitHub to binary and renders it unreadable in pull requests.
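The end-to-end check described earlier in the thread (feed each string in, then confirm that what comes back from storage is identical) might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not how the list's author tests anything: an in-memory SQLite table stands in for "the persistent storage solution", and `SAMPLE_STRINGS`, the `messages` table and `round_trip_failures` are hypothetical names. The samples are a few hand-picked stand-ins, not the contents of blns.txt.

```python
import sqlite3

# A few hand-picked stand-ins for entries in the list (not the real file).
SAMPLE_STRINGS = [
    "",                          # empty string
    "'; DROP TABLE users; --",   # SQL-injection-shaped input
    "<script>alert(1)</script>", # XSS-shaped input
    "\u7530\u4e2d\u3055\u3093",  # CJK text
    "\U0001F469\U0001F3FD",      # non-BMP emoji with a skin-tone modifier
    "\uFDFD",                    # the single-character ligature from upthread
]


def round_trip_failures(strings):
    """Store each string, read it back, and return the ones that changed."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real database
    try:
        conn.execute(
            "CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)"
        )
        failures = []
        for s in strings:
            # Parameterized queries keep the input as data, not SQL.
            cur = conn.execute(
                "INSERT INTO messages (body) VALUES (?)", (s,)
            )
            row = conn.execute(
                "SELECT body FROM messages WHERE id = ?", (cur.lastrowid,)
            ).fetchone()
            if row[0] != s:
                failures.append(s)
        return failures
    finally:
        conn.close()


if __name__ == "__main__":
    print(round_trip_failures(SAMPLE_STRINGS))
```

In a real application the insert and the read would go through the actual front end and storage layer, since a mismatch can be introduced at any hop in between.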
| monax wrote:
| Yup, can't view the file using the GitHub app for Android
| minimaxir wrote:
| Out of curiosity, what happens when you try to do so?
| Johnjonjoan wrote:
| Something went wrong
|
| <button>TRY AGAIN</button>
|
| Edit: as far as I could see, it's only opening blns.txt that causes this error; the other files are fine in the app.
| dhosek wrote:
| I encountered an amusing instance of this recently watching my six-year-old son playing music on the kitchen Alexa. Alexa felt it was necessary to censor the name of a children's song entitled, "Pussy Cat, Pussy Cat."
| inetsee wrote:
| When I saw the title I thought it was a list of profanity that one might want to filter out from an open web application (i.e. a list that also includes swear words from multiple languages).
| dang wrote:
| See also:
|
| 2018 https://news.ycombinator.com/item?id=18466787
|
| 2017 https://news.ycombinator.com/item?id=13406119
|
| Show HN from 2015: https://news.ycombinator.com/item?id=10035008
___________________________________________________________________
(page generated 2020-05-24 23:00 UTC)