[HN Gopher] Web hacking techniques of 2021
___________________________________________________________________
Web hacking techniques of 2021

Author : adrianomartins
Score  : 456 points
Date   : 2022-02-10 09:23 UTC (13 hours ago)

(HTM) web link (portswigger.net)
(TXT) w3m dump (portswigger.net)

| furstenheim wrote:
| It got me thinking: is client-side rendering intrinsically safer than SSR?
|
| SQL queries with params are safer because data and code flow separately. Similarly, if you query the backend for data and then do textContent = response, that cannot do XSS, right?

| chrismorgan wrote:
| No, client-side rendering is not intrinsically safer than server-side rendering, provided all outputs of serialisation are parsed identically (as is the case for _valid_ HTML trees).
|
| The problems start when you try to manipulate serialised data, which is not safe to do in the general case. You should instead construct a proper representation of what you desire, and then serialise _that_, depending on the serialiser to take care of all of this sort of stuff. This approach has always been fairly popular in compiled languages and languages that like types, but dynamic languages have historically significantly preferred to manipulate strings, I suspect because they don't have good ergonomics on the other approach, and it's probably slower in interpreted languages--you'll note that React felt the need to extend JavaScript to make its approach acceptable to people.
|
| Most JavaScript stuff that supports server-side rendering now is working in this way, crafting a DOM tree and then serialising that. Svelte is a notable exception in that it takes a declarative DOM tree and essentially serialises what it can at compile time, thereby still retaining the required safety guarantees.
|
| There are definitely downsides to strict adherence to the model of crafting a data structure and then serialising it; most significantly, you can't start streaming a response until you're done. The solution for this is to use an append-only data structure (or possibly one that allows you to "commit" the document up to a given point, while still allowing mutations in anything that occurs later in the document); thus serialisation can begin before you finish writing the document.
|
| You know the old favourite about parsing HTML with regular expressions? <https://stackoverflow.com/questions/1732348/regex-match-open...> (If not, enjoy!) This is the thing people need to understand and realise in the general case: serialised data should be treated as _opaque_, and only interacted with after real parsing and before real serialisation.
|
| HTTP headers aren't strings; "Date: Tue, 15 Nov 1994 08:12:31 GMT" is a _serialised_ HTTP header, representing the actual header that's more like {Date, 1994-11-15T08:12:31Z}. And that latter is the form you should interact with it in.
|
| HTML isn't strings; "<p>Hello, world!</p>" is the _serialised_ form of a paragraph element containing a text node with data "Hello, world!". And that's the form you should interact with it in.
|
| Yes, I am presenting a strongly-opinionated position that lacks any shade of pragmatism. Yes, my website is generated with templates that manipulate serialised HTML. Eventually I'll replace it with something more sound.
|
| One last note: at the start I said _valid_ HTML, because it's not enough to just serialise an arbitrary HTML DOM tree, as you can easily craft invalid HTML DOM trees, like nesting hyperlinks. In most regards, the XML syntax of HTML (still a thing) is actually a safer target to serialise to because then you don't even need to validate your tree to be confident it won't get mangled by the serialise/parse round-trip.
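
The "construct a representation, then serialise that" idea in the comment above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: `el`, `escapeText`, `escapeAttr` and `serialise` are hypothetical helper names (not from any framework mentioned in the thread), tag and attribute names are assumed to be trusted constants chosen by the developer, and the sketch does not validate the tree or handle comments, `<script>` or `<style>` content.

```javascript
// Hypothetical helpers for illustration only; not a real library's API.

// Escape characters that would otherwise be parsed as markup in text content.
function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Attribute values are always emitted inside double quotes, so also escape `"`.
function escapeAttr(s) {
  return escapeText(s).replace(/"/g, "&quot;");
}

// An element node is plain data: a tag name, attributes, and child nodes.
// A text node is just a string. Untrusted input only ever appears as
// a text node or an attribute value, never as a tag or attribute name.
function el(tag, attrs = {}, ...children) {
  return { tag, attrs, children };
}

// Serialisation happens in exactly one place, so escaping cannot be forgotten.
function serialise(node) {
  if (typeof node === "string") return escapeText(node);
  const attrs = Object.entries(node.attrs)
    .map(([name, value]) => ` ${name}="${escapeAttr(String(value))}"`)
    .join("");
  const inner = node.children.map(serialise).join("");
  return `<${node.tag}${attrs}>${inner}</${node.tag}>`;
}

// Usage: the untrusted string is data in the tree, not part of the markup.
const untrusted = "<script>alert(1)</script>";
console.log(serialise(el("p", {}, "Hello, ", el("b", {}, untrusted))));
// prints: <p>Hello, <b>&lt;script&gt;alert(1)&lt;/script&gt;</b></p>
```

The design point is the same one made for SQL parameters: data and code travel separately until a single, well-tested step combines them.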

| furstenheim wrote:
| Sorry, what do you mean by parsed identically? In CSR you can have data displayed in the front-end without it ever being parsed as HTML. You make some HTTP call to the backend, get some JSON, get the property and do element.textContent = myData. If that's unsafe there would be a bug in the browser, wouldn't there?

| chrismorgan wrote:
| I was going to use optional start tags and tbody as my example, but on checking the spec it turns out that tr _is_ actually valid as a direct child of table, even if the HTML syntax will prevent you from creating it by inserting a tbody around it. (XHTML 1.0 validation also confirms that tbody is genuinely optional there.) This actually undermines my "as is the case in _valid_ HTML"--but never mind, I'll demonstrate what the point was, and what is at least _generally_ the case.
|
| So let's go with a more egregious invalidity: nested links. Which browsers _do_ actually support, but HTML syntax doesn't. Suppose you produce this DOM tree (server side or client side, I don't care):
|
|     p
|     + #text "Look at this "
|     + a href="https://a.example"
|     | + #text "link with "
|     | + a href="https://b.example"
|     | | + #text "nesting"
|     | + #text " like so"
|     + #text "!"
|
| (Client-side, you could generate it like this:
|
|     let p = document.createElement("p");
|     let a1 = document.createElement("a");
|     let a2 = document.createElement("a");
|     a1.href = "https://a.example";
|     a2.href = "https://b.example";
|     a2.append("nesting");
|     a1.append("link with ", a2, " like so");
|     p.append("Look at this ", a1, "!");
|
| )
|
| That serialises to this in both HTML and XML syntaxes:
|
|     <p>Look at this <a href="https://a.example">link with <a href="https://b.example">nesting</a> like so</a>!</p>
|
| (Client-side, `p.outerHTML`; `new XMLSerializer().serializeToString(p)` shows the XML syntax, which is the same modulo an xmlns attribute for XML reasons.
| Incidentally, `p.outerHTML` gives you HTML syntax for an HTML-syntax document and XML syntax for an XML-syntax document, which mostly means if you served the file with the application/xhtml+xml MIME type.)
|
| But parse _that_ with the HTML syntax, and the nested links break (e.g. `document.body.innerHTML = p.outerHTML`):
|
|     p
|     + #text "Look at this "
|     + a href="https://a.example"
|     | + #text "link with "
|     + a href="https://b.example"
|     | + #text "nesting"
|     + #text " like so!"
|
| And _that_ is the steady state (meaning you can round-trip it again as much as you like and it will no longer change):
|
|     <p>Look at this <a href="https://a.example">link with </a><a href="https://b.example">nesting</a> like so!</p>
|
| Returning to the initial remark you're asking about: I wrote that having more than just HTML in mind (kind of why I brought HTTP into it later on, and because other formats like Markdown may be being used, and who knows about it; and in the parent comment, SQL parameters had been mentioned, which is also a good example of the issue in hand), that this is a general remark about stability and safety: that interpolating strings raw is just dangerous, and that you should parse and serialise--_provided_ the format has been designed so that that's a safe operation. As it happens, the typical DOM tree representation of HTML _doesn't_ protect you enough, so you need to work with _valid_ HTML for it to be fully robust.
|
| Actually, I've just thought of the perfect example of why valid HTML is important when you're crafting a tree for serialisation, because it actually _would_ introduce an injection vulnerability: comments. Contemplate this:
|
|     document.createComment('--><script>alert("pwnd")</script><!--')
|
|     #comment "--><script>alert("pwnd")</script><!--"
|
|     <!-- --><script>alert("pwnd")</script><!-- -->
|
| Or you could break scripts by injecting </script> or stylesheets by injecting </style>, given that they don't use HTML entity escaping. I _think_ these are the only cases where invalid HTML could actually be _harmful_; most places (not that there are many--optional start tags, link nesting and paragraph nesting are just about it) it'll just shuffle the DOM slightly.
|
| Y'know what? I'm starting to think even the _tree_ form is rather dangerous to work in for HTML. XML syntax protects you from almost all inconsistency, but doesn't guard against that comment attack (that's literally the only thing it'll miss) and loses the <noscript> element.
|
| I'm tempted to retract my position that client-side rendering is not intrinsically safer than server-side, but so long as you have a step that _validates_ your HTML before you serialise it, you're still OK (and even the breakages depend on injecting arbitrary content into a comment, script tag or style tag, which are all extremely unlikely), so I retain my position, now hanging precariously from that delicate thread of the word "intrinsically". I think there's a gaping chasm below me. Hopefully there's something soft to land on.

| jcims wrote:
| Any time it is possible for the returned data to be interpolated by the client, you could have XSS or a related attack.
|
| Client-side rendering does help but mistakes are still regularly made. Sometimes by the app dev, sometimes by the framework dev.
|
| You could probably go to an extreme and return all of your application data as sprites.

| furstenheim wrote:
| Of course you can still do <div> + input + </div> in CSR, but you can definitely not do myelement.textContent = whateverIGot in SSR, right?

| asddubs wrote:
| You can use a template engine that escapes all variables by default.
| In either case, it's just about coding defensively and being secure by default.

| furstenheim wrote:
| Then why are parameterised queries safer, rather than just escaping variables? Escaping is hard, as shown in the article.

| alcover wrote:
| > textContent = response
|
| Good question (that none of the replies seem to address). That is exactly what I would do if rendering 'tainted' text.
|
| Can someone please tell us how it could be defeated?

| asddubs wrote:
| If you use a template engine with sane defaults, you can achieve the same level of safety.

| TheAdamist wrote:
| The HN title needs updating as it's misleading, even if it reflects the title on the website. The first sentence even clarifies it's only new techniques.
|
| "Welcome to the Top 10 (new) Web Hacking Techniques of 2021, the latest iteration of our annual community-powered effort to identify the most significant web security research released in the last year".
|
| The top web hacking techniques used and the top new ones I would expect to be very different lists.

| badrabbit wrote:
| This guy's work always impresses me. He had a nice Blackhat brief as well.
|
| This list is great and all for redteamers but as a defender, I would like to know if any actual threat actors used these techniques even after publication. Even with all the secret/private and public threat intel I am aware of, none of them register. Not knocking threat research; I am honestly curious because I can't tell if I should be on the lookout for any real threat actors using these techniques.

| FastEatSlow wrote:
| Yes, actual threat actors use these techniques even after publication. There are a lot of outdated/misconfigured systems in the wild. A fairly recent example is the defacing of multiple Ukrainian government websites[1], through exploiting a vulnerability fixed and publicised in August 2021.
| There's also around 10,000 (can't remember where that statistic is from) Huawei routers on the internet vulnerable to an issue from 2015, which are constantly being infected with botnet worms.
|
| [1] https://www.bleepingcomputer.com/news/security/multiple-ukra...

| badrabbit wrote:
| I know first hand that web exploits happen all the time.
|
| > all 15 compromised Ukrainian sites were using an outdated version of the October CMS, vulnerable to CVE-2021-32648.
|
| That CVE looks like it was caused by someone doing == instead of === in PHP.
|
| My question was whether things like request smuggling and protocol abuse attacks have ever been seen in the "wild".

| fendy3002 wrote:
| Man, the JSON inconsistency one is creative. I knew the implementations aren't consistent across languages, but I didn't know that could be used for such attacks.

| FabHK wrote:
| Yes. The big take-away for me, whether it's JSON or YAML or XML or whatever: never parse anything more than once (and definitely not with different parsers).

| formerly_proven wrote:
| Five out of ten new techniques are langsec, which makes them inherently difficult to fix, yet we keep using unreasonably complex languages for protocols and keep stapling on more complexity, resulting in formally assured insecurity.

| nyanpasu64 wrote:
| http://langsec.org/ does a spectacularly poor job of introducing langsec to the uninitiated. It appears to be a list of conferences and papers for academics, followed by http://langsec.org/bof-handout.pdf which makes unsubstantiated assertions and doesn't elaborate. I think more people would learn about langsec if the homepage contained an introduction followed by a guided tour of articles which incrementally teach the current state of the field in an organized accessible fashion.
|
| EDIT: I found https://scribe.rip/1b92451d4764 which purports to be an "introduction followed by a tour", which links to "Security Applications of Formal Language Theory" and "The Seven Turrets of Babel: A Taxonomy of LangSec Errors and How to Expunge Them". The second seems not very practical/applied or hands-on, and the first is quite long and academic (I haven't read it yet). It _might_ be useful as reference material, but I'd be interested to see examples of designing/refactoring systems to be more secure based on langsec.

| rank0 wrote:
| The paper linked in your EDIT is awesome. I'm an AppSec engineer and I had never encountered a term like "shotgun parser". What the authors describe as shotgun parsing is exactly what I've seen from reviewing validation logic across hundreds of enterprise applications. It's nice to have a name for the pattern.
|
| The worst part of shotgun parsing and loosely defined input structure is the difficulty of remediation. I constantly receive pushback from dev teams when I ask them to use regex-based validation per field. What sounds like a simple task actually becomes extremely difficult because lots of apps populate datasets via convoluted monolithic endpoints. Dev teams would have to change the way in which shared services structure and output information. Those shared services are frequently maintained by other teams and any other application which consumes the same data would also need to be modified.
|
| In the end, it becomes a compromise where the ad-hoc parsing is tightened/modified to be "good enough". This bubblegum/duct-tape fix only further cements the ad-hoc parsing throughout the org.

| jcims wrote:
| I was full time infosec from 1998 until 2015 then moved into an adjacent role that is still technically infosec but is more infrastructure/platform controls. This is the first time I recall ever seeing the term.
|
| Based on reading the two-sentence synopsis in Google results it's largely indistinguishable from the more familiar "formal methods" or "formal verification".

| the-alt-one wrote:
| KTH has a course called Language based security ( https://www.kth.se/student/kurser/kurs/DD2525?l=en ) which indeed does come from people involved in formal methods.
|
| Formal methods is a huge area though, but in essence it's about establishing proofs of correctness.

| ooedemis wrote:
| What about GWT (Google Web Toolkit)? It's not updated much and isn't in the top news, but the idea is to implement both frontend and backend in a proven language, Java.

| scanr wrote:
| The work on exploiting prototype pollution was excellent https://blog.s1r1us.ninja/research/PP
|
| I didn't know about the --disable-proto option in node or the Document Policy proposal for dealing with it.
|
| Amazing that 80% of nested query parameter parsers were susceptible to prototype pollution.

| adrianomartins wrote:
| Interesting community-built list of the top 10 web hacking vulnerabilities used in 2021. If you're making a web product you might want your team to quickly run over these.

| TheAdamist wrote:
| It's not the top 10 used, it's the top 10 new for 2021 techniques, and specifically excludes older techniques.

| Agamus wrote:
| I'm not an expert here, but truly interested to hear responses to this question.
|
| To say that 1+1=2 is "true", does that not require a corollary in "reality" to something fundamental that can be called a "one" object? I believe this is called mathematical constructivism.
|
| Imagine, hypothetically, that we cannot identify something that is physically fundamental and individual. My question is whether any mathematics in that scenario could be considered "true" without such constructivism, in other words, without a physical correspondence to an unquestionably, physically fundamental "one" object.
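
For readers unfamiliar with the prototype pollution issue in nested query parameter parsers that scanr mentions earlier in the thread, here is a minimal, hedged sketch of one defensive approach. The names `parseNestedQuery`, `setNested` and `FORBIDDEN_KEYS` are hypothetical and not taken from the linked research or from any real parser; the sketch assumes a simple `a[b][c]=value` key syntax and relies only on the standard `URLSearchParams` API.

```javascript
// Hypothetical helper names; a sketch of one mitigation, not a complete parser.

// Keys that can reach or replace an object's prototype.
const FORBIDDEN_KEYS = new Set(["__proto__", "constructor", "prototype"]);

// Walk `path` inside `target`, creating prototype-less objects as needed,
// and assign `value` at the final key.
function setNested(target, path, value) {
  let node = target;
  for (let i = 0; i < path.length; i++) {
    const key = path[i];
    if (FORBIDDEN_KEYS.has(key)) {
      throw new Error(`refusing dangerous key: ${key}`);
    }
    if (i === path.length - 1) {
      node[key] = value;
      return;
    }
    const existing = Object.prototype.hasOwnProperty.call(node, key)
      ? node[key]
      : undefined;
    if (typeof existing !== "object" || existing === null) {
      // Object.create(null) has no prototype chain to pollute.
      node[key] = Object.create(null);
    }
    node = node[key];
  }
}

// Parse "user[name]=ann&user[role]=admin" into nested prototype-less objects.
function parseNestedQuery(query) {
  const result = Object.create(null);
  for (const [rawKey, value] of new URLSearchParams(query)) {
    // "a[b][c]" becomes ["a", "b", "c"].
    const path = rawKey.split(/[\[\]]+/).filter(Boolean);
    if (path.length > 0) setNested(result, path, value);
  }
  return result;
}

// Usage: a normal query parses; a pollution attempt is rejected.
const parsed = parseNestedQuery("user[name]=ann&user[role]=admin");
console.log(parsed.user.name, parsed.user.role);
try {
  parseNestedQuery("__proto__[polluted]=yes");
} catch (err) {
  console.log(err.message);
}
```

Rejecting the dangerous keys and using prototype-less containers are two independent layers; either alone would stop this particular payload, and the Node `--disable-proto` option mentioned in the thread is a further process-wide option.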

| [deleted]

| hbn wrote:
| Not super on topic, but every time this site is linked, I never properly read the URL correctly. My brain immediately thinks the space is between the 's' and 'w'.

| mywacaday wrote:
| Same with expertsexchange.com

| thefreeman wrote:
| Thanks, now I'll never be able to read it properly again. :(

| losthobbies wrote:
| The dependency confusion article on Medium was a great read.

| airstrike wrote:
| It's a really good article and apologies to the author for nitpicking but even as a bona fide Python fanboy I had to raise my eyebrows at this statement:
|
| _> Some programming languages, like Python, come with an easy, more or less official method of installing dependencies for your projects._

| nawgz wrote:
| I mean, have you ever used a language like Java? Python has a bad package manager story, sure, but it has a package manager story - that's not actually particularly global afaik

| remus wrote:
| Beautifully simple! Exfiltrating data via a DNS request was a nice little trick too.

| [deleted]

| baobabKoodaa wrote:
| It's amazing that such a simple vulnerability can be leveraged in practice to gain access to so many machines in so many different organizations. Props to the researcher!

| clarnaskirq wrote:
| As a web programmer, for whom the majority of this article is not only new, but difficult to comprehend, it makes me yearn to improve my web security knowledge. Any pointers?

| orangepurple wrote:
| Go through each line item in the article and create a proof of concept for yourself. You will learn a lot along the way too.

| doopy1 wrote:
| You can look at the disclosed reports on hackerone and get a feel for the kind of stuff that's being exploited and how it's being addressed.

| ipnon wrote:
| Do some of your own hacking on hackthebox.com. It is shocking what can be done with only a week of security training by an already experienced programmer.
| It becomes clear that the typical software engineer doesn't give a _single_ thought to security.

| ridiculous_leke wrote:
| I suggest going through cheatsheets on OWASP. Most of it is comprehensible to any web programmer. Here's one example:
|
| https://cheatsheetseries.owasp.org/cheatsheets/PHP_Configura...

| icare_1er wrote:
| It baffles me how convoluted and complex the webapp attacks have become over the past few years.
|
| I think this is an effect of bug-bounty hunting, which has pretty much opened the research on those topics to a massive community.

| bawolff wrote:
| Kind of feels a little repetitive to have request smuggling on the list 3 different times.

| ackbar03 wrote:
| Anyone here who works on this kind of deep-dive security research? Can you give a TLDR of how you usually set everything up to find these results?
|
| As in, do you set up some sort of test environment/website with full debug logs and take it one step at a time from there? If so, how do you ensure that it is realistic and relevant to real-world use, since real-world architecture might differ from a setup that worked in your experiments?
|
| I ask this because I used to do some bug bounties and it consisted of a lot of painful trial and error. I can't imagine anything new and profound can be found that way.
|
| (PS in case it isn't obvious I didn't open up the research links and read in detail, hence a tldr)

| EdOverflow wrote:
| I am a security researcher referenced in the winning web-hacking technique on that list ("Dependency Confusion" by Alex Birsan [1]) and was ranked 7th in Portswigger's 2019 issue [2,3]. My motto has always been "Learn to make it; then break it." In other words, I invest a lot of time familiarising myself with technologies and specifications before examining how their implementation might lead to security flaws.
| This process usually requires reading a lot of technical documentation and source code, and becoming acquainted with how organisations implement said technologies.
|
| Once I feel comfortable with my understanding of the subject material, I start to think about how certain aspects of the technology could lead to security flaws or interesting areas of research. At times this may require out-of-the-box thinking or can even be the result of pure luck.
|
| The "bug bounty" aspect of this all tends to come into play once I want to find case studies for my research.
|
| [1]: https://medium.com/@alex.birsan/dependency-confusion-4a5d60f...
|
| [2]: https://portswigger.net/research/top-10-web-hacking-techniqu...
|
| [3]: https://edoverflow.com/2019/ci-knew-there-would-be-bugs-here...
___________________________________________________________________
(page generated 2022-02-10 23:00 UTC)