[HN Gopher] Web hacking techniques of 2021
       ___________________________________________________________________
        
       Web hacking techniques of 2021
        
       Author : adrianomartins
       Score  : 456 points
       Date   : 2022-02-10 09:23 UTC (13 hours ago)
        
 (HTM) web link (portswigger.net)
 (TXT) w3m dump (portswigger.net)
        
       | furstenheim wrote:
       | It got me thinking, is client side rendering intrinsically safer
       | than SSR.
       | 
       | SQL queries with params are safer because data and code flow
       | separately. Similarly, if you query backend for data and then do
       | textContent = response, that cannot do xss, right?
        
         | chrismorgan wrote:
         | No, client-side rendering is not intrinsically safer than
         | server-side rendering, provided all outputs of serialisation
         | are parsed identically (as is the case for _valid_ HTML trees).
         | 
         | The problems start when you try to manipulate serialised data,
         | which is not safe to do this in the general case. You should
         | instead construct a proper representation of what you desire,
         | and then serialise _that_ , depending on the serialiser to take
         | care of all of this sort of stuff. This approach has always
         | been fairly popular in compiled languages and languages that
         | like types, but dynamic languages have historically
         | significantly preferred to manipulate strings, I suspect
         | because they don't have good ergonomics on the other approach,
         | and it's probably slower in interpreted languages--you'll note
         | that React felt the need to extend JavaScript to make its
         | approach acceptable to people.
         | 
         | Most JavaScript stuff that supports server-side rendering now
         | is working in this way, crafting a DOM tree and then
         | serialising that. Svelte is a notable exception in that it
         | takes a declarative DOM tree and essentially serialises what it
         | can at compile time, thereby still retaining the required
         | safety guarantees.
         | 
         | There are definitely downsides to strict adherence to the model
         | of crafting a data structure and then serialising it; most
         | significantly, you can't start streaming a response until
         | you're done. The solution for this is to use an append-only
         | data structure (or possibly one that allows you to "commit" the
         | document up to a given point, while still allowing mutations in
         | anything that occurs later in the document); thus serialisation
         | can begin before you finish writing the document.
         | 
         | You know the old favourite about parsing HTML with regular
         | expressions?
         | <https://stackoverflow.com/questions/1732348/regex-match-
         | open...> (If not, enjoy!) This is the thing people need to
         | understand and realise in the general case: serialised data
         | should be treated as _opaque_ , and only interacted with after
         | real parsing and before real serialisation.
         | 
         | HTTP headers aren't strings; "Date: Tue, 15 Nov 1994 08:12:31
         | GMT" is a _serialised_ HTTP header, representing the actual
         | header that's more like {Date, 1994-11-15T08:12:31Z}. And that
         | latter is the form you should interact with it in.
         | 
         | HTML isn't strings; "<p>Hello, world!</p>" is the _serialised_
         | form of a paragraph element containing a text node with data
         | "Hello, world!". And that's the form you should interact with
         | it in.
         | 
         | Yes, I am presenting a strongly-opinionated position that lacks
         | any shade of pragmatism. Yes, my website is generated with
         | templates that manipulate serialised HTML. Eventually I'll
         | replace it with something more sound.
         | 
         | One last note: at the start I said _valid_ HTML, because it's
         | not enough to just serialise an arbitrary HTML DOM tree, as you
         | can easily craft invalid HTML DOM trees, like nesting
         | hyperlinks. In most regards, the XML syntax of HTML (still a
         | thing) is actually a safer target to serialise to because then
         | you don't even need to validate your tree to be confident it
         | won't get mangled by the serialise /parse round-trip.
        
           | furstenheim wrote:
           | Sorry, what do you mean by parsed identically? In CSR you can
           | have data displayed into the front-end without ever be parsed
           | as HTML. You do some http call to the backend, get a json get
           | the property and do, element.textContent = myData. If that's
           | unsafe there would be a bug in the browser, ain't it?
        
             | chrismorgan wrote:
             | I was going to use optional start tags and tbody as my
             | example, but on checking the spec it turns out that tr _is_
             | actually valid as a direct child of table, even if the HTML
             | syntax will prevent you from creating it by inserting a
             | tbody around it. (XHTML 1.0 validation also confirms that
             | tbody is genuinely optional there.) This actually
             | undermines my "as is the case in _valid_ HTML"--but never
             | mind, I'll demonstrate what the point was, and what is at
             | least _generally_ the case.
             | 
             | So let's go with a more egregious invalidity: nested links.
             | Which browsers _do_ actually support, but HTML syntax
             | doesn't. Suppose you produce this DOM tree (server side or
             | client side, I don't care):                 p       + #text
             | "Look at this "       + a href="https://a.example"       |
             | + #text "link with "       | + a href="https://b.example"
             | | | + #text "nesting"       | + #text " like so"       +
             | #text "!"
             | 
             | (Client-side, you could generate it like this:
             | let p = document.createElement("p");       let a1 =
             | document.createElement("a");       let a2 =
             | document.createElement("a");       a1.href =
             | "https://a.example";       a2.href = "https://b.example";
             | a2.append("nesting");       a1.append("link with ", a2, "
             | like so");       p.append("Look at this ", a1, "!");
             | 
             | )
             | 
             | That serialises to this in both HTML and XML syntaxes:
             | <p>Look at this <a href="https://a.example">link with <a
             | href="https://b.example">nesting</a> like so</a>!</p>
             | 
             | (Client-side, `p.outerHTML`; `new
             | XMLSerializer().serializeToString(p)` shows the XML syntax,
             | which is the same modulo an xmlns attribute for XML
             | reasons. Incidentally, `p.outerHTML` gives you HTML syntax
             | for an HTML-syntax document and XML syntax for an XML-
             | syntax document, which mostly means if you served the file
             | with the application/xhtml+xml MIME type.)
             | 
             | But parse _that_ with the HTML syntax, and the nested links
             | break (e.g. `document.body.innerHTML = p.outerHTML`):
             | p       + #text "Look at this "       + a
             | href="https://a.example"       | + #text "link with "
             | + a href="https://b.example"       | + #text "nesting"
             | + #text " like so!"
             | 
             | And _that_ is the steady state (meaning you can round-trip
             | it again as much as you like and it will no longer change):
             | <p>Look at this <a href="https://a.example">link with
             | </a><a href="https://b.example">nesting</a> like so!</p>
             | 
             | Returning to the initial remark you're asking about: I
             | wrote that having more than just HTML in mind (kind of why
             | I brought HTTP into it later on, and because other formats
             | like Markdown may be being used, and who knows about it;
             | and in the parent comment, SQL parameters had been
             | mentioned, which is also a good example of the issue in
             | hand), that this is a general remark about stability and
             | safety: that interpolating strings raw is just dangerous,
             | and that you should parse and serialise-- _provided_ the
             | format has been designed so that that's a safe operation.
             | As it happens, the typical DOM tree representation of HTML
             | _doesn't_ protect you enough, so you need to work with
             | _valid_ HTML for it to be fully robust.
             | 
             | Actually, I've just thought of the perfect example of why
             | valid HTML is important when you're crafting a tree for
             | serialisation, because it actually _would_ introduce an
             | injection vulnerability: comments. Contemplate this:
             | document.createComment('--><script>alert("pwnd")</script><!
             | --')            #comment
             | "--><script>alert("pwnd")</script><!--"            <!--
             | --><script>alert("pwnd")</script><!-- -->
             | 
             | Or you could break scripts by injecting </script> or
             | stylesheets by injecting </style>, given that they don't
             | use HTML entity escaping. I _think_ these are the only
             | cases where invalid HTML could actually be _harmful_ ; most
             | places (not that there are many--optional start tags, link
             | nesting and paragraph nesting are just about it) it'll just
             | shuffle the DOM slightly.
             | 
             | Y'know what? I'm starting to think even the _tree_ form is
             | rather dangerous to work in for HTML. XML syntax protects
             | you from almost all inconsistency, but doesn't guard
             | against that comment attack (that's literally the only
             | thing it'll miss) and loses the  <noscript> element.
             | 
             | I'm tempted to retract my position that client-side
             | rendering is not intrinsically safer than server-side, but
             | so long as you have a step that _validates_ your HTML
             | before you serialise it, you're still OK (and even the
             | breakages depend on injecting arbitrary content into a
             | comment, script tag or style tag, which are all extremely
             | unlikely), so I retain my position, now hanging
             | precariously from that delicate thread of the word
             | "intrinsically". I think there's a gaping chasm below me.
             | Hopefully there's something soft to land on.
        
         | jcims wrote:
         | Anytime is possible for the data that returns to be
         | interpolated by the client, you could have xss or related
         | attack.
         | 
         | Client side rendering does help but mistakes are still
         | regularly made. Sometimes by the app dev, sometimes by the
         | framework dev.
         | 
         | You could probably go to an extreme and return all of your
         | application data as sprites.
        
           | furstenheim wrote:
           | Of course you can still do <div> + input + </div> in CSR, but
           | you can definitely not do myelement.textContent =
           | whateverIGot in SSR, right?
        
             | asddubs wrote:
             | you can use a template engine that escapes all variables by
             | default. in either case, it's just about coding defensively
             | and being secure by default
        
               | furstenheim wrote:
               | Then why is parameter query safer? And not just escapes
               | variables? Escaping is hard, as shown in the article
        
         | alcover wrote:
         | >  textContent = response
         | 
         | Good question (that none of the replies seem to address). That
         | is exactly what I would do if rendering 'tainted' text.
         | 
         | Can someone please tell us how it could be defeated ?
        
         | asddubs wrote:
         | if you use a template engine with sane defaults, you can
         | achieve the same level of safety.
        
       | TheAdamist wrote:
       | The hn title needs updating as it's misleading, even if it
       | reflects the title on the website. The first sentence even
       | clarifies it's only new techniques.
       | 
       | "Welcome to the Top 10 (new) Web Hacking Techniques of 2021, the
       | latest iteration of our annual community-powered effort to
       | identify the most significant web security research released in
       | the last year".
       | 
       | The top web hacking techniques used and the top new ones I would
       | expect to be very different lists.
        
       | badrabbit wrote:
       | This guy's work always impresses me. He had a nice Blackhat brief
       | as well.
       | 
       | This list is great and all for redteamers but as a defender, I
       | would like to know if any actual threat actors used these
       | techniques even after publication. Even with all the
       | secret/private and public threat intel I am aware of, none of
       | them register. Not knocking down on threat research, I am
       | honestly curious because I can't tell if I should be on the look
       | out for any real threat actors using these techniques.
        
         | FastEatSlow wrote:
         | Yes, actual threat actors use these techniques even after
         | publication. There is a lot of outdated/misconfigured systems
         | in the wild. A fairly recent example is the defacing of
         | multiple Ukrainian government websites[1], through exploiting a
         | vulnerability fixed and publicised in august 2021. There's also
         | around 10,000 (can't remember where that statistic is from)
         | Huawei routers on the internet vulnerable to an issue from
         | 2015, which are constantly being infected with botnet worms.
         | 
         | [1] https://www.bleepingcomputer.com/news/security/multiple-
         | ukra...
        
           | badrabbit wrote:
           | I know web exploits happen all the time first hand.
           | 
           | > all 15 compromised Ukrainian sites were using an outdated
           | version of the October CMS, vulnerable to CVE-2021-32648.
           | 
           | That cve looks like it was caused by someone doing == instead
           | of === in php.
           | 
           | My question was things like request smuggling and protocol
           | abuse attacks have ever been seen in the "wild".
        
       | fendy3002 wrote:
       | Man the JSON inconsistency one is creative. I know it's not
       | consistent implementation across languages, but I don't know it
       | can be used to such attacks.
        
         | FabHK wrote:
         | Yes. The big take-away for me, whether it's JSON or YAML or XML
         | or whatever: never parse anything more than once (and
         | definitely not with different parsers).
        
       | formerly_proven wrote:
       | Five out of ten new techniques are langsec, which makes them
       | inherently difficult to fix, yet we keep using unreasonably
       | complex languages for protocols and keep stapling on more
       | complexity, resulting in formally assured insecurity.
        
         | nyanpasu64 wrote:
         | http://langsec.org/ does a spectacularly poor job of
         | introducing langsec to the uninitiated. It appears to be a list
         | of conferences and papers for academics, followed by
         | http://langsec.org/bof-handout.pdf which makes unsubstantiated
         | assertions and doesn't elaborate. I think more people would
         | learn about langsec if the homepage contained an introduction
         | followed by a guided tour of articles which incrementally teach
         | the current state of the field in an organized accessible
         | fashion.
         | 
         | EDIT: I found https://scribe.rip/1b92451d4764 which purports to
         | be an "introduction followed by a tour", which links to
         | "Security Applications of Formal Language Theory" and "The
         | Seven Turrets of Babel: A Taxonomy of LangSec Errors and How to
         | Expunge Them". The second seems not very practical/applied or
         | hands-on, and the first is quite long and academic (I haven't
         | read it yet). It _might_ be useful as reference material, but I
         | 'd be interested to see examples of designing/refactoring
         | systems to be more secure based on langsec.
        
           | rank0 wrote:
           | The paper linked in your EDIT is awesome. I'm an AppSec
           | engineer and I had never encountered a term like "shotgun
           | parser". What the authors describe as shotgun parsing is
           | exactly what I've seen from reviewing validation logic across
           | hundreds of enterprise applications. It's nice to have a name
           | for the pattern.
           | 
           | The worst part of shotgun parsing and loosely defined input
           | structure is the difficulty of remediation. I constantly
           | receive pushback from dev teams when I ask them to use regex-
           | based validation per field. What sounds like a simple task
           | actually becomes extremely difficult because lots of apps
           | populate datasets via convoluted monolithic endpoints. Dev
           | teams would have to change the way in which shared services
           | structure and output information. Those shared services are
           | frequently maintained by other teams and any other
           | application which consume the same data would also need to be
           | modified.
           | 
           | In the end, it becomes a compromise where the ad-hoc parsing
           | is tightened/modified to be "good enough". This
           | bubblegum/duct-tape fix only further cements the ad-hoc
           | parsing throughout the org.
        
           | jcims wrote:
           | I was full time infosec from 1998 until 2015 then moved into
           | an adjacent role that is still technically infosec but is
           | more infrastructure/platform controls. This is the first time
           | i recall ever seeing the term.
           | 
           | Based on reading the two sentence synopsis in Google results
           | it's largely indistinguishable from the more familiar "formal
           | methods" or "formal verification".
        
             | the-alt-one wrote:
             | KTH has a course called Language based security (
             | https://www.kth.se/student/kurser/kurs/DD2525?l=en ) which
             | indeed does come from people involved in formal methods.
             | 
             | Formal methods is a huge area though, but in essence it's
             | about establishing proofs of correctness.
        
       | ooedemis wrote:
       | What about GWT-Google Web Toolkit its actually not so many
       | updated and under top news but the idea is implement in a prooven
       | language java both frontend and backend
        
       | scanr wrote:
       | The work on exploiting prototype pollution was excellent
       | https://blog.s1r1us.ninja/research/PP
       | 
       | I didn't know about the --disable-proto option in node or the
       | Document Policy proposal for dealing with it.
       | 
       | Amazing that 80% of nested query parameter parsers were
       | susceptible to prototype pollution.
        
       | adrianomartins wrote:
       | Interesting community built list of the top 10 web hacking
       | vulnerabilities used in 2021. If you're making a web product you
       | might want your team to quickly run over these.
        
         | TheAdamist wrote:
         | It's not the top 10 used, it's the top 10 new for 2021
         | techniques, and specifically excludes older techniques.
        
       | Agamus wrote:
       | I'm not an expert here, but truly interested to hear responses to
       | this question.
       | 
       | To say that 1+1=2 is "true", does that not require a corollary in
       | "reality" to something fundamental that can be called a "one"
       | object? I believe this is called mathematical constructivism.
       | 
       | Imagine, hypothetically, that we cannot identify something that
       | is physically fundamental and individual. My question is whether
       | any mathematics in that scenario could be considered "true"
       | without such constructivism, in other words, without a physical
       | correspondence to an unquestionably, physically fundamental "one"
       | object.
        
         | [deleted]
        
       | hbn wrote:
       | Not super on topic, but every time this site is linked, I never
       | properly read the URL correctly. My brain immediately thinks the
       | space is between the 's' and 'w'
        
         | mywacaday wrote:
         | Same with expertsexchange.com
        
         | thefreeman wrote:
         | thanks, now i'll never be able to read it properly again. :(
        
       | losthobbies wrote:
       | The dependency confusion article on Medium was a great read.
        
         | airstrike wrote:
         | It's a really good article and apologies to the author for
         | nitpicking but even as a bona fide Python fanboy I had to raise
         | my eyebrows at this statement:
         | 
         |  _> Some programming languages, like Python, come with an easy,
         | more or less official method of installing dependencies for
         | your projects._
        
           | nawgz wrote:
           | I mean, have you ever used a language like Java? Python has a
           | bad package manager story, sure, but it has a package manager
           | story - that's not actually particularly global afaik
        
         | remus wrote:
         | Beautifully simple! Exfiltrating data via a DNS request was a
         | nice little trick too.
        
           | [deleted]
        
         | baobabKoodaa wrote:
         | It's amazing that such a simple vulnerability can be leveraged
         | in practice to gain access to so many machines on so many
         | different organizations. Props to the researcher!
        
       | clarnaskirq wrote:
       | As a web programmer, for whom the majority of this article is not
       | only new, but difficult to comprehend, it makes me yearn to
       | improve my web security knowledge. Any pointers?
        
         | orangepurple wrote:
         | Go through each line item in the article and create a proof of
         | concept for yourself. You will learn a lot along the way too.
        
         | doopy1 wrote:
         | You can look at the disclosed reports on hackerone and get a
         | feel for the kind of stuff that's being exploited and how it's
         | being addressed.
        
         | ipnon wrote:
         | Do some of your own hacking on hackthebox.com. It is shocking
         | what can be done with only a week of security training by an
         | already experienced programmer. It becomes clear that the
         | typical software engineer doesn't give a _single_ thought to
         | security.
        
         | ridiculous_leke wrote:
         | I suggest going through cheatsheets on OWASP. Most of it is
         | comprehensible to any web programmer. Here's one example:
         | 
         | https://cheatsheetseries.owasp.org/cheatsheets/PHP_Configura...
        
       | icare_1er wrote:
       | It baffles me how convoluted and complex the webapp attacks have
       | become over the past few years.
       | 
       | I think this is an effect of bug-bounty hunting, which has pretty
       | much opened the research on those topics to a massive community.
        
       | bawolff wrote:
       | Kind of feels a little repetitive to have request smugguling on
       | the list 3 different times.
        
       | ackbar03 wrote:
       | Anyone here that works on these kind of deep-dive type of
       | security research? Can you give a TLDR of how do you usually set
       | everything up to find these results?
       | 
       | As in, do you set up some sort of test environment/website with
       | full debug logs and take if one step at a time from there? If so,
       | how to you ensure that it is realistic and relevant to real world
       | use since real-world architecture might differ from a setup that
       | worked in your experiments?
       | 
       | I ask this because I used to do some bug bounties and it
       | consisted of a lot of painful trial and error. I can't imagine
       | anything new and profound can be found that way.
       | 
       | (PS in case it isn't obvious I didn't open up the research links
       | and read in detail, hence a tldr)
        
         | EdOverflow wrote:
         | I am a security researcher referenced in the winning web-
         | hacking technique on that list ("Dependency Confusion" by Alex
         | Birsan [1]) and was ranked 7th in Portswigger's 2019 issue
         | [2,3]. My motto has always been "Learn to make it; then break
         | it." In other words, I invest a lot of time familiarising
         | myself with technologies and specifications before examining
         | how their implementation might lead to security flaws. This
         | process usually requires reading a lot of technical
         | documentation and source code, and becoming acquainted with how
         | organisations implement said technologies.
         | 
         | Once I feel comfortable with my understanding of the subject
         | material, I start to think about how certain aspects of the
         | technology could lead to security flaws or interesting areas of
         | research. At times this may require out-of-the-box thinking or
         | can even be the result of pure luck.
         | 
         | The "bug bounty" aspect of this all tends to come into play
         | once I want to find case studies for my research.
         | 
         | [1]: https://medium.com/@alex.birsan/dependency-
         | confusion-4a5d60f...
         | 
         | [2]: https://portswigger.net/research/top-10-web-hacking-
         | techniqu...
         | 
         | [3]: https://edoverflow.com/2019/ci-knew-there-would-be-bugs-
         | here...
        
       ___________________________________________________________________
       (page generated 2022-02-10 23:00 UTC)