[HN Gopher] Why CVE-2022-3602 was not detected by fuzz testing
       ___________________________________________________________________
        
       Why CVE-2022-3602 was not detected by fuzz testing
        
       Author : pjmlp
       Score  : 171 points
       Date   : 2022-11-21 15:48 UTC (7 hours ago)
        
 (HTM) web link (allsoftwaresucks.blogspot.com)
 (TXT) w3m dump (allsoftwaresucks.blogspot.com)
        
       | alkonaut wrote:
       | Adding more and better fuzzing instead of trying to fix the issue
       | (potentially malicious user input inside a C library) seems like
       | the wrong way to address the problem. Buffer overruns just
       | shouldn't be a concern of the developer or test suite but of the
       | compiler or language runtime.
        
         | artariel wrote:
          | Trusting the fuzzer and not examining its coverage seems to
          | be the main problem here.
         | 
          | I fail to see what is problematic about giving the developer
          | control over the entire flow of the program. Quite the
          | contrary: I am more concerned about the paradigm shift
          | towards higher-level systems programming languages that hide
          | more and more control from the developer while putting more
          | burden on the perfection of the optimizer.
        
           | alkonaut wrote:
           | Absent a high performing systems language that still offers
           | some safety guarantees, the right call should be to use
           | whatever the second best is. It could be a higher level
           | language with runtime overhead, sandboxing, formal
           | verification etc. In some cases constraints won't allow this,
           | and obviously replacing even parts of infrastructure code is
           | never easy. Nor should the perfect be the enemy of the good -
           | adding better testing doesn't sound like a bad idea even for
            | a piece of code being sunset. What I'm objecting to is
            | the (apparent, or my perceived!) idea that "if only the
            | fuzzing was good enough, this code would be acceptable for
            | use forever".
        
         | DistractionRect wrote:
          | There are two problems: the CVE, and the fact that the current
          | fuzzing harness did not find it. The CVE is getting fixed,
         | but obviously the fuzzer needs work too because it exists to
         | find these kinds of issues before they get used in the wild.
         | 
         | It's being handled how it should be. This happened, let's
         | handle it, and how can we work to better address future
         | problems.
        
       | sitkack wrote:
       | Open-loop fuzz testing catches only the most shallow of bugs. It
       | is like genetic optimization with no fitness function.
       | 
       | Why are people still using parsers for untrusted input in C? That
       | is the real flaw here, not how the fuzzing was done.
        
         | not2b wrote:
         | But modern fuzzers aren't open-loop, they are coverage
         | directed, adjusting their inputs to increase coverage. As the
         | article points out, this works best if leaf functions are
         | fuzzed; difficult to reach corners still might not be found.
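
           To make the point about leaf functions concrete, a coverage-
           directed harness aimed directly at a leaf function might look
           like the sketch below. This is illustrative only: `parse_record`
           is a made-up leaf parser, not OpenSSL code; the entry point is
           the standard libFuzzer one (built with
           `clang -fsanitize=fuzzer,address`).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical leaf function under test: parses a length-prefixed
   record. Returns 0 on success, -1 on malformed input. */
static int parse_record(const uint8_t *buf, size_t len) {
    if (len < 1) return -1;
    size_t body_len = buf[0];          /* first byte declares body length */
    if (body_len + 1 > len) return -1; /* reject truncated input */
    return 0;
}

/* libFuzzer calls this with coverage-guided inputs. Fuzzing the leaf
   directly means the fuzzer does not first have to discover a path
   through the rest of the library to reach this code. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    parse_record(data, size);
    return 0;
}
```

           Fuzzing through the top-level API instead would require the
           mutated input to survive every earlier validation step before
           this function even runs, which is exactly the corner-reaching
           problem the comment describes.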
        
         | [deleted]
        
         | halpmeh wrote:
         | Because there isn't a good way of distributing pre-compiled
         | cross-platform C libraries. So if you want to use a parsing
         | library written in Rust, for example, you'd need to add Rust to
         | your toolchain, which is a pain.
         | 
         | One solution to this problem would be to write an LLVM backend
         | that outputs C. Maybe such a thing already exists.
        
         | ralphb wrote:
         | I'm confused and very far from an expert here. What is wrong
         | with parsers, and what is the alternative?
        
           | jcims wrote:
           | A specific class of parsers
           | 
           | >parsers for untrusted input in C
        
             | thaumasiotes wrote:
             | That didn't answer anything. If you want to do anything
             | with your input, you have to run it through a parser.
             | Doesn't matter if it's untrusted or not. Your only options
             | are ignoring the input, echoing it somewhere, or parsing
             | it.
        
               | Diggsey wrote:
               | They're saying don't write such a parser in C. Use
               | something else (memory safe language, parser generator,
               | whatever).
        
               | lazide wrote:
               | And then do what with it? Throw it away?
               | 
               | If it hands it to a C program, that C program needs to
               | parse (in some form!) those values!
               | 
               | How is a C program expected to ever do anything if it
               | can't safely handle input?
        
               | nicoburns wrote:
               | Right, but you do have the option of writing that parser
               | in a language other than C. And given how often severe
               | security issues are caused by such parsers written in C,
               | one probably ought to choose a different language, or at
               | least use C functions and string types that store a
               | length rather than relying on null termination.
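
                 As a concrete illustration of the length-carrying
                 alternative mentioned above, here is a minimal sketch;
                 the `str_view` type and helpers are invented for this
                 example, not a standard library:

```c
#include <stddef.h>
#include <string.h>

/* A minimal length-carrying string: the length travels with the
   bytes, so code never has to scan for a NUL terminator. */
typedef struct {
    const char *data;
    size_t len;
} str_view;

/* Bounds-checked byte access: returns -1 instead of reading past
   the end, where a raw pointer walk could overrun. */
static int sv_at(str_view s, size_t i) {
    return i < s.len ? (unsigned char)s.data[i] : -1;
}

/* Wrap an existing C string; the length is computed once, up front. */
static str_view sv_from(const char *cstr) {
    str_view s = { cstr, strlen(cstr) };
    return s;
}
```

                 With this shape, an off-by-one index yields a clean -1
                 rather than a silent out-of-bounds read.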
        
               | thaumasiotes wrote:
               | >>>>> Why are people still using parsers for untrusted
               | input in C?
               | 
               | No matter what the parser itself is written in, if you're
               | writing in C you'll be using the parser in C.
        
               | wongarsu wrote:
               | If you have the input in a buffer of known length in C,
               | hand it off to a (dynamic or static) library written in a
               | safe language, and get back trusted parsed output, then
               | there's much less attack surface in your C code.
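
                 The hand-off described above can be sketched from the C
                 side as follows. The `safe_parse`/`parsed_free` names
                 are hypothetical; in a real build they would be `extern`
                 symbols from, say, a Rust cdylib, and are stubbed
                 locally here only so the sketch is self-contained:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical boundary to a parser written in a memory-safe
   language. In practice safe_parse/parsed_free would be FFI
   imports; they are stubbed here so the sketch compiles. */
typedef struct { int ok; } parsed_cert;

static parsed_cert *safe_parse(const uint8_t *buf, size_t len) {
    if (buf == NULL || len == 0) return NULL;  /* parser rejects bad input */
    parsed_cert *p = malloc(sizeof *p);
    if (p) p->ok = 1;
    return p;
}

static void parsed_free(parsed_cert *p) { free(p); }

/* The C side only owns the buffer and checks for failure; the
   pointer arithmetic over untrusted bytes lives behind the boundary. */
int handle_input(const uint8_t *buf, size_t len) {
    parsed_cert *p = safe_parse(buf, len);
    if (!p) return -1;
    /* ... use accessor functions on the validated result ... */
    parsed_free(p);
    return 0;
}
```

                 The design choice is that untrusted bytes cross the
                 boundary exactly once, as a pointer plus an explicit
                 length, and only validated results come back.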
        
               | lazide wrote:
               | The issue in many of these cases is there appears to be
               | no canonical safe way to know the length of the input in
               | C, and people apparently screw up keeping track of the
               | lengths of the buffers all the time.
        
               | harshreality wrote:
               | Even if application constraints mean you can't write a
               | parser in another language that's linkable to C, why
               | couldn't you use a parser generator that outputs C?
        
               | dllthomas wrote:
               | I agree that the original statement encourages that
               | interpretation, but I think it admits the interpretation
               | that the parser itself is in C and I think that is what
               | was intended.
        
               | nicoburns wrote:
               | 1. Well don't write in C then if your program is security
               | critical or going to be exposed over a network. Sure,
               | there are some targets that require C, but that's not the
               | case for the vast majority of platforms running OpenSSL.
               | 
                | 2. That's still less of a problem, as the C will then
                | be handling trusted data validated by the safe language.
        
               | jstimpfle wrote:
               | If you make argument 2) could you explain how writing a
               | parser is more security critical than any other code that
               | has a (direct or indirect) interaction with the network?
               | At least recursive descent parsers are close to trivial.
               | I usually start by writing a "next_byte" function and
               | then "next_token". You'll have to look very hard to find
               | any pointer code there. It's close to impossible to get
               | this wrong and I don't see how the fact that it's a
               | parser would make it any more dangerous.
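
                 The next_byte/next_token structure described above can
                 be sketched as follows; the names and the toy
                 integer-token rule are illustrative, not taken from any
                 real parser:

```c
#include <stddef.h>

/* The cursor carries its bounds, so next_byte is the only place
   that touches the buffer, and it cannot read past the end. */
typedef struct {
    const unsigned char *buf;
    size_t len, pos;
} cursor;

/* -1 signals end of input; no pointer ever walks off the buffer. */
static int next_byte(cursor *c) {
    return c->pos < c->len ? c->buf[c->pos++] : -1;
}

/* Example token rule built only on next_byte: read a run of ASCII
   digits. Everything above this layer is plain value code. */
static int next_int_token(cursor *c, int *out) {
    int v = 0, seen = 0, b;
    while ((b = next_byte(c)) >= '0' && b <= '9') {
        v = v * 10 + (b - '0');
        seen = 1;
    }
    if (b >= 0) c->pos--;   /* push back the non-digit byte */
    *out = v;
    return seen ? 0 : -1;
}
```

                 This is the "close to trivial" shape the comment
                 describes: the bounds check lives in one function, and
                 the tokenizer above it contains no pointer arithmetic.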
        
               | thaumasiotes wrote:
               | > It's close to impossible to get this wrong and I don't
               | see how the fact that it's a parser would make it any
               | more dangerous.
               | 
               | I can answer that one. The parser is more dangerous
               | because a parser, essentially by definition, takes
               | untrusted input.
               | 
                | Nothing the parser _does_ is any more dangerous than
                | the rest of the code; it's all about the parser's
                | position in the data flow.
        
               | nicoburns wrote:
               | Well if you're dealing with a struct then the compiler
               | will provide type safety if say you try to access a field
               | that doesn't exist. You don't get the same safeguards
               | when dealing with raw bytes. Admittedly in C you can also
                | run into these hazards with arrays and strings, which
                | is why I suggest using non-standard array and string
                | types that actually store the length if you insist on
                | using C.
        
             | lazide wrote:
             | Pretty much all input is untrusted unless it originated
             | (exclusively!) from something with more permissions that is
             | trustworthy.
             | 
             | The kernel is written in C.
             | 
             | So that pretty much means _all_ parsers written in C and
             | every other language should consider all input
             | untrustworthy, no?
        
               | Gigachad wrote:
               | Linux is probably the most carefully constructed C
                | codebase in existence and still falls into C pitfalls
                | semi-regularly. Every other project has no hope of safely
               | using C. It's looking more and more like Linux should be
               | carefully rewritten in Rust. It's a monstrous task but I
               | can see it happening over the next decade.
        
         | er4hn wrote:
         | The problem here was that the coverage of the fuzz testing was
         | not being examined.
         | 
         | Using parsers for untrusted input in C is a legacy of when this
         | was written. Requiring the parsing portion (or any version of
         | OpenSSL) to be rewritten in Rust or whatever new language is a
         | massive change given the length of time the OpenSSL project has
         | been around.
        
           | sitkack wrote:
            | A "massive" 338-line file.
           | 
           | https://github.com/openssl/openssl/blob/openssl-3.0.6/crypto.
           | ..
        
           | duped wrote:
           | It's not that big of a change, in the grand scheme of things.
           | But it's also not the only thing you can do. The memory safe
           | subset of C++ is also an option.
           | 
           | Shipping a CVE in critical infrastructure because of a
           | trivial memory safety bug is borderline negligence in 2022.
           | This is why people get upset over new code being written in
           | C. The cost of writing new portions of the software with
           | memory safety in mind dwarfs the cost of writing in C because
           | it's more convenient for the build tooling.
           | 
           | The bigger question is why hasn't OpenSSL bitten the bullet
           | and adopted some memory safety guarantees in their tooling,
           | given the knowledge of the sources of these bugs and
           | prevalent literature and tools in avoiding them!
        
             | coder543 wrote:
             | > The memory safe subset of C++ is also an option.
             | 
             | This does not actually exist, as far as I'm aware. There
             | are certain things people propose doing in C++ that
             | eliminate a small number of issues, but I haven't seen
             | anyone clearly define and propose a subset of C++ that is
             | reasonably described as memory safe. Even if such a subset
             | existed, you would still need some way to statically
             | enforce that people _only_ use that.
             | 
              | Even just writing the parsers in Lua would be a safer
              | choice than writing them in C, but I think now is as
              | good a time as any to start writing critical code paths
              | in Rust. If the Linux kernel is beginning to allow Rust
              | for kernel modules, then it is high time that OpenSSL
              | looked more seriously at Rust too.
             | 
             | As others have pointed out, parser generators could be a
             | useful intermediate option for some of this.
        
               | duped wrote:
                | You can write memory safe C++ more easily than memory
                | safe C. Statically verifying that the C++ actually is
                | memory safe, given the numerous ways to write unsafe
                | code in C++, is a different goalpost.
               | 
               | My point is that there isn't a compelling reason to write
               | new code in C for something where safety is critical.
        
               | pjmlp wrote:
               | It is called C++ Core Guidelines.
        
           | zasdffaa wrote:
            | Rewriting the parser portion of anything is not 'massive'.
            | Boring as anything, but not difficult and not especially
            | time-consuming.
        
           | gizmo686 wrote:
           | Parser generators are one of the oldest ideas in computer
            | science. YACC was written in the 70s, and had to be ported
            | to C because its original implementation was written in B.
           | 
           | The idea of not writing parsers directly was well established
           | by the time OpenSSL started in the late 90s.
        
             | stcredzero wrote:
             | Given that so many of these bugs are parsing bugs, maybe we
             | should have a new emphasis on compiler compilers which
             | generate fast, provably correct code? (Fast, because most
             | of the bugs and exploits are accompanied by some form of
             | optimization.)
        
               | nyrikki wrote:
               | I think you may run into the halting problem here.
        
               | stcredzero wrote:
               | You don't have to solve the halting problem every time it
               | presents itself. Otherwise, things like Valgrind and
               | fuzzing wouldn't be valid at all. You just have to
               | improve your odds.
               | 
               | EDIT: An important note to newbs: The Halting Problem is
               | correct. However, a problem which maps to the halting
               | problem can still be solved often enough in practice to
               | make it worthwhile. In fact, entire industries have been
               | born of heuristic solutions to such problems.
        
               | gizmo686 wrote:
               | Fast and provably correct are more or less solved
               | problems (at least for a large class of languages).
               | 
               | The main drawback is that it is difficult to get good
               | error messages when a parse fails.
        
             | hardware2win wrote:
              | Industry prefers hand-written parsers for a reason.
             | 
             | Parser generators feel like an academic dream
        
               | spockz wrote:
               | Why is this? I come from academia and I have yet to
               | encounter a good argument for not using parser
               | combinators, in new applications. Can you please point to
               | some reason?
        
         | mkeedlinger wrote:
         | What alternatives are there to parsers? Genuine question from
         | the ignorant.
        
           | chowells wrote:
           | There are no alternatives to parsers. There are alternatives
           | to "in C".
        
         | kazinator wrote:
         | You can easily write a robust parser in C. Just don't write a
         | clump of code that interleaves pointer manipulation for
         | scanning the input, writing the output and doing the parsing
         | _per se_.
         | 
         | * Have a stream-like abstraction for getting or peeking at the
         | next symbol (and pushing back, if necessary). Make it
         | impervious to abuse; under no circumstances will it access
         | memory beyond the end of a string or whatever.
         | 
         | * Have some safe primitives for producing whatever output the
         | parser produces.
         | 
         | * Work only with the primitives, and check all the cases of
         | their return values.
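
           The second bullet (safe primitives for producing output)
           might look like this minimal sketch; `out_buf` and
           `out_append` are invented names for illustration:

```c
#include <stddef.h>
#include <string.h>

/* A bounded output buffer the parser uses for everything it emits.
   Appends report failure instead of overrunning, so the parser body
   never touches raw output pointers. */
typedef struct {
    char *buf;
    size_t cap, len;
} out_buf;

/* Returns 0 on success, -1 if the append would overflow the buffer.
   (Invariant: len <= cap, so cap - len cannot wrap.) */
static int out_append(out_buf *o, const char *src, size_t n) {
    if (n > o->cap - o->len) return -1;
    memcpy(o->buf + o->len, src, n);
    o->len += n;
    return 0;
}
```

           Per the third bullet, the parser then works only through
           primitives like this and checks every return value.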
        
       | draw_down wrote:
        
       | docandrew wrote:
       | As powerful as fuzzing is, this is a good reminder why it's not a
       | substitute for formal verification in high-integrity or critical
       | systems.
        
         | er4hn wrote:
         | I would argue the issue was not checking the coverage of new
         | code vs what was being tested.
        
           | naasking wrote:
           | The OP is pointing out that what "the issue" is depends on
           | whether you want high confidence that your code has few bugs,
           | or you want certainty that your code contains no bugs.
        
         | fulafel wrote:
         | .. or more pragmatically, safer languages where errors aren't
         | exploitable to get remote code execution.
         | 
         | (I guess that semantics can also be seen as a formally verified
         | property)
        
           | ludovicianul wrote:
           | Safer languages cannot protect from bad design. Many
           | libraries have implicit behaviour which is not always
            | visible. It's a hard tradeoff to make. You want safety,
            | but at the same time enough customisation and features. I
            | recently worked with an http client library which forbids
            | sending special characters in headers. I understand that
            | this is a safety feature, but I really wanted to send
            | weird characters (I was building a fuzzing tool).
        
             | manbash wrote:
             | OK but we can mitigate these types of exploits (buffer
             | overflow etc.) using memory-safe languages.
             | 
             | Bad design is a universal orthogonal problem.
        
             | Gigachad wrote:
             | Seatbelts can not save you from bad driving, but they
             | certainly help mitigate the effects.
        
           | marginalia_nu wrote:
           | Given log4shell happened in one of the more aggressively
           | sandboxed languages with mainstream adoption, the outlook
           | isn't great.
        
             | yakubin wrote:
             | I'd classify runtime reflection as an unsafe language
             | feature, to be honest.
        
             | chowells wrote:
             | Sandboxing is mostly irrelevant to the log4j error. You'd
             | have to tell the sandbox to turn off reflection, which
             | isn't really feasible in Java. And that's because Java is
             | so poorly designed that big libraries are all designed to
             | use reflection to present an API they consider usable.
             | 
             | Compare that to a language designed well enough that
             | reflection isn't necessary for good APIs, for instance.
        
               | marginalia_nu wrote:
               | Dunno if I agree that libraries need reflection. Some do,
               | but primarily in the dependency injection and testing
               | space.
               | 
               | That's not really where you'd expect RCE-problems.
        
               | chowells wrote:
               | Yeah, I should say where developers don't _think_ they
               | need to use reflection.
               | 
               | Like, the log4j thing came from (among other design
               | errors) choosing to use reflection to look up filters for
               | processing data during logging. Why would log4j's
               | developers possibly think reflection is an appropriate
               | tool for making filters available? Because it's the easy
               | option in Java. Because it's the easy option, people are
               | already comfortable with it in other libraries. Because
               | it's easy and comfortable, it's what gets done.
               | 
               | Some languages make reflection much more difficult (or
               | nearly impossible) and other APIs much easier. It's _far_
               | more difficult to make that class of error in languages
               | like that.
        
               | strbean wrote:
               | > big libraries are all designed to use reflection to
               | present an API they consider usable.
               | 
               |  _whistles in python_
        
             | fulafel wrote:
             | Code executing in the JVM isn't sandboxed. Sandboxing could
             | have indeed mitigated log4shell. Log4shell was a design
             | where a too powerful embedded DSL was exposed to untrusted
             | data in a daft way - the log("format here...", arg1, arg2)
             | call would interpret DSL expressions in the args carrying
             | logged data. One can even imagine it passing formal
             | verification depending on the specification.
             | 
             | But more broadly the thing is that eliminating these low
             | level language footguns would allow people to focus on the
             | logic and design errors.
        
             | pjmlp wrote:
              | That belongs to the 30% of exploits we are left with
              | after removing the other 70% that come from C.
        
               | marginalia_nu wrote:
               | I think you are correct, but I do not think the average
               | severity of these exploits is necessarily the same.
        
               | pjmlp wrote:
               | US agency for cyber security thinks otherwise.
        
             | cryptonector wrote:
             | Upvote for log4shell. That's pretty funny.
             | 
             | Yes, a safer language is not enough, but it is a huge leap
             | forward, so I'll take it.
        
       | jeffbee wrote:
       | tl;dr: because ossl_a2ulabel had no unit tests until a few days
       | ago, the fuzzer could not have reached it through any combination
       | of other tests.
       | 
       | That fuzzing is tricky was not the problem here. The problem is
       | the culture that allowed ossl_a2ulabel to exist without unit
       | tests. And before some weird nerd jumps in to say that openssl is
       | so old we can't apply modern standards of project health, please
       | note that the vulnerable function was committed from scratch in
       | August 2020. Without unit tests.
        
         | MuffinFlavored wrote:
         | > because ossl_a2ulabel had no unit tests until a few days ago
         | 
         | it's not realistic to enforce unit test coverage % with a
         | project at the scale of OpenSSL, right?
        
           | pca006132 wrote:
           | It should be realistic, line coverage isn't really that hard.
           | The hard thing is that high line coverage alone is usually
           | not enough for numerical stuff...
        
           | jeffbee wrote:
           | It is trivial to enforce that new functions have new unit
           | tests and fuzz tests. You are the reviewer of
           | https://github.com/openssl/openssl/pull/9654 and you just say
           | "Please add unit tests and fuzz tests for foo and bar" and
           | you don't approve it.
           | 
           | I don't know what the deal is with their testing culture but
           | in year 27 of the project they demonstrably haven't learned
           | this lesson. It's nice that they added integration tests
           | (testing given encoded certs) but as the article points out
           | that was insufficient.
        
             | bluGill wrote:
              | Last week I got a "please add unit tests for this code"
              | comment. The person who wrote it wasn't aware that this
              | was a refactoring where the functionality was already
              | well tested.
             | 
             | There is no substitute for reviewers who really understand
             | the code in question. The problem is they are the ones
             | writing the code and so are biased and not able to give a
             | good review.
        
             | dralley wrote:
              | IMO, one of the biggest benefits of "modern" systems
              | languages like Rust, D, and Zig is how much easier they
              | make writing and running tests compared to C and C++.
              | Yes, you can write tests in C and C++, but it's nowhere
              | near as trivial. And that makes a difference.
        
               | pjmlp wrote:
                | I was writing unit tests for C in 1996; naturally the
                | term hadn't been coined yet, so we just called them
                | automatic tests.
               | 
               | It was part of our data structures and algorithms
               | project, failure to execute the automatic tests meant no
               | admission to the final exam.
               | 
                | We had three sets of tests: those provided at the
                | beginning of the term, those we were expected to write
                | ourselves, and a surprise set during the integration
                | week at the end of the semester.
        
               | fisf wrote:
               | I am not buying it.
               | 
               | Writing unit tests for c/c++ is trivial. There are
               | perfectly fine test frameworks, used by developers every
               | day, integrated in any major IDE or runnable as one-liner
               | from the command line.
               | 
               | This is absolutely a cultural problem.
        
           | masklinn wrote:
           | > it's not realistic to enforce unit test coverage % with a
           | project at the scale of OpenSSL, right?
           | 
           | Why not?
           | 
           | You can enforce that all new files should be covered (at the
           | very least line-covered). It requires some setup effort
            | (collecting code coverage and either sending it to a tool
            | which performs the correlation or correlating it
            | yourself), but once that's done... it does its thing.
           | 
           | Then you can work on increasing coverage for existing files,
           | and ratcheting requirements.
        
           | stefan_ wrote:
           | No one is asking for that. But this is code that did one
           | thing: punycode decoding, with millions of well documented
           | test vectors. The code had zero dependencies on anything
           | OpenSSL related. It is a very simple "text in, text out"
           | problem, the most trivial thing to write unit tests for. At
           | the same time, it's code that _parses_ externally provided
           | buffers and has to deal with things like unicode in C -
            | _there should be a massive red flashing warning light in
            | every developer's head here_.
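
           A table-driven "text in, text out" test of the kind described
           would look roughly like this. The decoder below is a
           deliberately trivial stand-in that handles only the all-ASCII
           case of RFC 3492 (basic code points, then the '-' delimiter,
           then nothing); it is not OpenSSL's `ossl_a2ulabel`, and a
           real decoder would take its place:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a punycode decoder: accepts only labels of the
   form "basic-codepoints-" with an empty extended part, and refuses
   to write past the output buffer. */
static int toy_decode(const char *in, char *out, size_t outcap) {
    size_t n = strlen(in);
    if (n == 0 || in[n - 1] != '-') return -1;  /* only the basic case */
    if (n - 1 >= outcap) return -1;             /* never overrun out */
    memcpy(out, in, n - 1);
    out[n - 1] = '\0';
    return 0;
}

/* The test shape: a table of input/expected pairs, plus explicit
   malformed and oversized inputs that must fail cleanly. */
static void run_vectors(void) {
    static const struct { const char *in, *want; } vec[] = {
        { "abc-", "abc" },
        { "hello-", "hello" },
    };
    char out[16];
    for (size_t i = 0; i < sizeof vec / sizeof vec[0]; i++) {
        assert(toy_decode(vec[i].in, out, sizeof out) == 0);
        assert(strcmp(out, vec[i].want) == 0);
    }
    assert(toy_decode("no-delimiter", out, sizeof out) == -1);
    assert(toy_decode("waytoolongforthisbuffer-", out, 4) == -1);
}
```

           Nothing here needs any OpenSSL machinery, which is the point:
           a function this self-contained is about the easiest possible
           thing to cover with vectors.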
        
           | duped wrote:
           | It's realistic to reject a CR for a new parsing function
           | without proof that it works, which usually comes in the form
            | of a unit test.
        
             | planede wrote:
             | Nit: a unit test never proves that it works. At best it can
             | prove that it doesn't work.
             | 
             | Otherwise, I agree.
        
               | hardware2win wrote:
               | Nit: "never"? Even if you unit test all cases?
        
               | bluGill wrote:
                | How do you know you've covered all cases? You can
                | easily verify that you cover all the cases the code
                | handles. But does the code actually handle all the
                | cases that could exist? A formal proof can bring to
                | your attention a case that you didn't handle at all.
               | 
                | Formal proofs also have limits. Donald Knuth famously
                | wrote "beware of bugs in the above code; I have only
                | proved it correct, not tried it". Which is why I think
                | we should write tests for code as well as formally
                | prove it. (On the latter, I've never figured out how
                | to prove my code correct - writing C++, I'm not sure
                | it is even possible, but I'd like to.)
        
               | planede wrote:
                | Touché
        
               | tz18 wrote:
               | Yes, never. You are assuming that the implementation of
               | all unit test cases are themselves correct (that they
               | would fail if there was any error in the case they
               | cover). In fact unit tests are often wrong. In that
               | context a unit test can't even prove code incorrect,
               | unless we know that the unit test is correct.
               | 
               | IMO to prove that code is correct requires a proof; a
               | unit test can only provide evidence suggestive of
               | correctness.
        
               | planede wrote:
               | Mistakes in proofs are just as probable as mistakes in
               | exhaustive tests.
               | 
               | An exhaustive test is just one type of a machine verified
               | proof.
        
               | nicoburns wrote:
               | > An exhaustive test is just one type of a machine
               | verified proof.
               | 
               | Not entirely sure I agree with this. A proof by
               | construction is a very different beast to empirical unit
               | tests that only cover a subset of inputs. The equivalent
               | would be units tests that cover every single possible
               | input.
        
           | asveikau wrote:
           | It's worth noting that coverage can be a deceptive metric
           | sometimes.
           | 
           | You can have coverage on code that divides - it won't tell
           | you if you ever divide by zero.
           | 
           | You can have coverage on code that follows a pointer - it
           | won't tell you if you ever pass a bad pointer.
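
             The divide-by-zero point fits in a few lines: full line
             coverage of `ratio` says nothing about whether the
             `b == 0` input was ever exercised.

```c
/* The division below is "covered" by any single test that calls
   ratio(), yet coverage cannot flag that b == 0 was never tried. */
static int ratio(int a, int b) {
    return a / b;   /* 100% line coverage, still crashes on b == 0 */
}

/* A guarded version turns the dangerous input into an explicit,
   testable error path. */
static int ratio_checked(int a, int b, int *out) {
    if (b == 0) return -1;
    *out = a / b;
    return 0;
}
```

             Coverage tells you a line ran, not that it ran with the
             inputs that matter.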
        
             | adql wrote:
             | yeah but there isn't even a try here
        
               | asveikau wrote:
               | Not saying it isn't worth an attempt, just that the real
               | meaning shouldn't be lost.
        
         | adql wrote:
          | Seems OpenSSL as an organization is just irreparably broken
          | if they _still_ haven't learned the lesson.
        
           | Someone1234 wrote:
            | OpenSSL still receives minimal funding. Until Heartbleed
            | they had nearly no funding, and now it is two full-time
            | people.
           | 
           | https://en.wikipedia.org/wiki/Core_Infrastructure_Initiative.
           | ..
        
             | pizza234 wrote:
              | I'm of the opinion that in cases like this, it'd be better
              | for the organization to close, and allow the gap to be
              | filled naturally.
              | 
              | If the current OpenSSL maintainers closed the project,
              | given its importance, there would be a rush to take over
              | maintenance. Chances are, it'd be better funded; even in
              | the worst case, it would hardly be assigned fewer than two
              | devs.
              | 
              | This is a case of the general dynamic where a barely-
              | sufficient-but-arguably-insufficient solution prevents
              | actors from finding and executing a proper one.
        
               | SoftTalker wrote:
               | For a project like OpenSSL, it's not just having "enough"
               | developers (whatever that is) it's having _qualified_
               | developers. Writing good crypto code requires deep
                | expertise. There aren't a lot of people with such
               | expertise whose time is not already fully committed.
        
               | AdamJacobMuller wrote:
               | You don't even have to close. You can just refuse to
               | merge code which does not include 100% test coverage. If
               | someone wants the feature badly enough, they will figure
               | out a way to fill the gap. Alternatively, someone can
               | always fork the code and release "OpenSSL-but-with-lots-
               | of-untested-code" variant.
        
           | coder543 wrote:
           | I think at this point we've established that it's C which is
           | just irreparably broken.
           | 
           | Blaming the OpenSSL developers for writing bad C is just a
           | "no true scotsman" at this point, since there is no large,
           | popular C codebase in existence that I'm aware of that avoids
           | running into vulnerabilities like this; vulnerabilities that
           | just about every other language (mainly excluding C++) would
           | have prevented from becoming an RCE, and likely prevented
           | from even being a DoS. Memory safe languages obviously can't
           | prevent _all_ vulnerabilities, since the developer can still
           | intentionally or unintentionally write code that simply does
           | the wrong thing, but memory safe languages can prevent a lot
           | of dumb vulnerabilities, including this one.
           | 
           | No feasible amount of funding would have prevented this,
           | since it continues to happen to much better funded projects
           | also written in C.
           | 
            | On the other hand, I guess we _could_ blame the OpenSSL
            | developers for writing C at all, being unwilling to start
            | writing new code in a memory safe language of some kind, and
            | ideally rewriting particularly risky code paths like parsers
            | as well. We've learned this lesson the hard way a thousand
            | times. C isn't going away any time soon (unfortunately), but
            | that doesn't mean we have to _continue_ writing new
            | vulnerabilities like this one, which was written in the last
            | two years.
        
             | tredre3 wrote:
             | > Blaming the OpenSSL developers for writing bad C is just
             | a "no true scotsman" at this point, since there is no
             | large, popular C codebase in existence that I'm aware of
             | that avoids running into vulnerabilities like this;
             | vulnerabilities that just about every other language
             | (mainly excluding C++) would have prevented from becoming
             | an RCE
             | 
             | No, this whole thing is about the lack of testing. Adding a
             | parser without matching tests is just absurd regardless of
             | the language it's implemented with. If only for basic
             | correctness check, you want a test.
             | 
             | Not all vulnerabilities or bugs are memory-related,
             | vulnerabilities are bound to surface in any language with
             | that kind of organizational culture.
        
             | jabart wrote:
              | Keep in mind that Ubuntu compiled OpenSSL with a gcc flag
              | that turns this one-byte overflow into a crash instead of
              | memory leak/corruption, because gcc already has a way to
              | do that. Rewriting something with this level of history
              | into a completely new language is very risky, and a very
              | long-term project.
        
               | coder543 wrote:
               | > It's very risky, and a very long term project to
               | rewrite something with this level of history into a
               | completely new language.
               | 
                | I _didn't suggest_ a complete rewrite of the project.
               | However, they _could_ choose to only write new code in
               | something else, and they could rewrite certain critical
               | paths too. The bulk of the code would continue to be a
               | liability written in C.
               | 
               | I agree that it would be nearly impossible to rewrite
               | OpenSSL as-is. It would take huge amounts of funding and
               | time. In general, people with that much funding are
               | probably better off starting from scratch and focusing on
               | only the most commonly used functionality, as well as
               | designing the public interface to be more ergonomic /
               | harder to misuse.
        
           | cryptonector wrote:
           | The OpenSSL team is actually very good and they do very good
           | work, and they even have funding. The problem is that legacy
           | never goes away, and OpenSSL is a huge pile of legacy code,
           | and it will take a long long time to a) fix all the issues
           | (e.g., code coverage), b) migrate OpenSSL to not-C or the
           | industry to not-OpenSSL.
        
       | KingLancelot wrote:
       | OpenSSL is known to be broken.
       | 
       | All this bickering over language misses the real problem.
       | 
        | The actual problem is that open source, widely used code is a
        | target for hackers.
       | 
       | By using one library used everywhere for everything, you're
       | painting a target on your own back.
       | 
       | The real solution is we need the software ecosystem to have more
       | competition and decentralization.
       | 
       | Use alternative crypto libraries.
       | 
        | If you want a drop-in replacement, use LibreSSL, which was
        | forked and cleaned up by the OpenBSD guys after Heartbleed.
       | 
        | But the long-term solution is more competition, by using
        | smaller, more specialized libraries, or even writing your own.
        
         | nicoburns wrote:
          | > The actual problem is that open source, widely used code is
          | a target for hackers.
         | 
         | The long term solution is likely using languages which are A.
         | Memory safe, and B. make formal verification viable. Being
         | widely used and open source isn't an issue if there are no
         | exploitable bugs in the code.
        
           | KingLancelot wrote:
           | I see a lot of people pushing memory safe languages, far more
           | people than actually write systems code.
           | 
           | What is your primary language?
        
             | pjmlp wrote:
             | Many of those people have experience writing systems
             | programming code before UNIX and C got widespread outside
             | Bell Labs.
             | 
             | Mac OS was originally written in Object Pascal + Assembly,
             | just to cite one example from several.
        
               | KingLancelot wrote:
               | Nice deflection, but you still didn't answer the
               | question.
               | 
               | Have you ever written in a memory safe language like
               | you're telling others to do?
        
               | pjmlp wrote:
               | Yes, my dear, plenty of times since 1986.
               | 
               | Turbo Pascal, Turbo Basic, Modula-2, Ada, Oberon, ...
        
         | hn_acc_2 wrote:
         | There are a limited number of people willing to spend a limited
         | amount of time fuzzing, reviewing, and scrutinizing crypto
         | libraries. The more libraries exist, the more their efforts are
         | divided, and the total scrutiny each library receives
         | decreases. How would this help the problem?
        
       | sramsay wrote:
       | I don't get it. Why doesn't everyone just use the battle-
       | hardened, fully-compliant Rust implementation of OpenSSL?
        
       | yumjum wrote:
       | Loads of bugs aren't detected by fuzz testing, as this technique
       | exhibits stochastic behaviour, where you'll most likely find bugs
       | overall, but have varying chances (including none at all) of
       | uncovering specific bugs.
       | 
       | Which is great news for those of us who approach such research by
       | gaining a deep understanding of the code and the systems it
       | exists in, and figuring out vulnerabilities from that
       | perspective. An overreliance on fuzzing keeps us employed.
        
         | Diggsey wrote:
         | Fuzz testing has a very high chance of detecting bugs,
         | especially these kind, but you do need to at least check that
         | the fuzzer is reaching the relevant code!
        
           | fulafel wrote:
           | This is reasoning backwards in a misleading way. The point is
           | not changing the fuzzing setup to find this specific bug that
           | we now know with hindsight was there. There are a zillion
           | paths and you would need to be ensuring that fuzzing reaches
           | all vulnerable code with values that trigger all vulnerable
           | dynamic behaviours.
        
             | m463 wrote:
             | isn't that what code coverage does?
        
             | Diggsey wrote:
             | It's not backwards: you run the fuzzer, you look at the
             | code coverage, and you compare that against what you expect
             | to be tested. Then you update the fuzzing harness to allow
             | it to find missing code paths.
             | 
             | It's far more doable than you are suggesting: fuzzing
             | automatically covers most branches anyway, so you just need
             | to manually deal with the exceptions (which are easy to
             | locate from the code coverage).
             | 
             | I used fuzzing to test an implementation of Raft, and with
             | only a little help, the fuzzer was able to execute every
             | major code path, including dynamic cluster membership
             | changes, network failure and delays. The Raft safety
             | invariants are checked after each step. Does this guarantee
             | that there are no bugs? Of course not. It did however find
             | some very difficult to reproduce issues that would never
             | have been caught during manual testing. And this is with a
             | project not even particularly well suited to fuzzing! A
             | parser is the dream scenario for a fuzzer, you just have to
             | actually run it...
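              | 
              | A minimal sketch of that kind of harness, libFuzzer-style
              | (parse_name here is a made-up stand-in for the routine
              | under test, not a real OpenSSL function): built with
              | clang's -fsanitize=fuzzer, the fuzzer supplies main and
              | calls the entry point in a loop with mutated inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the parser under test (hypothetical). */
static int parse_name(const uint8_t *buf, size_t len)
{
    return len > 0 && buf[0] == 'x';
}

/* libFuzzer entry point: the fuzzer mutates `data` and watches
 * coverage counters to discover new branches in parse_name().
 * Crashes and sanitizer reports count as findings. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    parse_name(data, size);
    return 0;
}
```

              | Checking the coverage report afterwards tells you whether
              | the fuzzer ever reached the branches you care about; if
              | not, you seed the corpus with an input that gets there, or
              | write a second harness aimed directly at the inner
              | routine.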
        
               | fulafel wrote:
                | Yep, code coverage can tell you when code is definitely
                | untested, but it doesn't tell you that you are covering
                | the input space thoroughly enough for high assurance
                | that there aren't vulnerabilities.
               | 
               | Coverage might have helped here (or not), but it doesn't
               | fix the general problem of fuzzing being stochastic and
               | only testing some behaviours of the covered code.
        
             | PhearTheCeal wrote:
             | I wonder if one possible solution is making things more
             | "the Unix way" or like microservices. Then instead of
             | depending on some super specific inputs to reach deep into
             | some code branch, you can just send input directly to that
             | piece and fuzz it. Even if fuzzers only catch shallow bugs,
             | if everything is spread out enough then each part will be
             | simple and shallow.
        
               | fulafel wrote:
                | This is the flip side of the fuzzing approach, called
                | property testing. It's legit, but it involves unit-test-
                | style manual creation of lots of tests for the various
                | components of the system, plus a lot of specs of the
                | contracts between components, and aligning the property
                | testing to those.
        
               | eklitzke wrote:
               | Fuzzers can already do this. When you set up a fuzzer you
               | set up what functions it's going to call and how it
               | should generate inputs to the function. So you can fuzz
               | the X.509 parsing code and hope it hits punycode parsing
               | paths, but you can also fuzz the punycode parsing
               | routines directly.
        
         | skybrian wrote:
         | Fuzz tests can take a seed corpus of test vectors. If the test
         | framework tries them first, it can guarantee that it will find
         | _those_ bugs in any test run. For anything beyond that, it
         | depends on chance.
        
         | spockz wrote:
         | Or seen the other way around. By applying fuzzing to find the
         | "silly" type of bugs, you can spend your artistic efforts on
         | finding the other bugs.
        
           | ludovicianul wrote:
            | I think this is the main reason fuzzing exists. Leave the
            | boring part to the tool and focus on the more creative work.
        
       | w_for_wumbo wrote:
        | I'm not familiar enough with C to know the answer, but I'm
        | trying to think how anything goes from untrusted input ->
        | trusted input safely. To sanitize the data, you're putting the
        | input into memory to perform logic on it; isn't that itself an
        | attack vector? I would think that any language would need to do
        | this.
       | 
       | Is anyone able to explain this to me?
        
         | oconnor663 wrote:
         | There are a lot of different issues that can come up, but in
         | practice ~80% of those (my made up number) are out-of-bounds
         | issues. So for example, say you're parsing a JSON string
         | literal. What happens if the close-quote is missing from the
         | end of the string? You might have a loop that iterates forward
         | looking for the close-quote until it reaches the end of the
         | input. What that code _should_ do is then return an error like
         | "unclosed string". If you write that check, your code will be
         | fine in any language. What if you forget that check? In most
         | languages you'll get an exception like "tried to read element
         | X+1 in an array of length X". That's not a great error message,
         | but it's invalid JSON anyway, so maybe we don't care super
         | much. However in C, array accesses aren't bounds-checked, so
         | your loop plows forward into random memory, and you get a CVE
         | roughly like this one.
         | 
         | In short, the issue is that you forgot a check, and your code
         | effectively "trusted" that the input would close all its
         | strings. If you never make mistakes like that, you can validate
         | input in C just like in any other language. But the
         | consequences of making that mistake in C are really nasty.
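          | 
          | A sketch of the check in question (scan_string is a made-up
          | helper, not from any real JSON parser): the whole difference
          | between the safe and the vulnerable version is the i < len
          | bound on the loop.

```c
#include <stddef.h>

/* Scan a JSON-style string body starting just after the opening
 * quote. Returns the index of the closing quote, or -1 for an
 * "unclosed string" error. Drop the `i < len` bound and the loop
 * reads past the buffer: an exception in a bounds-checked
 * language, a potential CVE in C. */
long scan_string(const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '"')
            return (long)i;
    }
    return -1;   /* input ended before the close-quote */
}
```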
        
         | zwkrt wrote:
         | Just because something is in memory doesn't mean that it is
         | realistically executable. That's why you can download a virus
         | to look at the code without it installing itself.
         | 
         | You aren't wrong that even downloading untrusted data is less
         | secure than not downloading it. But to actually exploit a
         | machine that is actively sanitizing unsafe data, you need
         | either (A) an attack vector for executing code at an arbitrary
         | location in memory, or (B) a known OOB bug in the code that you
         | can exploit to read your malicious data, by ensuring your data
         | is right after the data affected by the OOB bug.
        
         | bluGill wrote:
         | >To sanitize the data, you're putting the input into memory to
         | perform logic on it
         | 
         | Sure, but memory isn't normally executed.
         | 
          | One of the more common problems was not checking length. Many
          | C functions assume sanitized data, so they don't check. Some
          | input functions don't check length either (gets is the most
          | famous, but there are others), so if someone supplies more
          | data than you have room for, the rest of the data just keeps
          | going off the end - and it turns out that in many cases you
          | can predict where "off the end" is, and then craft that data
          | to be something the computer will run.
         | 
          | One common variation: C assumes that many strings end with a
          | null character. There are a number of ways to get a string to
          | not end with that null, and if the user can force that,
          | functions will read/write past the end of the data - which is
          | sometimes something you can exploit.
         | 
          | So long as your C code carefully checks the length of
          | everything, you are fine. One common variation of this is
          | checking length but miscounting by one character. It is very
          | hard to get this right every single time, and mess it up just
          | once and you are open to something unknown in the future.
         | 
         | (Note, there are also memory issues with malloc that I didn't
         | cover, but that is something else C makes hard to get right).
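          | 
          | A sketch of the length-check discipline described above
          | (bounded_copy is a made-up helper; the buffer sizes are
          | arbitrary): the copy refuses input that won't fit, NUL
          | terminator included - exactly the check that gets and friends
          | skip.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into a fixed-size buffer only if it fits, including
 * the NUL terminator. Returns 0 on success, -1 if the input is
 * too long. The `>=` (rather than `>`) is where the classic
 * off-by-one lives: get it wrong and the NUL lands past the end. */
int bounded_copy(char *dst, size_t dst_size, const char *src, size_t src_len)
{
    if (src_len >= dst_size)   /* no room for the data plus its NUL */
        return -1;
    memcpy(dst, src, src_len);
    dst[src_len] = '\0';
    return 0;
}
```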
        
       | planede wrote:
       | Would it be reasonable to have fuzz testing around reasonably
       | sized units in addition to e2e?
        
         | dllthomas wrote:
         | Yes, but see also "property testing" like QuickCheck and
         | Hypothesis, the line is blurry.
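          | 
          | A rough sketch of the idea in plain C (no QuickCheck here,
          | just rand() and an invariant; the clamp function is a made-up
          | unit under test): you state a property that must hold for
          | every input and let generated cases probe it, rather than
          | hand-picking examples.

```c
#include <stdlib.h>

/* Example unit under test: clamp v into [lo, hi]. */
static int clamp(int v, int lo, int hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

/* Property: the result always lies within [lo, hi].
 * A property tester generates the cases; we only state the
 * invariant. Returns 1 if the property held for every case. */
int check_clamp_property(unsigned seed, int iterations)
{
    srand(seed);
    for (int i = 0; i < iterations; i++) {
        int v  = rand() - RAND_MAX / 2;   /* random signed input */
        int lo = -100, hi = 100;
        int r  = clamp(v, lo, hi);
        if (r < lo || r > hi)
            return 0;   /* property violated */
    }
    return 1;
}
```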
        
       | bawolff wrote:
       | > I think we should give the developers the benefit of doubt and
       | assume they were acting in good faith and try to see what could
       | be improved.
       | 
       | I feel like there is this trend of assuming any harsh criticism
       | is bad faith. Asking why industry standard $SECURITY_CONTROL
       | didn't work immediately after an issue happened that should have
       | been caught by $SECURITY_CONTROL is hardly a bad faith question.
        
         | aidenn0 wrote:
         | Questions themselves are not good-faith or bad-faith. People
         | asking questions are doing so in either good-faith or bad-
         | faith.
         | 
         | Someone pushing hard on legitimate criticisms with the intent
         | of attacking a project or members thereof is acting in bad-
         | faith, while someone ignorant with a totally bogus criticism
         | could be acting in good-faith. Many bad-faith actors hide
          | behind a veneer of legitimacy by disguising their motivations
          | or shifting attention away from them.
        
           | bawolff wrote:
            | Umm, I disagree.
           | 
           | Bad/good faith is about whether you are being misleading or
           | dishonest in asking the question.
           | 
           | You can intend to go attack a project in good faith as long
           | as you are not being misleading in your intentions.
           | 
           | For example, a movie critic who pans a film is not acting in
           | bad faith since they aren't being misleading in their
           | intentions.
        
             | aidenn0 wrote:
             | A movie critic who pans a film they think sucked is acting
             | in good faith; a movie critic who pans a film _specifically
             | with the intent of attacking the film_ (whether as
              | clickbait or because they don't like someone involved with
             | the film or whatever) is acting in bad faith.
             | 
             | We might actually be in agreement with each other because a
             | critic who leads with "The director slept with my wife, so
             | I'm only going to say all the bad things about the film and
             | you should probably ignore this review" would have
              | significantly blunted their attack by leading with it, and
              | is arguably not acting in bad-faith.
        
       | germandiago wrote:
        | I find OpenSSL's function-call interfaces really infuriating,
        | all the more so considering this is a security library.
        | 
        | I think interfaces in Botan, to give an example, are way easier
        | to use.
        | 
        | The OpenSSL API looks like a minefield to me.
        
       ___________________________________________________________________
       (page generated 2022-11-21 23:00 UTC)