[HN Gopher] Stranger Strings: An exploitable flaw in SQLite
       ___________________________________________________________________
        
       Stranger Strings: An exploitable flaw in SQLite
        
       Author : eatonphil
       Score  : 285 points
       Date   : 2022-10-25 11:40 UTC (11 hours ago)
        
 (HTM) web link (blog.trailofbits.com)
 (TXT) w3m dump (blog.trailofbits.com)
        
       | gregors wrote:
       | Great work!!!
       | 
       | "SQLite is extensively tested with 100% branch test coverage. We
       | discovered this vulnerability despite the tests, which raises the
       | question: how did the tests miss it?"
       | 
       | Tests only validate known behavior. They didn't have a test that
       | exercised a large string. I don't know how their test suite works
       | but I'm curious if quickcheck might have possibly found this or
       | if the 32-bit viewpoint of the programmer at the time would have
       | also limited possible generated values?
        
         | RHSeeger wrote:
         | Indeed. Tests do not tell you code is correct. They only tell
         | you a list of ways it is not incorrect.
        
           | aa-jv wrote:
           | .. and combined with coverage they also tell you what you are
           | yet to test properly.
        
             | kevincox wrote:
             | Actually: It tells you _some things_ that are not tested
             | properly.
             | 
             | Just because something has coverage doesn't mean that it
             | _is_ tested properly as evidenced by this bug even though
             | sqlite has 100% branch coverage.
        
               | Macha wrote:
               | Yeah, coverage (or rather 100% - coverage) is basically a
               | lower bound on what's not tested.
        
               | masklinn wrote:
               | Even "coverage" is not enough, what 100% coverage? Lines?
               | Branches? Paths?
        
               | Macha wrote:
               | That's my point, coverage (whatever your chosen metric)
               | can only prove the absence of tests, not their presence
               | 
               | e.g. if your coverage is 80%, then you know (100-80)% =
               | 20% is definitely not tested since it never even ran in
               | your tests. It doesn't tell you much about the 80%
        
             | fulafel wrote:
             | s/properly//
        
           | PathOfEclipse wrote:
           | I think this thinking is short-sighted. Tests increase your
           | confidence that code is correct, and they are much cheaper to
           | do that formal verification, which itself still requires a
           | leap of faith that there is no bug in the formal verification
           | system, its implementation in the compiler toolchain, or the
           | assumptions made about the underlying hardware.
           | 
           | You can cast doubt on anything. Will you be alive 30 seconds
           | from now? You don't know for sure! But you can be pretty
           | confident. Similarly, testing can increase confidence in code
           | correctness. SQLite has an excellent security history
           | overall, and it's testing strategy is part of the reason for
           | that success.
        
             | hulitu wrote:
             | > Tests increase your confidence that code is correct, and
             | they are much cheaper to do that formal verification,
             | 
             | Your confidence as SW engineer. The purpose of a test is to
             | look for unexpected behaviour, not for expected behaviour.
        
             | pclmulqdq wrote:
             | Formal verification also becomes practically impossible at
             | a certain level of complexity.
             | 
             | SQLite is better tested than many silicon chips, so it's
             | hard to fault their testing strategy.
             | 
             | Perhaps a few static analyzers should be run on the
             | codebase to find possible bugs like this.
        
               | hulitu wrote:
               | > Formal verification also becomes practically impossible
               | at a certain level of complexity.
               | 
               | Then reduce complexity.
        
               | nottorp wrote:
               | Well, int main (void) { return 1; } is most likely bug
               | free. But it's not useful (except as /bin/true).
        
               | [deleted]
        
               | jwilk wrote:
               | That would be a very buggy /bin/true implementation.
        
               | function_seven wrote:
               | You can make things as simple as possible, but no simpler
               | than that.
               | 
               | SQLite is a complex program, inherently. It's designed to
               | allow users to do complex things, and as such cannot be
               | simpler than its purpose allows.
               | 
               | I'm no formal verification expert--so I don't know if
               | SQLite is above or below that fuzzy threshold--but
               | "reduce complexity" is not always an available tool.
        
         | mrcus wrote:
         | "Program testing can be used to show the presence of bugs, but
         | never to show their absence!"
        
         | hulitu wrote:
         | Welcome to SW testing. Testing for limits is a no go. Testing
         | for random input is a no go.
        
         | XCSme wrote:
         | Proving that a code always works means that there doesn't exist
         | any case in which it breaks. You can't really prove something
         | doesn't exist, unless the input domain is of fixed size and you
         | can test all of it. In self-hosted technology, it is even
         | trickier to prove something won't break, as there is so much
         | potential variation in the hardware/software that your program
         | runs on. There might even be stupid edge-cases, like the
         | program failing only when the device it runs on is upside-down,
         | or placed at high altitude, etc.
        
           | naikrovek wrote:
           | Is this what "formal verification" is? That seems very hard
           | for software that accepts arbitrary input.
           | 
           | I understand formal verification, a bit, in terms of FPGA
           | design, but not software.
        
         | capableweb wrote:
         | Unsure if Property Testing would actually have found this
         | issue, 2 billion bytes is a very large string and usually you
         | wouldn't put the limit so high, the tests would run for too
         | long.
         | 
         | Fuzzing would be a more appropriate measure to have found this
         | issue, as usually you do fuzzing with no limit of length/size,
         | as that's the entire point of fuzzing to expand the scope of
         | what you're testing.
         | 
         | Regarding test coverage, I'd argue that even caring about test
         | coverage is giving people a false sense of security. It's easy
         | to achieve 100% test coverage, but that doesn't actually say
         | that you're testing the right/wrong thing, just that at one
         | point that line of code was executed, but again, that says
         | nothing about correctness.
        
           | ajross wrote:
           | Yeah, exactly. People tend to misunderstand fuzzing, and
           | think of it as "feeding random garbage at an API". Fuzzing,
           | done with proper tooling, is _automated coverage testing_.
           | You have a bounds overflow that depends on input data but
           | only if condition A and B and C are true in the runtime
           | state? Fuzzing will find it, almost always, by detecting the
           | branching behavior of your code.
           | 
           | So all the replies here saying "the tests didn't find it
           | because they didn't cover the case" are missing the point.
           | There are test paradigms that don't rely on humans to
           | enumerate cases.
        
           | adrianN wrote:
           | It is absolutely _not_ easy to achieve the kind of coverage
           | that sqlite has. You need massive amounts of tests for that.
           | Of course coverage is not sufficient, you can exercise code
           | without checking any properties at all. But to say that it
           | gives a false sense of security is too harsh.
        
             | capableweb wrote:
             | > It is absolutely not easy to achieve the kind of coverage
             | that sqlite has.
             | 
             | It is though. Give me one project and I'll send you a test
             | suite that "tests" 100% of executable lines. It won't have
             | any actual assertions, but that's not what code coverage is
             | about anyways, so hopefully you won't mind.
             | 
             | Number of assertions per line of code is about as good
             | metric for how well tested something is as code coverage,
             | which means just about nil.
        
               | adrianN wrote:
               | How much do you charge? Because adding assertions to
               | tests that give MC/DC coverage is a lot cheaper than
               | writing them. I know a couple of big companies who might
               | be interested.
        
               | chasil wrote:
               | Compliance with the DO-178B standard was _not_ easy for
               | the SQLite maintainers. It required several years to
               | achieve.
               | 
               | By adhering to the DO-178B standard, the testing is
               | confirmed extensive and satisfies requirements for use in
               | avionics.
               | 
               | "Rockwell Collins... introduced me to this concept of
               | DO-178B. It's a quality standard for safety-critical
               | aviation products, and unlike so many quality standards,
               | this one's actually readable... one of the key things
               | that they push is, they want 100% MCDC test coverage.
               | 
               | "Your tests have to cause each branch operation in the
               | resulting binary code to be taken and to fall through at
               | least once.
               | 
               | "At the machine code level... actually, MCDC's a little
               | stricter than that. Let's not dwell on the details, but I
               | had this idea, I'm going to write tests to bring SQLite
               | up to the quality of 100% MCDC, and that took a year of
               | 60 hour weeks. That was hard, hard work. I was putting in
               | 12 hour days every single day. I was just getting so
               | tired of this because with this sort of thing, it's the
               | old joke of, you get 95% of the functionality with the
               | first 95% of your budget, and the last 5% on the second
               | 95% of your budget. It's kind of the same thing. It's
               | pretty easy to get up to 90 or 95% test coverage. Getting
               | that last 5% is really, really hard and it took about a
               | year for me to get there, but once we got to that point,
               | we stopped getting bug reports from Android."
               | 
               | https://corecursive.com/066-sqlite-with-richard-hipp/
               | 
               | Indeed, SQLite is used for flight systems on the Airbus
               | A350. No other major database vendor is certified to do
               | so.
               | 
               | https://www.sqlite.org/famous.html
        
               | [deleted]
        
               | Someone wrote:
               | > Give me one project and I'll send you a test suite that
               | "tests" 100% of executable lines.
               | 
               | I don't think you can do that with every project. For
               | example, a function _f_ may do:                 if (p ==
               | null) {         abort();       }
               | 
               | If all callers also do that test, there's no way to hit
               | that _abort()_ line in the program. Your test suite may
               | not be able to hit it either. Maybe _f_ is a private
               | function, or that test occurs on one of its intermediate
               | values, or _f_ accidentally has two identical tests:
               | if (p == null) {         abort("p is null");       }
               | if (p == null) { // copy-paste bug         abort("q is
               | null");       }       if (r == null) {         abort("r
               | is null");       }
               | 
               | Even if it is possible to hit that line, finding a path
               | through the code that passes in a null value can be
               | extremely hard.
               | 
               | As a simple contrived counterexample, try getting 100%
               | coverage on a function that does _if(secure_hash(foo) ==
               | 0) bar();_
        
               | garaetjjte wrote:
               | They don't count assertions in testing coverage: https://
               | www.sqlite.org/testing.html#coverage_testing_of_defe...
        
               | phoe-krk wrote:
               | _> Your test suite may not be able to hit it either._
               | 
               | Minor nitpick, process terminations are unit-testable in
               | general; see e.g. https://github.com/google/googletest/bl
               | ob/main/docs/advanced...
        
               | jonhohle wrote:
               | What is fair in testing and what is being tested? If I
               | can stub `secure_hash` and `bar`, that's a pretty easy
               | test to write. Assuming the symbols are visible at test
               | time, stubbing those is relatively straight forward in C,
               | Java, Bash, and Ruby, JavaScript, etc.
               | 
               | Even in the above examples, a linter should point out
               | that the second `p == null` is inaccessible. I'm very pro
               | test coverage, but static analysis shouldn't be thrown
               | out just because unit tests are available.
        
           | pfdietz wrote:
           | Coverage testing is just a special case of a more general
           | concept: mutation testing (where the mutation is "if you
           | reach this line/branch, blow up.")
           | 
           | Perhaps mutation testing could have revealed the test suite
           | was lacking in a test detecting that bug (for example, a
           | mutation that changed a type from unsigned to signed or vice
           | versa.)
        
             | Quekid5 wrote:
             | I'm so sad that so few test frameworks/libraries offer
             | mutation testing. (And its profiling cousin Causal
             | Profiling.)
             | 
             | Admittedly the some languages might make them very hard to
             | do, but the payoff would be so, so sweet.
        
               | taeric wrote:
               | My team at work took it out of our product. :( I couldn't
               | make strong arguments that it was providing value, as it
               | had turned into the slow test that always passed, by the
               | time new folks joined. I do think it has a lot of merit,
               | though.
        
           | admax88qqq wrote:
           | Boundary Value tests are standard practice when doing
           | extensive testing.
           | 
           | Bugs rarely occur for typical input values but more often
           | occur at the "boundaries" or limits of the input domain.
           | 
           | That being said I don't really fault SQlites testing here.
           | 
           | The key takeaway for me is that even with exhaustive testing
           | you can still have security issues like this, so perhaps what
           | is needed is a language or methodology change.
           | 
           | If SQLite, a quality codebase with exhaustive tests, can have
           | memory vulnerabilities, maybe we need to leave C behind.
        
             | ghoward wrote:
             | SQLite is used in places that Rust has no compiler for and
             | that no other language has a compiler for.
             | 
             | One severe CVE in about 20 years is no reason to rewrite in
             | another language and remove support for some platforms.
        
               | threatofrain wrote:
               | Just how many machines are we talking about? Does anyone
               | have estimates?
        
               | ghoward wrote:
               | Probably every Airbus A350 XWB and a lot of GM, Nissan,
               | and Suzuki cars. Maybe also a few or a lot of General
               | Electric products. [1]
               | 
               | The Library of Congress also recommends it for
               | preservation of digital content, which would be harder if
               | Rust compilers were lost. If all C89 compilers were lost,
               | it would be much more simple to create a new one to
               | compile SQLite than it would be to recreate a Rust
               | compiler.
               | 
               | [1]: https://www.sqlite.org/famous.html
        
               | nicoburns wrote:
               | In the long term, I hope C stops being the default
               | language that hardware vendors provide for their
               | hardware.
        
               | ghoward wrote:
               | Me too, but I also hope Rust is not that default
               | language.
        
             | chasil wrote:
             | Your problem is that you need to replicate the DO-178B
             | certification in Rust:
             | 
             | "I'm going to write tests to bring SQLite up to the quality
             | of 100% MCDC, and that took a year of 60 hour weeks. That
             | was hard, hard work."
             | 
             | Whatever comes out of Rust will be much worse until
             | compliance is achieved. That's going to take substantial
             | time.
        
           | shagie wrote:
           | > Fuzzing would be a more appropriate measure to have found
           | this issue
           | 
           | SQLite has some fairly extensive fuzzing.
           | https://www.sqlite.org/testing.html#fuzz_testing
           | 
           | One of the sections on Fuzz testing:
           | 
           | > 4.3 Boundary Value Tests
           | 
           | > SQLite defines certain limits on its operation, such as the
           | maximum number of columns in a table, the maximum length of
           | an SQL statement, or the maximum value of an integer. The TCL
           | and TH3 test suites both contains numerous tests that push
           | SQLite right to the edge of its defined limits and verify
           | that it performs correctly for all allowed values. Additional
           | tests go beyond the defined limits and verify that SQLite
           | correctly returns errors. The source code contains testcase
           | macros to verify that both sides of each boundary have been
           | tested.
        
             | chasil wrote:
             | It sounds from the article that "-fstack-protector-all"
             | might remove everything except the denial of service.
             | 
             | "...arbitrary code execution is confirmed when the library
             | is compiled without stack canaries, but unconfirmed when
             | stack canaries are present, and denial-of-service is
             | confirmed in all cases."
             | 
             | Everything should be compiled with all available compiler
             | defenses:
             | 
             | https://manpages.debian.org/testing/devscripts/hardening-
             | che...
        
               | chasil wrote:
               | It does appear safe with the OS package on CentOS 7.
               | $ hardening-check /usr/lib64/libsqlite3.so.0.8.6
               | /usr/lib64/libsqlite3.so.0.8.6:        Position
               | Independent Executable: no, regular shared library
               | (ignored)        Stack protected: yes        Fortify
               | Source functions: yes (some protected functions found)
               | Read-only relocations: yes        Immediate binding: no,
               | not found!
        
           | marcosdumay wrote:
           | > Unsure if Property Testing would actually have found this
           | issue, 2 billion bytes is a very large string and usually you
           | wouldn't put the limit so high, the tests would run for too
           | long.
           | 
           | Yet, this is a textbook case of issues caught by static
           | analysis.
        
           | jonhohle wrote:
           | > It's easy to achieve 100% test coverage
           | 
           | If code hasn't been written to be testable, 100% coverage is
           | not _easy_, by any stretch. Any legacy code base written
           | without unit tests that I've come across required significant
           | refactoring to get any basic assertions I place, let alone
           | any reasonable coverage.
           | 
           | I also think there is some value in high coverage, even
           | without assertions. At a minimum it's a guarantee that code
           | has been run before production/distribution. That's probably
           | a lot more than can be said about a lot of code.
        
         | perlgeek wrote:
         | > how did the tests miss it?
         | 
         | I'll offer another explanation: branch coverage != path
         | coverage
         | 
         | You generally cannot cover every possible code path through a
         | program, because even for n branches, the possible paths scale
         | as 2^n, and for loops it's even worse.
         | 
         | Once you have 100% branch/statement coverage in your tests, all
         | bugs that are left are those that arise from the interaction of
         | two or more statements, in a way that wasn't covered in the
         | tests.
        
           | insanitybit wrote:
           | https://www.sqlite.org/testing.html#mcdc
           | 
           | This is what sqlite means by "branch" coverage, it is in fact
           | a bit stricter.
           | 
           | You can see in this document how significant the sqlite
           | testing is, the 100% MC/DC is just one component.
        
             | taeric wrote:
             | I don't think that tracks. Consider:                   if
             | (a) {            doA();         }         if (b) {
             | doB();         }
             | 
             | With (a,b) as (true, false) and (false, true), you satisfy
             | every outcome of each decision test, but you never tested
             | "doA()" and "doB()" at the same time.
             | 
             | I think there is some interpretation of that last bullet,
             | "Each condition in a decision is shown to independently
             | affect the outcome of the decision" that is inline with the
             | idea. But that sounds more like on an statement like:
             | if (a && b)
             | 
             | that you test changing the values of a and b independently
             | to show that either one of them can cause the check to
             | pass.
        
               | insanitybit wrote:
               | By "this" I meant what I linked, not "what you just
               | said". I realize in retrospect that I was unclear. I'm
               | not sure if their approach would imply satisfying the
               | (true, true)/ (false, false) case given that they are
               | separate branches.
        
               | taeric wrote:
               | Right, but I think that is what the original poster
               | meant. There is a "path" through my example that hits
               | both "doA()" and "doB()." And having 100% branch coverage
               | doesn't necessarily get you there. Even under the
               | stricter rule that they are using.
        
               | insanitybit wrote:
               | Yep, we're on the same page.
        
               | taeric wrote:
               | Coming back to this, I understand your previous post
               | about "this" now. :D Apologies on misunderstanding that
               | the first time around!
        
         | paulmd wrote:
         | > Tests only validate known behavior. They didn't have a test
         | that exercised a large string.
         | 
         | Which is the classic thing with no-code and with Copilot-style
         | assistance: writing the function that meets the spec is easy.
         | Writing a comprehensive and correct spec is the hard part.
         | 
         | It's like thinking someone is a better programmer because they
         | type faster or have a better keyboard or whatever: the words
         | aren't the hard part, typing is maybe 10% of the time you spend
         | programming at most. It's logical correctness in composing
         | those functions, and being able to "hold the entire sculpture
         | in your mind" so to speak.
        
       | marcodiego wrote:
       | I wonder what would be the simplest/fastest kind of test that
       | could detect this flaw considering that SQLite is extremely well-
       | tested. Wouldn't mutation tests catch it?
        
       | p-e-w wrote:
       | It's bugs like this one that lead me to believe that _yes_ ,
       | "rewrite everything in Rust" _is_ in fact the solution.
       | 
       | I have the greatest respect for the quality of the SQLite
       | codebase, which is one of the best-tested open source projects in
       | existence, but at the end of the day, if formatting a string can
       | lead to arbitrary code execution, it's time to question the
       | foundations you're building on.
       | 
       | As long as C and C++ are considered acceptable languages for
       | writing mission critical software, these vulnerabilities will
       | _never_ go away. We have incredible static analysis tools and
       | hyper-advanced fuzzers driven by genetic algorithms, and yet this
       | continues to happen despite every precaution imaginable being
       | taken.
       | 
       | It's time to fix the problem at its root. Languages that are
       | memory unsafe by default need to be phased out sooner rather than
       | later.
        
         | pornel wrote:
         | As much as I love rewriting things in Rust, SQLite is of the
         | least concern. With enough effort C can be made reasonably
         | secure, and SQLite puts that effort. This particular bug wasn't
         | a low-hanging fruit.
        
         | jmull wrote:
         | Go for it?
         | 
         | The people who think this is the solution are the ones who will
         | need to take up the work.
         | 
         | It's not like the people who don't think this is the solution
         | will do it.
         | 
         | Get ready to work hard for 5, 10, 20 years... rewriting sqlite
         | will be a monumental effort and take the laser focus of a
         | strong development team for years, if not decades. It's fraught
         | with risk. Think about it... how many years would it be before
         | a stable version of sqlite based on rust was available with
         | comparable performance and fewer bugs than the c-based version?
         | 
         | (Seems likely to me that it would never happen.)
         | 
         | I think you'd be better off creating a sqlite competitor, that
         | could start off much smaller with a much smaller feature and
         | much less stringent performance requirements that,
         | nevertheless, has enough for many common cases. Of course,
         | you'd still have to convince people to use it over sqlite. If
         | your main selling point is "significantly fewer bugs than
         | sqlite" I don't think it's going to be very easy to make that
         | true, rust or not.
        
         | chmaynard wrote:
         | Just curious, what language was used to write the Rust
         | compiler?
        
           | _flux wrote:
           | Like C was used to write C and OCaml was used to write OCaml,
           | Rust is written in Rust. Mostly, anyway, the GitHub
           | repository seems to have also JavaScript, Python and Shell.
           | 
           | Though to actually get binaries out of it LLVM is used, which
           | itself is written in C++.
        
             | xxpor wrote:
             | Rust is written in Rust today, but it was bootstrapped in
             | OCaml.
        
           | nequo wrote:
           | Originally OCaml, and now Rust:
           | 
           | https://github.com/rust-lang/rust
        
           | jraph wrote:
           | Fair point, but the Rust compiler is not there at runtime. It
           | also only needs to be written once for all the Rust programs,
           | so assuming Rust is safer, you restrict the unsafety to the
           | compiler (and the unsafe constructions of Rust, which are
           | clearly marked as unsafe if memory serves).
           | 
           | (the Rust compiler is written in Rust, but the LLVM backend
           | it uses (and soon GCC, IIRC), is still written in C++)
        
           | [deleted]
        
         | 0x000xca0xfe wrote:
         | How would you embed SQLite into other programs if it were
         | written in Rust? Isn't the lack of basically all dependencies
         | the main advantage?
         | 
         | (I'm actually curious, how does dynamic linking work with Rust?
         | Can you build a .so/.dll/.h pair and ship it _everywhere_ like
         | with C?)
        
           | steveklabnik wrote:
           | You can create a dynamic library, to be linked to just like
           | any C library, from Rust. Doing so was actually some of the
           | first production uses of Rust ever.
           | 
           |  _everywhere_ depends on what you mean; rustc does have
           | support for fewer platforms than gcc. But it 's still quite a
           | few.
        
         | jeffbee wrote:
         | Don't throw C and C++ in there together. C++ would clearly have
         | prevented this particular CVE, if the author had used idiomatic
         | C++ APIs.
        
           | pjmlp wrote:
           | Yeah but that is mostly the problem with C++, and I am a big
           | C++ fanboy, too many people write C style code with C++
           | compilers, to the point that Bjarne keeps doing tireless
           | keynotes about how to write idiomatic C++.
           | 
           | Just the other day I found a relatively recent C++ tutorial
           | on YouTube using Turbo C++ for MS-DOS, imagine how idiomatic
           | it looked like.
        
         | pjmlp wrote:
         | We don't need "rewrite everything in Rust", rather "rewrite
         | everything in safe languages".
        
         | tinalumfoil wrote:
         | Sqlite is tested so thoroughly I doubt Rust's borrow checker
         | (which is essentially just another test) would benefit it:
         | https://www.sqlite.org/testing.html
         | 
         | Sqlite also has never had a severe CVE, from the parent's link:
         | 
         | > All historical vulnerabilities reported against SQLite
         | require at least one of these preconditions:
         | 
         | > The attacker can submit and run arbitrary SQL statements.
         | 
         | > The attacker can submit a maliciously crafted database file
         | to the application that the application will then open and
         | query.
         | 
         | The OP's vulnerability requires passing an insanely long string
         | to a C api, which isn't something you'd expect to be secure.
        
           | chc4 wrote:
           | I have bad news for you: both Chromium and Firefox allow any
           | page to submit arbitrary SQL statements by default, which has
           | caused several actually-for-real exploits, most of them
           | related to buffer overflows due to integer overflow or use
           | after frees, both of which are mitigated in Rust.
           | https://bugs.chromium.org/p/chromium/issues/detail?id=900910
           | is a series of security SEV:HIGH bugs in Chromium and
           | Chromecast, for example. https://bugs.chromium.org/p/chromium
           | /issues/detail?id=116060... is a bug from December 2020 that
           | is exclusively from sqlite.
           | 
           | https://www.sqlite.org/cves.html mentions this.
        
             | jcranmer wrote:
             | > both Chromium and Firefox allow any page to submit
             | arbitrary SQL statements by default
             | 
             | How do you do this in Firefox? Firefox never implemented
             | WebSQL.
        
           | p-e-w wrote:
           | It's not about the borrow checker in this case. The borrow
           | checker deals with ownership, which mostly protects against
           | thread safety issues, data races, and use-after-free.
           | 
           | Here we are dealing with a _memory_ safety problem of a type
           | that cannot occur in Rust unless you use `unsafe` blocks.
           | Rust doesn 't allow unsafe memory access such as pointer
           | arithmetic or out-of-bounds operations (those produce a
           | runtime panic). C happily accepts those, and the effects may
           | include buffer overflows leading to arbitrary code execution.
           | 
           | This should never, ever be possible when writing a normal
           | program. I love that Rust has named the keyword that enables
           | such things "unsafe", because that's precisely what it is.
           | That C has this behavior as a default is nothing short of
           | madness, though of course C is an artifact of a time when
           | such issues weren't understood as well as they are today.
        
             | schemescape wrote:
             | So if SQLite was written in Rust, this would cause a
             | runtime panic and thus would only be a denial of service
             | CVE. Is that correct?
        
               | chc4 wrote:
               | Yes. If you index out of bounds in a Rust program, it
               | will raise a panic. This is _infinitely better_ than
               | causing memory unsafety and creating a remote code
               | execution exploit, and the best you can possibly do.
        
               | p-e-w wrote:
               | If the code were entirely analogous, that is correct,
               | unless the Rust code used some kind of `unsafe` tricks in
               | order to increase performance (this is fairly common in
               | advanced Rust code).
               | 
               | That being said, whether a runtime panic leads to DoS
               | depends on the execution model. In web applications,
               | threads are usually mapped to connections in some way,
               | and a runtime panic only unwinds the thread it occurs in,
               | so this wouldn't necessarily mean that other users
               | experience service disruptions.
               | 
               | More to the point, I'd expect modern application and/or
               | library code to include default safety checks that
               | prevent scenarios like absurdly large queries from
               | happening in the first place, and provide recoverable
               | errors if such pathological inputs are encountered.
        
               | fps-hero wrote:
               | The bug relies on undefined behaviour in the C language,
               | but it's exploitable is due to the way memory access
               | operations are written in C. Equivalent Rust code would
               | not be exploitable at a language level because that type
               | of raw pointer array access isn't possible with resorting
               | to unsafe blocks. The undefined behaviour that caused the
               | bug also doesn't not exist in Rust.
        
             | PathOfEclipse wrote:
             | If your bounds checks can make your programs as much as
             | twice as slow, then I think it's worth considering that
             | bounds checking is not a panacea for low-level code:
             | https://coaxion.net/blog/2018/01/speeding-up-rgb-to-
             | grayscal...
        
               | danhor wrote:
               | If the compiler can be sure that there is no bounds
               | checking needed and can skip it. Rust provides facilities
               | for this, C and C++ less so.
               | 
               | We have seen time and time again that humans aren't good
               | enugh at that task, even in low-level code (and 2x is
               | very buch an outlier).
        
           | IshKebab wrote:
           | > The OP's vulnerability requires passing an insanely long
           | string to a C api, which isn't something you'd expect to be
           | secure.
           | 
           | Yes it is!
           | 
           | Rust would help here - not the borrow checker, but the bounds
           | checking.
        
           | oconnor663 wrote:
           | In this case Rust (or Go or Java or almost anything other
           | than C/C++) would've caught this error with array/slice
           | bounds checks. The borrow checker is usually more concerned
           | with things like use-after-free, where other languages would
           | lean on garbage collection.
        
             | 0x457 wrote:
             | Nah, in Rust out-of-bound index would be a panic, which is
             | better than RCE.
             | 
             | I don't think SQLite needs to be rewritten in any language,
             | tho.
        
               | oconnor663 wrote:
               | Yeah to be clear by "caught" I mean "crashed cleanly at
               | runtime".
        
           | _flux wrote:
           | > Sqlite is tested so thoroughly I doubt Rust's borrow
           | checker (which is essentially just another test) would
           | benefit it
           | 
           | You may as well be correct about not helping here, but borrow
           | checker provides proofs, not tests; a test can only prove the
           | presence of a bug, while a proof can prove the lack of a
           | certain kind of bug.
        
             | Ultimatt wrote:
             | Until someone exploits the logic of the borrow checker
             | leaving all Rust compiled programs vulnerable...
        
               | oconnor663 wrote:
               | You might be under the impression that the borrow checker
               | is executing at runtime in a compiled program. But it's a
               | purely compile-time thing, like a type checker.
        
               | p-e-w wrote:
               | Sure. But that bug only needs to be fixed _once,_ whereas
               | bugs like the one described in the link need to be fixed
               | again and again and again and again and...
        
           | counttheforks wrote:
           | Isn't it? It might be naive, but I would not expect that:
           | 
           | 1. Accepting user input on a webpage (e.g. a first_name field
           | on a singup form)
           | 
           | 2. Storing that user input in a database
           | 
           | Would trigger a RCE if the string was escaped properly.
        
         | ghoward wrote:
         | I said this before [1], but one severe CVE in 20 years is no
         | reason to rewrite SQLite in a language that has no compiler for
         | some of the platforms it is deployed on.
         | 
         | Sure, write _new_ stuff in Rust, but SQLite needs to stay in C
         | at this point.
         | 
         | [1]: https://news.ycombinator.com/item?id=33331841
        
           | zimpenfish wrote:
           | > no reason to rewrite SQLite in [Rust]
           | 
           | I would like them to try, though, because I suspect, barring
           | a Bellard Event, no-one will be able to get close to SQLite's
           | functionality[1], reliability, etc., in a new Rust
           | implementation within 5 years (possibly even 10 years.)
           | 
           | [1] including being able to load extensions, integrate into
           | basically everything as library, etc.
        
         | astrobe_ wrote:
         | I wish people would abstain from that sort of insipid comments
         | and offer instead suggestions about how to replace 40+ years of
         | programming (that's just for C, BTW) and _teralines_ of code in
         | less than a century, and who 's gonna pay for this. Because
         | that's the crux.
        
           | p-e-w wrote:
           | The stakeholders are going to pay for it. It's not an
           | accident that companies like Amazon and Microsoft are betting
           | big on Rust. The cost of code quality management and
           | vulnerability mitigation is already astronomical. If Rust can
           | lessen that burden even a little (and it certainly can), the
           | effort of rewriting core technologies quickly pays for
           | itself.
           | 
           | And most of those "teralines" of code are legacy garbage. The
           | core OS, a basic userland, a database and a webserver are 90%
           | of the world's critical IT infrastructure. That doesn't take
           | a century to rewrite.
        
             | nottorp wrote:
             | > a database and a webserver
             | 
             | Well I have this native app fetish. I mean why pay for 4 G
             | ram for each electron app?
             | 
             | No, there is more legacy code than you list there.
        
             | pclmulqdq wrote:
             | For most of the world, though, that code is decades old,
             | with decades of testing and battle-hardening.
             | 
             | You will lose all of that in a rewrite, which is why people
             | don't do it.
        
               | jjnoakes wrote:
               | I'm not arguing for or against, but:
               | 
               | > You will lose all of that in a rewrite
               | 
               | I don't think you lose everything in a rewrite in general
               | - you still have all of the edge cases and business logic
               | in the old code that you translate to the new code. It's
               | not like you are implementing whatever it is from
               | scratch.
               | 
               | Sure, there are opportunities for the translation to
               | introduce regressions, but that's a vastly different
               | argument (and I think a much more minor downside).
        
         | [deleted]
        
       | j0hnyl wrote:
       | It's nice to see such comprehensive research that covers
       | seemingly every angle a bug, its context, and history.
        
       | russellbeattie wrote:
       | The root of this problem seems to be the widespread availability
       | of 64-bit CPUs.
       | 
       | >= _When this code was first written, most processors had 32-bit
       | registers and 4GB of addressable memory, so allocating 1GB
       | strings as input was impractical. Now that 64-bit processors are
       | quite common, allocating such large strings is feasible and the
       | vulnerable conditions are reachable._
       | 
       | One has to wonder how much code written in the past 30 years has
       | safety checks assuming 32-bit processors which now have
       | vulnerabilities like this one. I'd bet quite a lot.
        
       | three14 wrote:
       | Quick turnaround!
       | 
       | July 14, 2022: Reported vulnerability to the Computer Emergency
       | Response Team (CERT) Coordination Center.
       | 
       | July 15, 2022: CERT/CC reported vulnerability to SQLite
       | maintainers.
       | 
       | July 18, 2022: SQLite maintainers confirmed the vulnerability and
       | fixed it in source code.
       | 
       | July 21, 2022: SQLite maintainers released SQLite version 3.39.2
       | with fix.
        
         | [deleted]
        
       | sgt wrote:
       | Impressive work
        
       | veltas wrote:
       | > 100% branch coverage says that every line of code has been
       | executed, but not how many times
       | 
       | Actually, 100% branch coverage says every line has been executed
       | but also that every branch has been both taken and not taken. So
       | e.g. every non-constant if statement is both taken and not taken.
        
         | IshKebab wrote:
         | It's only possible to execute all lines if you take/don't take
         | every if branch. They're the same.
        
           | rhdunn wrote:
           | An if statement without an else can have the case where line
           | coverage does not cover the "don't take the branch" part of
           | branch coverage. Therefore, if not taking that branch results
           | in a bug line coverage only will not pick up that bug. -- For
           | example, if you initialize a value in an if statement's then
           | branch but don't have a default value for it or an else
           | branch to provide a default.
           | 
           | Or even if you have a ternary operator (which has an implicit
           | if-then-else take/don't take branch) that is usually only one
           | line.
        
           | mort96 wrote:
           | No? Take this code:                   void foo(int arg) {
           | int *x = malloc(sizeof(int));             if (arg) {
           | free(x);             }         }
           | 
           | If your test only ever calls `foo` with a non-zero `arg`, you
           | have covered 100% of the lines in the function, but you never
           | discover the memory leak bug in the case where `if` body
           | isn't executed.
        
         | XCSme wrote:
         | So also all combination of branches should be tested, right?
         | Doesn't that lead to exponential test-case growth?
        
           | SCHiM wrote:
           | No so that's exactly it.
           | 
           | Not all combinations are tested, all branches are both taken
           | and not taken, but not all combinations of taken/not taken
           | are tested.
        
       | harryvederci wrote:
       | "This bug is an array-bounds overflow. The bug is only accessible
       | when using some of the C-language APIs provided by SQLite. The
       | bug cannot be reached using SQL nor can it be reached by
       | providing SQLite with a corrupt database file. The bug only comes
       | up when very long string inputs (greater than 2 billion bytes in
       | length) are provided as arguments to a few specific C-language
       | interfaces, and even then only under special circumstances."
       | 
       | Source:
       | https://www.sqlite.org/cves.html#status_of_recent_sqlite_cve...
        
         | chmaynard wrote:
         | > The bug is only accessible when using some of the C-language
         | APIs provided by SQLite.
         | 
         | I assume the SQLite CLI (sqlite3) uses these APIs. Could the
         | bug be triggered indirectly by input to sqlite3? The article
         | doesn't appear to address this question.
        
           | torstenvl wrote:
           | [EDIT: https://sqlite.org/forum/forumpost/3607259d3c appears
           | to be only one way this bug can rear its head, not the core
           | of the bug itself]
        
             | ferbivore wrote:
             | That's a different bug.
        
               | torstenvl wrote:
               | You're right, my mistake
        
           | SQLite wrote:
           | The CLI does use the relevant APIs, however I don't know of a
           | way to reach this bug using a script input to the CLI. If
           | there is such a path that I don't know about, it seems like
           | it would require at least a 2GB input script.
        
             | _the_inflator wrote:
             | This, and also there is the saying: "Never trust the
             | frontend"
             | 
             | So in any case a string length check in the Backend should
             | help here. Also if you get strings from a Browser Frontend,
             | it seems unlikely you could create a payload of 2GB due to
             | several reasons, one being string length limitations on the
             | Browser side:
             | https://stackoverflow.com/questions/34957890/javascript-
             | stri...
        
           | [deleted]
        
         | efficax wrote:
         | ah looks like they had 32 bit assumptions that broke on 64 bit
         | chips.
        
           | zoomablemind wrote:
           | I vagely remember that there were a few more places in SQLite
           | codebase which could benefit from proper use of size_t type.
           | Catching increment overflows is not that straightforward,
           | perhaps some review specifically for such cases may help
           | SQLite get even more robust.
        
           | remram wrote:
           | ... in one internal function. It seems that the assumptions
           | totally hold for the library as a whole.
        
         | [deleted]
        
       | jeffbee wrote:
       | When using respectable language, is it guaranteed that these
       | types of overflows are impossible? Suppose for example I call
       | C++'s std::string::string(const char*) ... is it fundamentally
       | impossible for it to fail in this way, or is it just UB if you
       | pass it a huge string?
        
         | LegionMammal978 wrote:
         | The constructor is guaranteed to either construct a valid
         | `std::string` or throw an `std::bad_alloc` exception if the
         | allocation fails. If there were a size limit past which calling
         | the function is undefined, that would have to be noted in the
         | function's documentation.
        
           | jeffbee wrote:
           | Honestly I don't see anything in the chapter and verse that
           | supports your argument. It says that the constructor has a
           | precondition that [s, s+traits::length(s)) is a valid range,
           | which throws just a little uncertainty over the question
           | without resolving it. It seems easy to create a range that is
           | not valid (consider a very large starting position s, for
           | example).
        
             | leni536 wrote:
             | That basically means that s+traits::length(s) must be
             | defined behavior. If s is not a pointer to a 0-terminated
             | string then the behavior of this constructor is undefined.
        
       | josefx wrote:
       | C strings, not even once.
        
       ___________________________________________________________________
       (page generated 2022-10-25 23:00 UTC)