[HN Gopher] How Does Sqlite Work? (2014)
       ___________________________________________________________________
        
       How Does Sqlite Work? (2014)
        
       Author : searchableguy
       Score  : 267 points
       Date   : 2020-06-27 17:17 UTC (5 hours ago)
        
 (HTM) web link (jvns.ca)
 (TXT) w3m dump (jvns.ca)
        
       | dang wrote:
       | Discussed at the time:
       | https://news.ycombinator.com/item?id=8378894
        
       | gtrubetskoy wrote:
       | The details of how the data is stored are described in this 1979
       | paper by Douglas Comer: "The Ubiquitous B-Tree" [1]. Since it was
       | invented, much hasn't really changed: practically all relational
       | databases use some variation of this structure. I wrote about it
       | in [2], which might help explain some aspects of how it works.
       | 
       | [1] https://dl.acm.org/doi/10.1145/356770.356776
       | 
       | [2] https://grisha.org/blog/2013/05/11/relational-database-on-
       | to...
       | 
       | Edit: also [3] is about how to store SQLite in a Redis hash
       | (rather than on-disk file).
       | 
       | [3] https://grisha.org/blog/2013/05/29/sqlite-db-stored-in-a-
       | red...
        
       | oscargrouch wrote:
       | > I'm still not super clear on how exactly the interior pages are
       | structured and how the pointers to their child pages work. It's
       | time to sleep now, but perhaps that will happen another day!
       | 
       | Luckily. The SQLite source code is very well documented, and the
       | 'btreeInt.h' has something about how the btree pointers are laid
       | out
       | 
       | I happen to use the SQLite btree directly without the SQL layer
       | (the VDBE) as a key-value store on top of a userspace filesystem
       | (chrome cachefs) and even with this configuration it works
       | wonderfully.                  ** The basic idea is that each page
       | of the file contains N database        ** entries and N+1
       | pointers to subpages.        **        **
       | ---------------------------------------------------
       | -------------        **   |  Ptr(0) | Key(0) | Ptr(1) | Key(1) |
       | ... | Key(N-1) | Ptr(N) |        **
       | ----------------------------------------------------------------
       | **        ** All of the keys on the page that Ptr(0) points to
       | have values less        ** than Key(0).  All of the keys on page
       | Ptr(1) and its subpages have        ** values greater than Key(0)
       | and less than Key(1).  All of the keys        ** on Ptr(N) and
       | its subpages have values greater than Key(N-1).  And        ** so
       | forth.        **        ** Finding a particular key requires
       | reading O(log(M)) pages from the         ** disk where M is the
       | number of entries in the tree.        **        ** In this
       | implementation, a single file can hold one or more separate
       | ** BTrees.  Each BTree is identified by the index of its root
       | page.  The        ** key and data for any entry are combined to
       | form the "payload".  A        ** fixed amount of payload can be
       | carried directly on the database        ** page.  If the payload
       | is larger than the preset amount then surplus        ** bytes are
       | stored on overflow pages.  The payload for an entry        ** and
       | the preceding pointer are combined to form a "Cell".  Each
       | ** page has a small header which contains the Ptr(N) pointer and
       | other        ** information such as the size of key and data.
       | 
       | I guess if she go straight to use the btree layer, bypassing the
       | SQL layer, it will be easier to figure it out how the pages and
       | pointers works and relate to each other.
       | 
       | After this, try it out with the SQL and see how it works.
       | 
       | ---
       | 
       | Edit: just to add to this, this line here
       | 
       | > Each BTree is identified by the index of its root page
       | 
       | When you call 'sqliteBtreeCreateTable()' it will return to you
       | the root page of the table it creates. In my experience it starts
       | from 2, and my guess is that 1 is probably some master root page
       | (i bet Julia will dig into this later).
       | 
       | This is how i've manage to have "keyspaces" here. i use the first
       | created table as a meta-table, keeping the index of keyspace
       | names with their root page number counterparts, and on opening
       | the db, is just a matter of recovering this info and put it in a
       | hash table (keyspace => page number) for later use.
        
         | dunham wrote:
         | Yeah, if I remember right the first block is the root of the
         | master table, listing all the other tables. (Name, schema, root
         | block id.)
        
       | zoomablemind wrote:
       | The SQLite code is nicely split into modules, and builds together
       | reatively quickly. So the easier way to explore the codebase is
       | to load into editor the individual modules at src/ rather than
       | the aggregated sqlite3.c file which is too big and unmanageable
       | in the editor.
       | 
       | IIRC, the heart of the db engine appears to be some sort of vm
       | that processes the parsed out SQL. Stepping through in debugger
       | will inevitably capture you in the loop which executes the
       | statements.
       | 
       | Of course, there're plentiful design docs, all on
       | https://sqlite.org.
        
         | bntyhntr wrote:
         | My c is pretty limited and I was able to hack in support for
         | table prefixed columns in result sets over a night or two from
         | never seeing the code base before. It was fun to work with!
        
         | nurettin wrote:
         | they also provide an amalgamation header which is generated as
         | part of their build process, so that you can simply add a .h
         | and .c file to your project and have the whole sqlite goodness
         | linked directly into your project.
        
       | ed25519FUUU wrote:
       | > _May you do good and not evil._
       | 
       | > _May you find forgiveness for yourself and forgive others._
       | 
       | > _May you share freely, never taking more than you give._
       | 
       | If only more software engineers started out their projects with
       | this mindset, the world would be a very different place.
        
         | helloiloveyou wrote:
         | This part impacted me a lot really, almost teared up. This is a
         | truly important part of how SQLite works
        
           | ed25519FUUU wrote:
           | Me too! I think there's a place for spiritual/meditate
           | mindset in our work.
        
         | stormdennis wrote:
         | Sounds like an extract from the Our Father.
        
         | mosselman wrote:
         | Why is this down voted?
         | 
         | > If only more software engineers started out their projects
         | with this mindset, the world would be a very different place.
         | 
         | Apparently this is a horrible thing to say?
        
           | edoceo wrote:
           | Not sure about the comment you replied to (is that called
           | GP?) but I think you've missed this guidelines:
           | 
           | "Please don't comment about the voting on comments. It never
           | does any good, and it makes boring reading."
        
           | [deleted]
        
       | tekstar wrote:
       | Years ago I spent an evening at Recurse Center (at the time
       | Hacker School) in NYC. My employer sent me with some recruiters
       | to chat with participants about hiring possibilities. It was just
       | so cool.
       | 
       | That night I met Julia (she and a friend were, if I remember
       | correctly, trying to boot a handwritten kernel in a VM and
       | figured it out around 11pm that only the first 300 bytes were
       | getting loaded into memory).
       | 
       | I met Andrew Kelley, who had recently written an NES emulator
       | that translated the code to LLVM (if I remember correctly).
       | 
       | I briefly met Filippo Valsorda. I think he was the person who,
       | after I said I wanted to make a bitcoin arbitrage bot,
       | immediately came up with putting the prices into a matrix and
       | then computing optimal paths.
       | 
       | The place just seemed.. magical. I have a family and kids in
       | Canada but i thought a long time about trying to reorient my life
       | to NYC for a bit. It was just so cool.
       | 
       | It reminded me of the joy of learning for the sake of learning,
       | hacking stuff because it feels like a magic superpower to bring
       | ideas to life, and the joy of building things you have no
       | intention on trying to make money off of.
       | 
       | I'm certain none of those people remember me but I remember the
       | evening fondly whenever I see any of them show up on HN. Cheers
       | :)
        
         | forgotmypw17 wrote:
         | I have a lot of respect for Recurse Center, and am a fan of the
         | work that comes out of it. However, I feel that it is a very
         | exclusive place, and if you are not good enough, you're just
         | not welcome.
         | 
         | I applied for an internship there a couple years ago. After
         | writing an essay on why I want to do RC, and an interview
         | where, IIRC, I shared my camera but my interviewers did not, I
         | was rejected with a boilerplate "not for us" explanation, and
         | that was that.
         | 
         | I would've felt much better about it if they had at least
         | invited me to visit, but this experience gave it a very
         | corporate and impersonal feel.
        
           | epoch_100 wrote:
           | > I was rejected with a boilerplate "not for us" explanation,
           | and that was that.
           | 
           | For what it's worth, RC has struggled with the question of
           | whether to give personalized feedback to applicants who don't
           | get in. They did it for awhile, but stopped for a number of
           | reasons. They explain it here:
           | https://www.recurse.com/feedback
        
             | gentryb wrote:
             | While I may question their reasoning, I respect them for
             | putting it out there. Ultimately totally turned off to this
             | based on the elitism crap. I did enjoy the article quite a
             | bit though.
        
           | 4gotunameagain wrote:
           | To be honest, elitism in such places keeps them awesome. Not
           | everything is for everyone, despite how much modern movements
           | want to believe so.
        
             | [deleted]
        
             | forgotmypw17 wrote:
             | I think the strategy you describe works to an extent, but
             | eventually leads to echo chamber effect.
             | 
             | I think, perhaps arrogantly, that RC would have benefited
             | from my pollination.
             | 
             | We'll both be fine on our own, too.
        
             | akavel wrote:
             | _[citation needed]_
        
               | daseiner1 wrote:
               | tragedy of the commons
        
           | ponker wrote:
           | There are places where passion isn't enough, only performance
           | is what counts. The NBA, Recurse Center, Navy Seals, etc.
        
             | mi_lk wrote:
             | I get why you're downvoted, but it's not inaccurate
        
               | peruvian wrote:
               | I can't speak for the NBA or Navy, but it's not true for
               | RC. I was a pretty meh developer when I got in 2016.
               | Technical questions in the interviews are a magnitude
               | simpler than any job interview I've ever had.
               | 
               | Most people don't have the skills Julia and the other
               | people the original post mentioned. They're honestly the
               | 1% of RC's over 1k (I think!) students over the years.
               | Most of us have little projects that we work on for 12
               | weeks in a great environment or learn new things. Most of
               | us are normal people who are lucky enough to be able to
               | study programming for 12 weeks in NYC.
        
           | mattgreenrocks wrote:
           | I think the danger here is thinking that RC is the only place
           | that cool people reside.
           | 
           | RC has excellent marketing. There needs to be more places
           | like it rather than everyone thinking that there can only be
           | one place that everyone tries to get into.
        
             | quadrifoliate wrote:
             | This. Why are there not more places (or for that matter,
             | workplaces) like RC?
        
           | peruvian wrote:
           | > However, I feel that it is a very exclusive place, and if
           | you are not good enough, you're just not welcome.
           | 
           | What do you mean by this?
           | 
           | RC is a company (a YC company actually) that leases expensive
           | NYC real estate and bears all the other costs of running a
           | company while providing its service for free. Unfortunately,
           | it's not a hangout where anyone can join and thus they have
           | to limit space.
           | 
           | As for "if you are not good enough"... all the tech interview
           | questions (at least in 2016) are easier or simpler than real
           | job interview questions. The important parts are the soft
           | skills questions. I "studied" with PhDs, bootcamp grads, and
           | college students. And personally, I'm a pretty meh developer.
           | But just like any company, they don't always give the best
           | feedback on rejections.
        
       | seemslegit wrote:
       | Quite well actually
        
       | Animats wrote:
       | Lookup is easy. Update is hard. Sqlite does quite well at update,
       | given the limitations it works under. Multiple instances can
       | update the same database while maintaining strong consistency,
       | yet there's no common server process that can see all the
       | transactions. This costs you some speed in updating (don't do
       | your web service this way) but it's pretty good for most low-
       | write-traffic use cases.
        
       | justinclift wrote:
       | Part 2: https://jvns.ca/blog/2014/10/02/how-does-sqlite-work-
       | part-2-...
        
       | ramoz wrote:
       | I encourage anyone interested in this to also explore embedded
       | dbs (leveldb, badger, lmdb, rocks, etc)
        
         | refset wrote:
         | Embedded KV stores are as comparable to SQLite as Cassandra is
         | to Postgres (i.e. lacking joins or other data modelling
         | essentials), but I agree that embedding a database directly
         | inside your app is interesting and opens up a world of
         | possibilities for efficiently blending N+1 code and queries. It
         | gets even more interesting to take it a step further and embed
         | a networked-replica of a database a la Datomic Peers.
        
           | dfox wrote:
           | It was probably meant as a learning exercise. And for me it
           | is that implementation details of the KV/heap layer with some
           | meaningful transactional properties seem more interesting
           | than how to build SQL/whatever database on top of that, which
           | in comparison looks like mostly obvious but tedious SMOP.
        
         | searchableguy wrote:
         | Check out sled too which is written in rust. -
         | https://github.com/spacejam/sled
        
       | twic wrote:
       | Anyone looked at DuckDB [1]? Like SQLite, it's an embedded
       | relational database with a C API. In fact, it has a wrapper which
       | presents the same API as SQLite [2]. But it's optimised for
       | analytical rather than transactional work - a few huge queries
       | rather than lots of little reads and writes.
       | 
       | [1] https://duckdb.org/
       | 
       | [2]
       | https://github.com/cwida/duckdb/tree/master/tools/sqlite3_ap...
        
         | iagovar wrote:
         | So I guess I can use SQLite connectors or GUI to play with it?
         | I was just looking for this.
         | 
         | Not a dev so I'm a bit dumb with this stuff, but I was
         | definitely looking for pivoting stuff and such, which requires
         | a lot of extra steps with sqlite.
        
         | mytherin wrote:
         | DuckDB offers a SQLite compatible interface; we use it
         | extensively ourselves as it allows us to use the SQLite shell
         | directly.
         | 
         | The internal design of DuckDB, however, is different and not
         | directly related to SQLite. As you mentioned, it is entirely
         | optimized for analytical workloads whereas SQLite is optimized
         | for point-queries and point-updates. The main things we share
         | is that we also have a single-file storage format and offer an
         | amalgamation (i.e. single-source file compilation, duckdb.cpp
         | and duckdb.hpp).
        
       ___________________________________________________________________
       (page generated 2020-06-27 23:00 UTC)