[HN Gopher] SQLite: Past, Present, and Future
       ___________________________________________________________________
        
       SQLite: Past, Present, and Future
        
       Author : chrstr
       Score  : 199 points
       Date   : 2022-09-01 13:20 UTC (9 hours ago)
        
 (HTM) web link (vldb.org)
 (TXT) w3m dump (vldb.org)
        
       | Thaxll wrote:
       | "While it continues to be the most widely used database engine in
       | the world"
       | 
        | It really depends on what you mean by that. Yes, it's shipping
        | in every phone and browser, but I don't consider that a
        | database. Is the Windows registry a database?
        | 
        | Oracle, MySQL, PG, MSSQL are the most widely used DBs in the
        | world; the web runs on those, not SQLite.
        
         | adamrezich wrote:
         | there are far, far more sqlite instances than Windows Registry
         | instances in the world.
        
       | dang wrote:
       | The pdf: https://www.vldb.org/pvldb/vol15/p3535-gaffney.pdf
        
       | gorjusborg wrote:
       | I came for SQLite, got sold DuckDB.
        
       | rafale wrote:
       | SQLite vs Postgres for a local database (on disk, not over the
       | network): who wins? (Each in their most performance oriented
       | configuration)
        
         | lvass wrote:
         | SQLite. The most performant configuration is unsuited to most
         | usage, and may lead to database corruption on a system crash.
        
           | rafale wrote:
           | Should have said the most performance oriented setting that's
           | also safe from data corruption.
        
             | lvass wrote:
              | Then it depends on the usage. You'd likely need to run with
              | synchronous mode on, and even in WAL mode, multiple
              | separate write transactions are an issue. If you don't have
              | many writes, or you buffer them into a few transactions,
              | SQLite is the most performant.
        
         | bob1029 wrote:
         | >most performance oriented configuration
         | 
         | I am 99% sure SQLite is going to win unless you actually care
         | about data durability at power loss time. Even if you do, I
         | feel I could defeat Postgres on equal terms if you permit me
         | access to certain ring-buffer-style, micro-batching, inter-
         | thread communication primitives.
         | 
         | Sqlite is not great at dealing with a gigantic wall of
         | concurrent requests out of the box, but using a little bit of
         | innovation in front of SQLite can solve this problem quite
          | well. The key is to resolve the write contention outside of the
         | lock that is baked into the SQLite connection. Writing batches
         | to SQLite on a single connection with WAL turned on and Sync
         | set to normal is pretty much like operating at line speed with
         | your IO subsystem.
        
           | [deleted]
        
         | ergocoder wrote:
         | Functionality-wise, SQLite's dialect is really lacking...
        
           | bob1029 wrote:
           | The entire point is to bring your own functions to SQLite,
           | since it is presumably running in-proc and can be integrated
           | with trivially.
           | 
           | https://sqlite.org/appfunc.html
           | 
           | We currently use this path to offer a domain-specific SQL-
           | based scripting language for our product.
        
           | simonw wrote:
           | Is it the SQL dialect there lacking or is it the built-in
           | functions?
           | 
           | I agree that SQLite default functionality is very thin
           | compared to PostgreSQL - especially with respect to things
           | like date manipulation - but you can extend it with more SQL
           | functions (and table-valued functions) very easily.
        
             | ergocoder wrote:
             | Depends on what easily means.
             | 
             | Sqlite can't do custom format date parsing and regex
             | extract. How do we extend something like this?
             | 
             | If we go beyond a simple function to window function, I
             | imagine it would be even harder.
             | 
              | At this point, we might as well use postgres.
        
               | simonw wrote:
               | Funny you should mention those specific examples - I have
               | Datasette plugins adding custom SQL functions to SQLite
               | for both of them!
               | 
               | - https://datasette.io/plugins/datasette-dateutil
               | 
               | - https://datasette.io/plugins/datasette-rure
        
               | polyrand wrote:
               | Adding user-defined functions to SQLite is not difficult,
               | and the mechanism is quite flexible. You can create
               | extensions and load them when you create the SQLite
               | connection to have the functions available in queries. I
               | wrote a blog post explaining how to do that using Rust,
               | and the example is precisely a `regex_extract` function
               | [0].
               | 
               | If you need them, you also have a "stdlib" implemented
               | for Go [1] and a pretty extensive collection of
               | extensions [2]
               | 
               | [0]: https://ricardoanderegg.com/posts/extending-sqlite-
               | with-rust...
               | 
               | [1]: https://github.com/multiprocessio/go-sqlite3-stdlib
               | 
               | [2]: https://github.com/nalgeon/sqlean
        
               | ergocoder wrote:
               | Wow this is helpful. I'm using sqlite for some of my
               | projects and always bothered that some functions are
                | missing. WITH RECURSIVE is too mind-bending.
               | 
               | This seems like I can add a lot more functions to it, not
               | just regex extract.
               | 
               | Came here to complain and learned something useful.
        
               | Groxx wrote:
               | Probably also worth noting: you don't need to build (many
               | kinds of) extensions as C-compatible code and separate
               | .so files that you load.
               | 
               | SQLite is an _in-process_ database. You can give it a
               | _callback func_ to execute. So your regex-extract can
               | literally just call a function in your code:
               | https://sqlite.org/appfunc.html
               | 
               | edit: Python's stdlib documentation concisely shows how
               | easy this can be: https://docs.python.org/3/library/sqlit
               | e3.html#sqlite3.Conne... Basically every SQLite library
               | should have something similar. This extreme ease of
               | extending is a big part of why SQLite has so little
               | built-in.
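As a concrete illustration of the callback approach described above, here is a minimal sketch using Python's stdlib `sqlite3` module. The `regex_extract` name and its two-argument signature are an illustrative choice, not anything SQLite itself defines:

```python
import re
import sqlite3

def regex_extract(pattern, text):
    """Return the first substring of text matching pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

conn = sqlite3.connect(":memory:")
# Register the plain Python callable as a SQL function:
# (name, number of arguments, callable).
conn.create_function("regex_extract", 2, regex_extract)

result = conn.execute(
    "SELECT regex_extract('[0-9]+', 'order #4521 shipped')"
).fetchone()[0]
print(result)  # -> '4521'
```

No shared library, no C glue: the function runs in-process, exactly as the comment describes.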
        
             | eatonphil wrote:
             | I like SQLite (qualifying not for you, simonw, but for
             | others). But I hate that I can't be lazy by using arrays in
             | SQLite... because they don't exist. group_concat is a poor
             | approximation.
             | 
             | Also, I genuinely dislike how loose SQLite is with allowed
             | syntax. Probably it's preference. But even interactively I
             | prefer to know immediately that I messed up a query. SQLite
             | is so forgiving I've often wasted time trying to understand
             | why my results are nonsense (because I typoed in the query
             | and SQLite didn't fail the query).
             | 
             | But I also strongly dislike Python for that reason and I
             | know where you stand there. Maybe SQLite/PostgreSQL is
             | similar to the dynamic/static language preference divide.
        
               | simonw wrote:
               | I'm increasingly finding myself using SQLite's JSON
               | features for array stuff - they're surprisingly
               | convenient once you get into the habit of using them. A
               | couple of recent examples:
               | 
               | - https://til.simonwillison.net/sqlite/sort-by-number-of-
               | json-...
               | 
               | - https://til.simonwillison.net/sqlite/function-
               | list#user-cont...
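A minimal sketch of the JSON-as-arrays pattern described above, using Python's stdlib `sqlite3` (this assumes a SQLite build with the JSON functions compiled in, which modern builds include; the `docs`/`tags` schema is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, tags TEXT)")
# Store "arrays" as JSON text in an ordinary TEXT column.
conn.executemany(
    "INSERT INTO docs (tags) VALUES (?)",
    [('["sqlite","json"]',), ('["sqlite"]',)],
)

# json_each is a table-valued function that expands each JSON array
# into one row per element.
rows = conn.execute(
    "SELECT docs.id, je.value FROM docs, json_each(docs.tags) AS je"
).fetchall()
print(rows)  # [(1, 'sqlite'), (1, 'json'), (2, 'sqlite')]
```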
        
         | nikeee wrote:
         | The documentation offers some advice on this:
         | 
         | https://www.sqlite.org/whentouse.html
        
         | thomascgalvin wrote:
         | This is basically the exact use case SQLite was designed for;
         | PostgreSQL is a marvel, and at the end of the day presents a
         | much more robust RDBMS, but it's never going to beat SQLite _at
         | the thing SQLite was designed for_.
        
         | samatman wrote:
         | Postgres obviously.
         | 
         | Sorry, just thought I'd buck the trend and assume a very write-
         | heavy workload with like 64 cores.
         | 
         | If you don't have significant write contention, SQLite every
         | time.
        
           | dinosaurdynasty wrote:
           | If you can have one "database" thread and 63 "worker"
           | threads, send messages back and forth, and don't hold open
           | transactions, this would probably work with sqlite. Aka treat
           | sqlite like redis.
        
           | innocenat wrote:
           | Where is write contention coming from if it's operated
           | locally?
        
             | d3nj4l wrote:
             | ... you can get tons of requests on a server?
        
               | dinosaurdynasty wrote:
               | Redis has the same limitation (only one transaction at a
               | time) and is used a lot for webapps. It solves this by
               | requiring full transactions up front. The ideal case for
               | sqlite for performance is to have only a single
               | process/thread directly interacting with the database and
               | having other process/threads send messages to and from
               | the database process.
        
             | Thaxll wrote:
             | SQLite is "single" threaded for writes.
        
           | ledgerdev wrote:
           | Here's sqlite doing 100 million inserts in 33 seconds which
           | should fit into nearly every workload, though it is batched.
           | https://avi.im/blag/2021/fast-sqlite-inserts/
           | 
           | So write contention from multiple connections is what you're
           | talking about, versus a single process using sqlite?
        
             | endisneigh wrote:
             | No durability guarantee is a showstopper for any serious
             | use case
        
               | axelthegerman wrote:
               | Not sure what you mean by durability. Sqlite has WAL that
               | can be replicated (see litestream)
        
               | endisneigh wrote:
               | https://en.m.wikipedia.org/wiki/Durability_(database_syst
               | ems...
        
               | dinosaurdynasty wrote:
               | https://www.sqlite.org/atomiccommit.html
               | 
               | sqlite is as good at durability as any non-replicated
               | database, though you can configure it to be non-durable
               | (most other databases too tbf).
               | 
               | https://www.sqlite.org/pragma.html#pragma_synchronous
               | 
               | By default WAL mode can rollback committed transactions
               | in cases of power failure, but you can do `PRAGMA
               | synchronous = FULL` to trade speed for durability.
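The two pragmas mentioned above can be set from any client; a minimal sketch with Python's stdlib `sqlite3` (WAL mode needs a file-backed database, hence the temporary file):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)

# WAL mode: commits append to a write-ahead log; readers don't block
# the writer. The pragma returns the resulting journal mode.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# synchronous=FULL fsyncs the WAL on every commit (durable at power
# loss); NORMAL fsyncs only at checkpoints (faster, but the most
# recent commits may roll back after a power failure).
conn.execute("PRAGMA synchronous=FULL")
sync = conn.execute("PRAGMA synchronous").fetchone()[0]
print(journal_mode, sync)  # -> wal 2
```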
        
               | endisneigh wrote:
               | I'm talking about the post I originally commented on.
               | Things were disabled so durability is not guaranteed.
        
               | dinosaurdynasty wrote:
               | Yeah that's not great
        
               | ledgerdev wrote:
               | If it's good enough for avionics and nuclear subs, it's
               | probably good enough for most web apps.
        
               | endisneigh wrote:
               | Web apps do more concurrent writes than subs, plus you
               | can configure SQLite for more durability
        
               | sph wrote:
               | I don't have the data for subs, but there's web app and
               | web app. No one is talking about using SQLite for 5k
               | queries/s.
               | 
               | It might work, but I reckon 90% of web applications live
               | beneath this relatively small threshold and 80% probably
               | don't even reach 50 q/s.
        
             | lvass wrote:
             | Keyword here is transactions, not processes. You can model
             | any workload to be transaction-efficient, but it might not
             | be easy.
        
             | samatman wrote:
             | That's correct, I meant the many cores to allude to many
             | processes.
        
         | RedShift1 wrote:
          | SQLite is always going to win in that category just from the
          | fact that there are fewer layers of code to work through to
          | execute a query.
        
           | remram wrote:
           | Latency-wise _maybe_ , but throughput can be more important
           | for a lot of applications or bigger databases.
           | 
           | I say "maybe" because even there, SQLite is much more limited
           | in terms of query-planning (very simple statistics) and the
           | use of multiple indexes.
           | 
           | That's assuming we're talking about reads, PostgreSQL will
           | win for write-heavy workloads.
        
           | sophacles wrote:
           | > just from the fact that there are less layers of code to be
           | worked through
           | 
            | This is not an invariant. I've seen it be true, and I've
            | seen it be false. Sometimes that extra code is just cruft,
            | yes. Other
           | times though it is worth it to set up your data (or whatever)
           | to take advantage of mechanical sympathies in hot paths, or
           | filter the data before the expensive processing step, etc.
        
             | RedShift1 wrote:
             | I'm not talking about extra code, I'm talking about
             | _layers_ of code. With PostgreSQL you're still sending data
             | over TCP/IP or a UNIX socket, and are copying things around
             | in memory. Compare that to SQLite that runs in the memory
             | space of the program, thus no need for copying and socket
                | traffic. There are just fewer middlemen (middlepersons?)
                | with SQLite than are unavoidable with PostgreSQL. So
                | fewer layers = less
             | interpreting/serialization/deserialization/copying/... =
             | higher performance. I will even argue that even if the
             | SQLite query engine is slightly less efficient than
             | PostgreSQL, you're still winning because of less memory
             | copying going around.
        
               | fuckstick wrote:
               | > less
               | interpreting/serialization/deserialization/copying/... =
               | higher performance
               | 
               | Unfortunately for many database workloads you are
               | overestimating the relative cost of this factor.
               | 
               | > even if the SQLite query engine is slightly less
               | efficient than PostgreSQL
               | 
               | And this is absurd - the postgresql query engine isn't
               | just "slightly" more efficient. It is tremendously more
               | sophisticated. People using a SQL datastore as a
               | glorified key-value store are not going to notice - which
               | seems to be a large percentage of the sqlite install
               | base. It's not really a fair comparison.
        
               | ok_dad wrote:
               | With SQLite, though, you could reasonably just skip doing
               | fancy joins and do everything in tiny queries in tight
               | loops because SQLite is literally embedded in your app's
               | code. You can be careless with SQLite in ways you cannot
               | with a monolithic database server because of that reason.
               | I still agree there are use cases where a centralized
               | database is better, but SQLite is a strange beast that
               | needs a special diet to perform best.
        
           | electroly wrote:
           | As long as you turn it into a throughput race instead of a
           | latency race, PostgreSQL can definitely win. SQLite has a
           | primitive query builder and a limited selection of query
           | execution steps to choose from. For instance, all joins in
           | SQLite are inner loop joins. It can't do hash or merge joins.
           | It can't do GIN or columnstore indexes. If a query needs
           | those things, PostgreSQL can provide them and can beat
           | SQLite.
        
             | ac2u wrote:
             | out of interest, what columnstore indexes are available to
             | postgres? Would be happy to find out that I'm missing
             | something.
             | 
             | I know citus can provide columnar tables but I can't find
             | columnar indexes for regular row-based tables in their
             | docs. (use case of keeping an OLTP table but wanting to
             | speed up a tiny subset of queries)
             | 
             | Closest thing I could find was Swarm64 for columnar indexes
             | but it doesn't seem to be available anymore.
        
       | Kalanos wrote:
       | i wish it had an optional server for more concurrent and
       | networked transactions in the cloud
        
         | jjtheblunt wrote:
         | you could make one pretty easily, no?
        
           | axelthegerman wrote:
           | I'd like to see that. I also think the single write situation
           | is not great for web applications, but I don't see an easy
           | way around it without sacrificing things like consistency
        
             | dinosaurdynasty wrote:
             | If you can treat sqlite transactions like redis
             | transactions (send the entire transaction up front) it can
             | work.
        
             | bob1029 wrote:
             | See: LMAX Disruptor and friends. The magic spell that
             | serializes many threads of events into one without relying
             | on hard contention. You can even skip the hot busy waiting
             | if you aren't trying to arbitrage the stock market.
             | 
             | The way I would do it is a MPSC setup wherein the single
             | consumer holds an exclusive connection to the SQLite
             | database and writes out transactions in terms of the batch
             | size. Basically: BEGIN -> iterate & process event batch ->
             | END. This is very very fast due to how the CPU works at
             | hardware level. It's also a good place to insert stuff like
             | [a]synchronous replication logic.
             | 
             | Completion is handled with busy/yield waiting on a status
             | flag attached to the original event instance. You'd
             | typically do a thing where all flags are acknowledged at
             | the batch grain (i.e. after you committed the txn). This
             | has some overhead, but the throughput & latency figures are
             | really hard to argue with. It also has very compelling
             | characteristics at the extremes in terms of system load.
             | The harder you push, the faster it goes.
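The single-consumer design sketched above might look roughly like this in Python (a hypothetical sketch with the stdlib `queue`/`threading`/`sqlite3` modules; the batch size, table, and event shape are invented, and this skips the per-event completion flags the comment describes):

```python
import os
import queue
import sqlite3
import tempfile
import threading

events: "queue.Queue[tuple]" = queue.Queue()  # MPSC: many producers, one consumer
done = threading.Event()

def writer(path: str, batch_size: int = 256) -> None:
    # The single consumer owns the only connection and commits in batches.
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE IF NOT EXISTS events (payload TEXT)")
    while not (done.is_set() and events.empty()):
        try:
            batch = [events.get(timeout=0.1)]
        except queue.Empty:
            continue
        while len(batch) < batch_size:  # drain up to one batch without blocking
            try:
                batch.append(events.get_nowait())
            except queue.Empty:
                break
        with conn:  # BEGIN ... COMMIT around the whole batch
            conn.executemany("INSERT INTO events VALUES (?)", batch)
    conn.close()

path = os.path.join(tempfile.mkdtemp(), "events.db")
t = threading.Thread(target=writer, args=(path,))
t.start()
for i in range(1000):  # producers would normally be separate threads
    events.put((f"event-{i}",))
done.set()
t.join()
n = sqlite3.connect(path).execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(n)  # -> 1000
```

The queue serializes the many writers into one thread, so the per-connection write lock is never contended, matching the "resolve contention outside the lock" idea above.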
        
         | bityard wrote:
         | You may be interested in rqlite:
         | https://github.com/rqlite/rqlite
        
       | simonw wrote:
       | I shared some notes on this on my blog, because I'm guessing a
       | lot of people aren't quite invested enough to read through the
       | whole paper: https://simonwillison.net/2022/Sep/1/sqlite-duckdb-
       | paper/
        
         | thunderbong wrote:
         | That's a very comprehensive review. Thank you.
        
         | airstrike wrote:
         | Thank you for this. Big fan of your blog and all your
         | contributions to Django
        
         | sph wrote:
         | I waited for a tl;dr but this is even better. Much appreciated.
        
       | naikrovek wrote:
       | my ipad won't let me search through the PDF, but i couldn't find
       | where "SSB" was defined, if anywhere. i did not see it defined in
       | the first paragraph, which is where it is first used.
       | 
       | everyone: not all of your readers are domain experts. omissions
       | like this are infuriating.
        
         | ryanworl wrote:
         | Star Schema Benchmark
         | https://www.cs.umb.edu/~poneil/StarSchemaB.PDF
        
         | jrochkind1 wrote:
         | came here to ask this. I wondered if it was a typo for SSD!
        
         | glhaynes wrote:
         | Just fyi: if you're viewing the PDF in Safari on your iPad, you
         | can search by typing into Safari's Location Bar and then
         | choosing "Find 'xyz'" from the popup that appears.
        
       | polyrand wrote:
        | Regarding hash joins, the SQLite documentation mentions the
        | absence of real hash tables [0]:
        | 
        |     SQLite constructs a transient index instead of a hash table
        |     in this instance because it already has a robust and high
        |     performance B-Tree implementation at hand, whereas a
        |     hash-table would need to be added. Adding a separate hash
        |     table implementation to handle this one case would increase
        |     the size of the library (which is designed for use on
        |     low-memory embedded devices) for minimal performance gain.
       | 
       | It's already linked in the paper, but here's the link to the code
       | used in the paper [1]
       | 
        | The paper mentions implementing Bloom filters for analytical
        | queries and explains how they're used. I wonder if this is
        | related to the query planner enhancements that landed in SQLite
        | 3.38.0 [2]:
        | 
        |     Use a Bloom filter to speed up large analytic queries.
       | 
       | [0]: https://www.sqlite.org/optoverview.html#hash_joins
       | 
       | [1]: https://github.com/UWHustle/sqlite-past-present-future
       | 
       | [2]: https://www.sqlite.org/releaselog/3_38_0.html
        
       | stonemetal12 wrote:
       | >SQLite is primarily designed for fast online transaction
       | processing (OLTP), employing row-oriented execution and a B-tree
       | storage format.
       | 
       | I found that claim to be fairly surprising, SQLite is pretty bad
       | when it comes to transactions per second. SQLite even owns up to
       | it in the FAQ:
       | 
       | >it will only do a few dozen transactions per second.
        
         | tiffanyh wrote:
         | > SQLite is pretty bad when it comes to transactions per
         | second. SQLite even owns up to it in the FAQ: "it will only do
         | a few dozen transactions per second."
         | 
         | That is an extremely poor quote taken way out of context.
         | 
         | The full quote is:
         | 
         | FAQ: "[Question] INSERT is really slow - I can only do few
         | dozen INSERTs per second. [Answer] Actually, SQLite will easily
         | do 50,000 or more INSERT statements per second on an average
         | desktop computer. But it will only do a few dozen transactions
         | per second. Transaction speed is limited by the rotational
         | speed of your disk drive. A transaction normally requires two
         | complete rotations of the disk platter, which on a 7200RPM disk
         | drive limits you to about 60 transactions per second."
         | 
         | https://www.sqlite.org/faq.html#q19
        
           | hnfong wrote:
           | Given the prevalence of SSDs these days the figure might be
           | out of date as well.
        
           | hu3 wrote:
           | Yeah my GP got me confused. I remember doing 40k inserts/s in
           | a trading strategy backtesting program with Go and SQLite.
           | Reads were on the same magnitude, I want to say around 90k/s.
           | My bottleneck was CPU.
        
         | jessermeyer wrote:
         | Please quote the entire statement. And stop the needless "even
         | owns up to it" FUD.
         | 
         | > Actually, SQLite will easily do 50,000 or more INSERT
         | statements per second on an average desktop computer. But it
         | will only do a few dozen transactions per second. Transaction
         | speed is limited by the rotational speed of your disk drive. A
         | transaction normally requires two complete rotations of the
         | disk platter, which on a 7200RPM disk drive limits you to about
         | 60 transactions per second.
        
       | manimino wrote:
       | TFA appears to be about adapting SQLite for OLAP workloads. I do
       | not understand the rationale. Why try to adapt a row-based
       | storage system for OLAP? Why not just use a column store?
        
         | didgetmaster wrote:
         | It is certainly possible to have a single system that can
         | effectively process high volumes of OLTP traffic while at the
         | same time performing OLAP operations. While there are systems
         | that are designed to do one or the other type of operation
         | well, very few are able to do both.
         | https://www.youtube.com/watch?v=F6-O9v4mrCc
        
         | Comevius wrote:
          | SQLite is significantly better at OLTP and at being a blob
          | storage than DuckDB, and it doesn't want to sacrifice those
         | advantages and compatibility if OLAP performance can be
         | improved independently. In my experience for diverse workloads
         | it is more practical to start with a row-based structure and
         | incrementally transform it into a column-based one. Indeed in
         | the paper there is a suggested approach that trades space for
         | improved OLAP performance.
        
         | [deleted]
        
       ___________________________________________________________________
       (page generated 2022-09-01 23:00 UTC)