[HN Gopher] SQLite: Past, Present, and Future ___________________________________________________________________ SQLite: Past, Present, and Future Author : chrstr Score : 199 points Date : 2022-09-01 13:20 UTC (9 hours ago) (HTM) web link (vldb.org) (TXT) w3m dump (vldb.org) | Thaxll wrote: | "While it continues to be the most widely used database engine in | the world" | | It realy depends what do you mean by that, yes it's shipping in | every phones and browser, but I don't consider that as a | database. Is the windows registry a database? | | Oracle, MySQL, PG, MSSQL are the most widly used DB in the world, | the web runs on those not SQLite. | adamrezich wrote: | there are far, far more sqlite instances than Windows Registry | instances in the world. | dang wrote: | The pdf: https://www.vldb.org/pvldb/vol15/p3535-gaffney.pdf | gorjusborg wrote: | I came for SQLite, got sold DuckDB. | rafale wrote: | SQLite vs Postgres for a local database (on disk, not over the | network): who wins? (Each in their most performance oriented | configuration) | lvass wrote: | SQLite. The most performant configuration is unsuited to most | usage, and may lead to database corruption on a system crash. | rafale wrote: | Should have said the most performance oriented setting that's | also safe from data corruption. | lvass wrote: | Then it depends on the usage. You'd likely need to run with | synchronous mode on, and even on WAL, multiple separate | write transactions is a issue. If you don't have many | writes or buffer them into not many transactions, SQLite is | the most performant. | bob1029 wrote: | >most performance oriented configuration | | I am 99% sure SQLite is going to win unless you actually care | about data durability at power loss time. Even if you do, I | feel I could defeat Postgres on equal terms if you permit me | access to certain ring-buffer-style, micro-batching, inter- | thread communication primitives. | | Sqlite is not great at dealing with a gigantic wall of | concurrent requests out of the box, but using a little bit of | innovation in front of SQLite can solve this problem quite | well. The key is resolve the write contention outside of the | lock that is baked into the SQLite connection. Writing batches | to SQLite on a single connection with WAL turned on and Sync | set to normal is pretty much like operating at line speed with | your IO subsystem. | [deleted] | ergocoder wrote: | Functionality-wise, SQLite's dialect is really lacking... | bob1029 wrote: | The entire point is to bring your own functions to SQLite, | since it is presumably running in-proc and can be integrated | with trivially. | | https://sqlite.org/appfunc.html | | We currently use this path to offer a domain-specific SQL- | based scripting language for our product. | simonw wrote: | Is it the SQL dialect there lacking or is it the built-in | functions? | | I agree that SQLite default functionality is very thin | compared to PostgreSQL - especially with respect to things | like date manipulation - but you can extend it with more SQL | functions (and table-valued functions) very easily. | ergocoder wrote: | Depends on what easily means. | | Sqlite can't do custom format date parsing and regex | extract. How do we extend something like this? | | If we go beyond a simple function to window function, I | imagine it would be even harder. | | At this point, we nlmight as well use postgres. | simonw wrote: | Funny you should mention those specific examples - I have | Datasette plugins adding custom SQL functions to SQLite | for both of them! | | - https://datasette.io/plugins/datasette-dateutil | | - https://datasette.io/plugins/datasette-rure | polyrand wrote: | Adding user-defined functions to SQLite is not difficult, | and the mechanism is quite flexible. You can create | extensions and load them when you create the SQLite | connection to have the functions available in queries. I | wrote a blog post explaining how to do that using Rust, | and the example is precisely a `regex_extract` function | [0]. | | If you need them, you also have a "stdlib" implemented | for Go [1] and a pretty extensive collection of | extensions [2] | | [0]: https://ricardoanderegg.com/posts/extending-sqlite- | with-rust... | | [1]: https://github.com/multiprocessio/go-sqlite3-stdlib | | [2]: https://github.com/nalgeon/sqlean | ergocoder wrote: | Wow this is helpful. I'm using sqlite for some of my | projects and always bothered that some functions are | missing. WITH RECURSIVE is too mind bending. | | This seems like I can add a lot more functions to it, not | just regex extract. | | Came here to complain and learned something useful. | Groxx wrote: | Probably also worth noting: you don't need to build (many | kinds of) extensions as C-compatible code and separate | .so files that you load. | | SQLite is an _in-process_ database. You can give it a | _callback func_ to execute. So your regex-extract can | literally just call a function in your code: | https://sqlite.org/appfunc.html | | edit: Python's stdlib documentation concisely shows how | easy this can be: https://docs.python.org/3/library/sqlit | e3.html#sqlite3.Conne... Basically every SQLite library | should have something similar. This extreme ease of | extending is a big part of why SQLite has so little | built-in. | eatonphil wrote: | I like SQLite (qualifying not for you, simonw, but for | others). But I hate that I can't be lazy by using arrays in | SQLite... because they don't exist. group_concat is a poor | approximation. | | Also, I genuinely dislike how loose SQLite is with allowed | syntax. Probably it's preference. But even interactively I | prefer to know immediately that I messed up a query. SQLite | is so forgiving I've often wasted time trying to understand | why my results are nonsense (because I typoed in the query | and SQLite didn't fail the query). | | But I also strongly dislike Python for that reason and I | know where you stand there. Maybe SQLite/PostgreSQL is | similar to the dynamic/static language preference divide. | simonw wrote: | I'm increasingly finding myself using SQLite's JSON | features for array stuff - they're surprisingly | convenient once you get into the habit of using them. A | couple of recent examples: | | - https://til.simonwillison.net/sqlite/sort-by-number-of- | json-... | | - https://til.simonwillison.net/sqlite/function- | list#user-cont... | nikeee wrote: | The documentation offers some advice on this: | | https://www.sqlite.org/whentouse.html | thomascgalvin wrote: | This is basically the exact use case SQLite was designed for; | PostgreSQL is a marvel, and at the end of the day presents a | much more robust RDBMS, but it's never going to beat SQLite _at | the thing SQLite was designed for_. | samatman wrote: | Postgres obviously. | | Sorry, just thought I'd buck the trend and assume a very write- | heavy workload with like 64 cores. | | If you don't have significant write contention, SQLite every | time. | dinosaurdynasty wrote: | If you can have one "database" thread and 63 "worker" | threads, send messages back and forth, and don't hold open | transactions, this would probably work with sqlite. Aka treat | sqlite like redis. | innocenat wrote: | Where is write contention coming from if it's operated | locally? | d3nj4l wrote: | ... you can get tons of requests on a server? | dinosaurdynasty wrote: | Redis has the same limitation (only one transaction at a | time) and is used a lot for webapps. It solves this by | requiring full transactions up front. The ideal case for | sqlite for performance is to have only a single | process/thread directly interacting with the database and | having other process/threads send messages to and from | the database process. | Thaxll wrote: | SQLite is "single" threaded for writes. | ledgerdev wrote: | Here's sqlite doing 100 million inserts in 33 seconds which | should fit into nearly every workload, though it is batched. | https://avi.im/blag/2021/fast-sqlite-inserts/ | | So write contention from multiple connections is what you're | talking about, versus a single process using sqlite? | endisneigh wrote: | No durability guarantee is a showstopper for any serious | use case | axelthegerman wrote: | Not sure what you mean by durability. Sqlite has WAL that | can be replicated (see litestream) | endisneigh wrote: | https://en.m.wikipedia.org/wiki/Durability_(database_syst | ems... | dinosaurdynasty wrote: | https://www.sqlite.org/atomiccommit.html | | sqlite is as good at durability as any non-replicated | database, though you can configure it to be non-durable | (most other databases too tbf). | | https://www.sqlite.org/pragma.html#pragma_synchronous | | By default WAL mode can rollback committed transactions | in cases of power failure, but you can do `PRAGMA | synchronous = FULL` to trade speed for durability. | endisneigh wrote: | I'm talking about the post I originally commented on. | Things were disabled so durability is not guaranteed. | dinosaurdynasty wrote: | Yeah that's not great | ledgerdev wrote: | If it's good enough for avionics and nuclear subs, it's | probably good enough for most web apps. | endisneigh wrote: | Web apps do more concurrent writes than subs, plus you | can configure SQLite for more durability | sph wrote: | I don't have the data for subs, but there's web app and | web app. No one is talking about using SQLite for 5k | queries/s. | | It might work, but I reckon 90% of web applications live | beneath this relatively small threshold and 80% probably | don't even reach 50 q/s. | lvass wrote: | Keyword here is transactions, not processes. You can model | any workload to be transaction-efficient, but it might not | be easy. | samatman wrote: | That's correct, I meant the many cores to allude to many | processes. | RedShift1 wrote: | SQLite is always going to win in that category just from the | fact that there are less layers of code to be worked through to | execute a query. | remram wrote: | Latency-wise _maybe_ , but throughput can be more important | for a lot of applications or bigger databases. | | I say "maybe" because even there, SQLite is much more limited | in terms of query-planning (very simple statistics) and the | use of multiple indexes. | | That's assuming we're talking about reads, PostgreSQL will | win for write-heavy workloads. | sophacles wrote: | > just from the fact that there are less layers of code to be | worked through | | This is not an invariant. I've seen be true, and I've seen it | be false. Sometimes that extra code is just cruft yes. Other | times though it is worth it to set up your data (or whatever) | to take advantage of mechanical sympathies in hot paths, or | filter the data before the expensive processing step, etc. | RedShift1 wrote: | I'm not talking about extra code, I'm talking about | _layers_ of code. With PostgreSQL you're still sending data | over TCP/IP or a UNIX socket, and are copying things around | in memory. Compare that to SQLite that runs in the memory | space of the program, thus no need for copying and socket | traffic. There's just less middlemen (middlepersons?) with | SQLite that are unavoidable with PostgreSQL. So less layers | = less | interpreting/serialization/deserialization/copying/... = | higher performance. I will even argue that even if the | SQLite query engine is slightly less efficient than | PostgreSQL, you're still winning because of less memory | copying going around. | fuckstick wrote: | > less | interpreting/serialization/deserialization/copying/... = | higher performance | | Unfortunately for many database workloads you are | overestimating the relative cost of this factor. | | > even if the SQLite query engine is slightly less | efficient than PostgreSQL | | And this is absurd - the postgresql query engine isn't | just "slightly" more efficient. It is tremendously more | sophisticated. People using a SQL datastore as a | glorified key-value store are not going to notice - which | seems to be a large percentage of the sqlite install | base. It's not really a fair comparison. | ok_dad wrote: | With SQLite, though, you could reasonably just skip doing | fancy joins and do everything in tiny queries in tight | loops because SQLite is literally embedded in your app's | code. You can be careless with SQLite in ways you cannot | with a monolithic database server because of that reason. | I still agree there are use cases where a centralized | database is better, but SQLite is a strange beast that | needs a special diet to perform best. | electroly wrote: | As long as you turn it into a throughput race instead of a | latency race, PostgreSQL can definitely win. SQLite has a | primitive query builder and a limited selection of query | execution steps to choose from. For instance, all joins in | SQLite are inner loop joins. It can't do hash or merge joins. | It can't do GIN or columnstore indexes. If a query needs | those things, PostgreSQL can provide them and can beat | SQLite. | ac2u wrote: | out of interest, what columnstore indexes are available to | postgres? Would be happy to find out that I'm missing | something. | | I know citus can provide columnar tables but I can't find | columnar indexes for regular row-based tables in their | docs. (use case of keeping an OLTP table but wanting to | speed up a tiny subset of queries) | | Closest thing I could find was Swarm64 for columnar indexes | but it doesn't seem to be available anymore. | Kalanos wrote: | i wish it had an optional server for more concurrent and | networked transactions in the cloud | jjtheblunt wrote: | you could make one pretty easily, no? | axelthegerman wrote: | I'd like to see that. I also think the single write situation | is not great for web applications, but I don't see an easy | way around it without sacrificing things like consistency | dinosaurdynasty wrote: | If you can treat sqlite transactions like redis | transactions (send the entire transaction up front) it can | work. | bob1029 wrote: | See: LMAX Disruptor and friends. The magic spell that | serializes many threads of events into one without relying | on hard contention. You can even skip the hot busy waiting | if you aren't trying to arbitrage the stock market. | | The way I would do it is a MPSC setup wherein the single | consumer holds an exclusive connection to the SQLite | database and writes out transactions in terms of the batch | size. Basically: BEGIN -> iterate & process event batch -> | END. This is very very fast due to how the CPU works at | hardware level. It's also a good place to insert stuff like | [a]synchronous replication logic. | | Completion is handled with busy/yield waiting on a status | flag attached to the original event instance. You'd | typically do a thing where all flags are acknowledged at | the batch grain (i.e. after you committed the txn). This | has some overhead, but the throughput & latency figures are | really hard to argue with. It also has very compelling | characteristics at the extremes in terms of system load. | The harder you push, the faster it goes. | bityard wrote: | You may be interested in rqlite: | https://github.com/rqlite/rqlite | simonw wrote: | I shared some notes on this on my blog, because I'm guessing a | lot of people aren't quite invested enough to read through the | whole paper: https://simonwillison.net/2022/Sep/1/sqlite-duckdb- | paper/ | thunderbong wrote: | That's a very comprehensive review. Thank you. | airstrike wrote: | Thank you for this. Big fan of your blog and all your | contributions to Django | sph wrote: | I waited for a tl;dr but this is even better. Much appreciated. | naikrovek wrote: | my ipad won't let me search through the PDF, but i couldn't find | where "SSB" was defined, if anywhere. i did not see it defined in | the first paragraph, which is where it is first used. | | everyone: not all of your readers are domain experts. omissions | like this are infuriating. | ryanworl wrote: | Star Schema Benchmark | https://www.cs.umb.edu/~poneil/StarSchemaB.PDF | jrochkind1 wrote: | came here to ask this. I wondered if it was a typo for SSD! | glhaynes wrote: | Just fyi: if you're viewing the PDF in Safari on your iPad, you | can search by typing into Safari's Location Bar and then | choosing "Find 'xyz'" from the popup that appears. | polyrand wrote: | Regarding hash joins, the SQLite documentation mentions the | absence of real hash tables [0] SQLite constructs | a transient index instead of a hash table in this instance | because it already has a robust and high performance B-Tree | implementation at hand, whereas a hash-table would need to | be added. Adding a separate hash table implementation to | handle this one case would increase the size of the library | (which is designed for use on low-memory embedded devices) for | minimal performance gain. | | It's already linked in the paper, but here's the link to the code | used in the paper [1] | | The paper mentions implementing Bloom filters for analytical | queries an explains how they're used. I wonder if this is related | to the query planner enhancements that landed on SQLite 3.38.0 | [2] Use a Bloom filter to speed up large analytic | queries. | | [0]: https://www.sqlite.org/optoverview.html#hash_joins | | [1]: https://github.com/UWHustle/sqlite-past-present-future | | [2]: https://www.sqlite.org/releaselog/3_38_0.html | stonemetal12 wrote: | >SQLite is primarily designed for fast online transaction | processing (OLTP), employing row-oriented execution and a B-tree | storage format. | | I found that claim to be fairly surprising, SQLite is pretty bad | when it comes to transactions per second. SQLite even owns up to | it in the FAQ: | | >it will only do a few dozen transactions per second. | tiffanyh wrote: | > SQLite is pretty bad when it comes to transactions per | second. SQLite even owns up to it in the FAQ: "it will only do | a few dozen transactions per second." | | That is an extremely poor quote taken way out of context. | | The full quote is: | | FAQ: "[Question] INSERT is really slow - I can only do few | dozen INSERTs per second. [Answer] Actually, SQLite will easily | do 50,000 or more INSERT statements per second on an average | desktop computer. But it will only do a few dozen transactions | per second. Transaction speed is limited by the rotational | speed of your disk drive. A transaction normally requires two | complete rotations of the disk platter, which on a 7200RPM disk | drive limits you to about 60 transactions per second." | | https://www.sqlite.org/faq.html#q19 | hnfong wrote: | Given the prevalence of SSDs these days the figure might be | out of date as well. | hu3 wrote: | Yeah my GP got me confused. I remember doing 40k inserts/s in | a trading strategy backtesting program with Go and SQLite. | Reads were on the same magnitude, I want to say around 90k/s. | My bottleneck was CPU. | jessermeyer wrote: | Please quote the entire statement. And stop the needless "even | owns up to it" FUD. | | > Actually, SQLite will easily do 50,000 or more INSERT | statements per second on an average desktop computer. But it | will only do a few dozen transactions per second. Transaction | speed is limited by the rotational speed of your disk drive. A | transaction normally requires two complete rotations of the | disk platter, which on a 7200RPM disk drive limits you to about | 60 transactions per second. | manimino wrote: | TFA appears to be about adapting SQLite for OLAP workloads. I do | not understand the rationale. Why try to adapt a row-based | storage system for OLAP? Why not just use a column store? | didgetmaster wrote: | It is certainly possible to have a single system that can | effectively process high volumes of OLTP traffic while at the | same time performing OLAP operations. While there are systems | that are designed to do one or the other type of operation | well, very few are able to do both. | https://www.youtube.com/watch?v=F6-O9v4mrCc | Comevius wrote: | SQLite is significantly better at OLTP and being a blob | strorage than DuckDB, and it doesn't want to sacrifice those | advantages and compatibility if OLAP performance can be | improved independently. In my experience for diverse workloads | it is more practical to start with a row-based structure and | incrementally transform it into a column-based one. Indeed in | the paper there is a suggested approach that trades space for | improved OLAP performance. | [deleted] ___________________________________________________________________ (page generated 2022-09-01 23:00 UTC)