[HN Gopher] The Untold Story of SQLite (2021) ___________________________________________________________________ The Untold Story of SQLite (2021) Author : xrayarx Score : 209 points Date : 2023-01-28 16:45 UTC (6 hours ago) (HTM) web link (corecursive.com) (TXT) w3m dump (corecursive.com) | throwaway29303 wrote: | Shane Harrelson did this for us about 10 years ago. He came up | with this huge corpus of SQL statements, and he ran them against | every database engine that he could get his hands on. We wanted | to make sure everybody got the same answer, and he managed to | segfault every single database engine he tried, including SQLite, | except for Postgres. Postgres always ran and gave the correct | answer. We were never able to find a fault in that. The Postgres | people tell me that we just weren't trying hard enough. It is | possible to fault Postgres, but we were very impressed. | We crashed Oracle, including commercial versions of Oracle. We | crashed DB2. Anything we could get our hands on, we tried it and | we managed to crash it, but the point was that we wanted to make | sure that SQLite got the same answers for all of these queries, | or equivalent answers, because a lot of these queries, they're | indeterminate and the rows might come out in a different order | because you [crosstalk 00:25:10] order by clause, so we wanted to | make sure that all the database engines got equivalent answers. | Mostly, we wanted to make sure that SQLite was getting the same | answers everybody else is. That's another test suite, | and then we have lots of smaller ones, as well. Between them all, | it's a lot of testing code, and it takes a long time to run. | | I hope some people are/were paying attention to this. ;) | marcosdumay wrote: | You mean fixing their databases? I would expect Postgres dev to | act on it, but not much from the other ones. | elteto wrote: | "Shane Harrelson [...] came up with this huge corpus of SQL | statements, and he ran them against every database engine that he | could get his hands on. We wanted to make sure everybody got the | same answer, and he managed to segfault every single database | engine he tried, including SQLite, except for Postgres. Postgres | always ran and gave the correct answer. We were never able to | find a fault in that. The Postgres people tell me that we just | weren't trying hard enough. It is possible to fault Postgres, but | we were very impressed." | | Did not expect to find such a cool anecdote about Postgres here! | fijiaarone wrote: | "Freedom means taking care of yourself." | posharma wrote: | There's an industry wide fad (like ChatGPT) and then there's a | hacker news wide fad which is SQLite :-). Not sure how it all | started. Last few months (or my be even an year) there has been | an unprecedented number of SQLite posts. What gives? | jonas21 wrote: | Last few months? I feel like SQLite has been extremely popular | on HN for at least the last 5 years (and the data seems to | confirm this [1]). In which case, maybe it's not so much of a | fad? | | [1] https://hn.algolia.com/?q=sqlite | ok123456 wrote: | Calling SQLite a fad is like calling Linux a fad. | nindalf wrote: | He doesn't mean using it. He means posting about SQLite on | HN. The story of SQLite, how it came to be developed, how | it's tested, how it really works under the hood, funky ways | to use it on the server for storage (fly.io, Cloudflare do | this). All of these have become more common in the last year. | jestarray wrote: | how is sqlite a fad? its used pretty much everywhere, is | simple, and yet surprisingly scalable to some | posharma wrote: | SQLite is not a fad. It's just that it never gained so much | attention here before. | cdcarter wrote: | Google Trends [0] indicates that sqlite is about as popular | as usual. There's been some recent growth in the last year, | but that's after a bit of a dip. A regression of the time | series would be fairly flat. | | I suspect popularity here comes in waves through a Katamari | Damacy effect. People start reading about a topic, and | start posting, thus more people read, research, post, | etc... until a saturation point, a cooling off period, and | then a rebuild. | | [0]: obviously not the same as whats happening on HN, I'd | love to see someone pull these numbers from the HN API! | TheRealPomax wrote: | They were talking about it being a fad here on HN, in | terms of submitted (and upvoted to FP) posts per | week/month/year. It's always been a topic getting the | incidental FP attention, but the number of times | something relating to SQLite makes it to the front page | here seems to have increased considerably in recent | months. | lukevp wrote: | I think as single computers get more and more power, and | engineers work in these super slow, eventually consistent cloud | infrastructures, there's a visceral appeal of an architecture | that can run on my laptop and also scale to a reasonable user | base. The only limiting factor for most applications at this | point that would prevent a single machine from serving them is | having HA/DR. So a lot of effort is being made at the | replication side of SQLite so it can be part of this new stack. | I am seeing it emerge as a reaction to the crazy amount of | microservices and distributed computing just to solve small | problems that could fit on a machine. And the inability to work | locally as well. | sangnoir wrote: | SQLite (like curl and libcurl) have been pervasive and awesome | for a long time and you'd always find posts about both on HN | since forever, because of their ubiquity. | | SQLite has been appearing a lot more often on HN because of a | different more recent fad: edge computing. | jskulski wrote: | Haha yeah I feel like SQLite is a very _told_ story on HN. | Great story and tech for real, inspirational but also very | trendy right now! | posharma wrote: | Yeah, exactly. May be I should've used the word trendy as | some ppl take offense to the word fad for SQLite. | geysersam wrote: | It might be more than a year actually. I'm pretty sure it was | trendy 1.5 years ago. Remember I read a really cool blog post | around then where someone used range requests to query SQLite | statically. It used WASM + SQLite and hyped hard here. | Seattle3503 wrote: | For me, it is always the best place to start when I need a DB | for personal projects. The serverless nature makes it very easy | to use. It is easy to fall in love with the convience it | provides, because lets you focus on the fun parts of your | project. | lliamander wrote: | I think most programmers would like to be remembered for having | written some really cool, useful piece of software - ideally by | themselves or with a small team of people, where they have | creative control over the requirements and design. | | SQLite (and it's founder, Richard Hipp) are an inspirational | example of such success. | ravi-delia wrote: | I feel like it's just that it's one of the coolest pieces of | software that every developer- web, backend, systems, or | otherwise, has at least seen before. It runs in the browser, it | runs on your phone, and unlike other databases (all of which | are also very cool) it is completely within the domain of even | a single inexperienced developer. It's just a little file on | disk, just a library for accessing it, there's no magic! | Especially if you're used to taking a database for granted it's | just an extremely cool thing to think about. | ethbr0 wrote: | The brilliance of SQLite is the brilliance of interfaces. | | Value (in the abstract, not just $ sense) accrued around SQL. | | At some point, so much value accrued that people were using | it for things it wasn't designed to do. | | SQLite provided a solution for "people who want to use a | database, but don't look like traditional database | operators." Turns out there's a lot of those. | | That this large userbase existed was a brilliant observation, | combined with brilliant execution in shepherding and evolving | SQLite since. | | And none of the above would've been possible if the SQL | interface hadn't been standardized and adopted over the last | few decades*. | | * Turns out, SQL's 50th anniversary will be 2024 | therealdrag0 wrote: | SQLite has excited HN for years | vagrantJin wrote: | I doubt ChatGPT is a fad. Feels like the second coming of the | gun and it will have an outsized impact. | jacob019 wrote: | The underlying tech anyway. | [deleted] | jacob019 wrote: | Should I be using file extension .db or .sqlite3 ? | simonw wrote: | My experience is that it doesn't matter at all, but .db appears | to me a lot more common than .sqlite3 | panda_fish wrote: | Thanks for that amazing trip down memory lane... | | > Wow, I've got an SQL database running on my Palm Pilot. | | Weirdly, my thesis in 2000 was to write an SQL parser for Palm | (Palm V - still have it). I remember there being such a massive | gap in the market for a pervasive standards-compliant data | storage solution. I used javacc, which is still around I think - | I can't imagine it covered 1% of the features of SQLite though. | Bravo! | rompic wrote: | My favourite random sqlite story: The company I once worked for | used an outdated version of sqlite (3.8.6) in one of their | products. The databases used got bigger and bigger and in a very | big project one of the "already known to be slow"-queries took | more than an hour on my laptop making the tool unusable. | | On a quiet day, I was able to save the temporary table used as | part of the process and run the problematic query against it in | an isolated fashion. | | The query returned an extremely high number of results and when I | discovered this I questioned my SQL-fu, my sanity and my trust in | computers. | | I found that we were hit by a bug that was fixed 6 years before I | discovered it (https://sqlite.org/src/info/6f2222d550f5b0ee7ed). | Sqlite's query planner assumed that a field with a not null | constraint can never be null, which isn't the case for the right | hand table in a left join. | | I fixed it by adding a not null check in the query and then later | by updating the library. After that the 1 hour query ran in ~700 | ms. | | This faster run time also helped with smaller projects and in the | end allowed extending our test suite considerably. | | Tldr: Keep your dependencies up to date. | EGreg wrote: | Now what about the untold _true_ story of SQLite? | dmitriid wrote: | A Netflix documentary series | [deleted] | fijiaarone wrote: | I'm interested in looking at fossil SCM - not the built in web | server, wiki, and issue tracking, but the auto sync and | simplified interface appeal for personal projects, and maybe an | embedded versioned file based CMS. | maxk42 wrote: | I've used git since it was first released to the public and | recently I feel like I know less about it than ever before. I | switched to fossil for personal projects around a year ago and | haven't regretted it one bit! | thunderbong wrote: | Previously - | | https://news.ycombinator.com/item?id=27718701 (2 years ago, 95 | comments) | counttheforks wrote: | [flagged] | bobobob420 wrote: | marked as duplicate | https://news.ycombinator.com/item?id=34504085 | https://news.ycombinator.com/item?id=34511831 | | These were not taken down. Anyways who cares what the BBC has | to say about India. They are not exactly a known unbiased | source when it comes to coverage of India and never have been. | | I slightly agree with some of your thoughts on speaking your | mind freely on this board however its probably for the best. | Dang knows this place turning into a cesspool is not useful. I | will say to other hacker news commentators please do not | downvote if you disagree with a comment. Discourse is important | but I do agree anonymous discourse is much more nuanced | | I do not understand how this post is number one. Are you saying | Hacker News can promote certain posts to the top and it is not | organically curated? | counttheforks wrote: | A dupe of 3 and 4 day old posts, which got nearly 200 | comments? No. It is not reasonable to just remove that. This | is censorship. Especially when you consider that dang tends | to instead just post a comment saying: "Previously discussed | at ..." | | > I do not understand how this post is number one. Are you | saying Hacker News can promote certain posts to the top and | it is not organically curated? | | Absolutely, yes. The mods can influence the ranking directly | and they have openly talked about this. If you search | https://hn.algolia.com/ for "penalized", "demoted" or | "promoted" you will find discussions regarding this. | | --- | | Since dang banned me from posting new replies: | | > Maybe you haven't been here long enough to have read the | FAQ or know that generally all dupes from 3-4 days ago are | marked, and only allowed after at least several months? | | I've been here for many years. I just make a new account | periodically. Maybe you haven't been paying enough attention | to see the difference in behavior that dang exhibits when he | wants to censor something? Notice how he's usually all over | the comments explaining why something was removed as dupe, | but now there is nothing except him linking to the older | posts (https://news.ycombinator.com/item?id=34559614), | waiting for an hour or two, and then silently deleting the | new one. | | Not to mention that the subject of the post is completely | different. The latest post is about the documentary being | removed from archive.org at the request of the BBC. None of | the earlier posts cover this. | dahart wrote: | Maybe you haven't been here long enough to have read the | FAQ or know that generally all dupes from 3-4 days ago are | marked, and only allowed after at least several months? | | https://news.ycombinator.com/newsfaq.html | | "If a story has not had significant attention in the last | year or so, a small number of reposts is ok. Otherwise we | bury reposts as duplicates." | dahart wrote: | I'm confused by your comment and edits. You both asked for | and complained about being given a reason why the article | is a dupe. Which is it? The comment explains why using | those two links. Why has this particular article gotten you | so riled up? | | I come here in part for the moderation, it increases the | quality of discussions. It's not surprising that if you | keep making a lot of noise and trying to generate off-topic | outrage, it might be met with some active moderation, | that's exactly what I would expect, and I'm sorry it's | bothering you so much, but FWIW I don't mind that it's | happening, and it does seem reasonable and in-bounds from | my perspective. BTW to me it seems like you might be making | some incorrect assumptions about what's happening and why; | the news topic does seem to be covered adequately with the | first one or two articles. | counttheforks wrote: | > You both asked for and complained about being given a | reason why the article is a dupe. Which is it? | | Dang posted the links to the previous stories, didn't | mark the post as a dupe or say anything about it being a | dupe. Because it's not. Then two hours later it silently | got deleted. | dahart wrote: | It is marked as dupe, and not deleted, we can still see | it. Why does it matter if there was a time delay between | the two things you saw? What does that have to do with | anything? | croes wrote: | It's not about what the BBC says but what India does to | prevent others seeing what the BBC says. | codetrotter wrote: | Elaborate | putna wrote: | wow, amazing story. | | > it's the old joke of, you get 95% of the functionality with the | first 95% of your budget, and the last 5% on the second 95% of | your budget. | | repeats every time | aidenn0 wrote: | I think that's overly optimistic. It implies you can have a | complete project with less than double your budget. | pflanze wrote: | I thought that the implication was that the budget grew by a | factor of ~20, not 2. As in, the first 95% finished in 95% of | the original budget, the rest finished in 95% of the final | budget, so the first 95% took 5% of the final budget. | | Hmm, maybe I didn't detect the sarcasm in your reply. | breck wrote: | One thing not emphasized enough in this interview: SQLite is | public domain. So was the web, which also won against its better | funded licensed competitors at the time. | | The lesson is clear: if you want to win in the long run, you need | not just great skills and tech, but you have to go public domain. | Or to put it bluntly: #LicensesAreForLosers. | | https://breckyunits.com/how-the-public-domain-can-win.html | simonw wrote: | I think I've heard SQLite creator Richard D Hipp say in the | past that he thinks using the public domain declaration rather | than a regular open source license was a mistake. | | Public domain raises all sorts of challenges for potential | adopters of the software that aren't an issue with a more | deliberately designed license. | | UPDATE: I misremembered this. He does talk about some of the | surprise challenges in this interview, but does not go as far | as saying that he regretted it: | https://changelog.com/podcast/201#transcript-215 | buttspelunker wrote: | Are you talking about CERN's decision to put their web client | and server source code into the public domain in 1993? It seems | like a stretch - to me - to attribute the success of the web to | that decision. | andylynch wrote: | It was probably a big factor. Gopher was a real competitor | initially but the University of Minnesota which owned the IP | started trying to charge license fees in Feb '93- I suspect | CERN's decision was a response to this as well as MOSAIC's | similar efforts around the same time. | breck wrote: | You couldn't predict that CERN's protocol would win, but you | COULD predict that a public domain protocol would win. | | Proof: | | For every pair {protocolX,protocolY} where | functionality(protocolX) = functionality(protocolY) && | isPublicDomain(protocolX) == true && | isPublicDomain(protocolY) == false, then | speedAndUtility(protocolX) >> speedAndUtility(protocolY). | | https://breckyunits.com/how-the-public-domain-can-win.html | revskill wrote: | Interview question: Given a 2GB csv file, write a script to do | calculation with it. | | Me: Put all in SQLite and write a SQL query. | cauthon wrote: | how/where do y'all suggest hosting a shared SQLite db? | simonw wrote: | That's exactly what my https://datasette.io project aims to | solve! | | Recent examples: | | - https://simonwillison.net/2023/Jan/27/exploring-musiccaps/ | | - https://simonwillison.net/2022/Aug/21/scotrail/ | | - https://simonwillison.net/2022/Sep/5/laion-aesthetics- | weekno... | dfinninger wrote: | Yeah, there are actually a number of scripts I have at my | company that use Python's baked in SQLite3 module to do | something complicated very easily. | | There's even a lib for Parquet making analysis on a small | number of problematic files quite easy. | pletnes wrote: | Have a go with duckdb next time - you can query csv files | without loading them first. | simonw wrote: | You can do that with SQLite too: | https://til.simonwillison.net/sqlite/one-line-csv-operations | | (DuckDB is a lot more ergonomic for that kind of thing though | - it's really fantastic tech) | sam_lowry_ wrote: | Unlike sqlite, DuckDB is a very complex and much less | polished. | deaddodo wrote: | The spirit/answer being sought of the question is to not have | to load all of the data in memory first. | papito wrote: | The origins of SQLite are pretty wild. This was one of my | favorite episodes. ___________________________________________________________________ (page generated 2023-01-28 23:00 UTC)