[HN Gopher] The Untold Story of SQLite (2021)
       ___________________________________________________________________
        
       The Untold Story of SQLite (2021)
        
       Author : xrayarx
       Score  : 209 points
       Date   : 2023-01-28 16:45 UTC (6 hours ago)
        
 (HTM) web link (corecursive.com)
 (TXT) w3m dump (corecursive.com)
        
       | throwaway29303 wrote:
       | Shane Harrelson did this for us about 10 years ago. He came up
       | with this huge corpus of SQL statements, and he ran them against
       | every database engine that he could get his hands on. We wanted
       | to make sure everybody got the same answer, and he managed to
       | segfault every single database engine he tried, including SQLite,
       | except for Postgres. Postgres always ran and gave the correct
       | answer. We were never able to find a fault in that. The Postgres
       | people tell me that we just weren't trying hard enough. It is
       | possible to fault Postgres, but we were very impressed.
       | We crashed Oracle, including commercial versions of Oracle. We
       | crashed DB2. Anything we could get our hands on, we tried it and
       | we managed to crash it, but the point was that we wanted to make
       | sure that SQLite got the same answers for all of these queries,
       | or equivalent answers, because a lot of these queries, they're
       | indeterminate and the rows might come out in a different order
       | because you [crosstalk 00:25:10] order by clause, so we wanted to
       | make sure that all the database engines got equivalent answers.
       | Mostly, we wanted to make sure that SQLite was getting the same
       | answers everybody else is.            That's another test suite,
       | and then we have lots of smaller ones, as well. Between them all,
       | it's a lot of testing code, and it takes a long time to run.
       | 
       | I hope some people are/were paying attention to this. ;)
        
         | marcosdumay wrote:
         | You mean fixing their databases? I would expect Postgres dev to
         | act on it, but not much from the other ones.
        
       | elteto wrote:
       | "Shane Harrelson [...] came up with this huge corpus of SQL
       | statements, and he ran them against every database engine that he
       | could get his hands on. We wanted to make sure everybody got the
       | same answer, and he managed to segfault every single database
       | engine he tried, including SQLite, except for Postgres. Postgres
       | always ran and gave the correct answer. We were never able to
       | find a fault in that. The Postgres people tell me that we just
       | weren't trying hard enough. It is possible to fault Postgres, but
       | we were very impressed."
       | 
       | Did not expect to find such a cool anecdote about Postgres here!
        
       | fijiaarone wrote:
       | "Freedom means taking care of yourself."
        
       | posharma wrote:
       | There's an industry wide fad (like ChatGPT) and then there's a
       | hacker news wide fad which is SQLite :-). Not sure how it all
       | started. Last few months (or my be even an year) there has been
       | an unprecedented number of SQLite posts. What gives?
        
         | jonas21 wrote:
         | Last few months? I feel like SQLite has been extremely popular
         | on HN for at least the last 5 years (and the data seems to
         | confirm this [1]). In which case, maybe it's not so much of a
         | fad?
         | 
         | [1] https://hn.algolia.com/?q=sqlite
        
         | ok123456 wrote:
         | Calling SQLite a fad is like calling Linux a fad.
        
           | nindalf wrote:
           | He doesn't mean using it. He means posting about SQLite on
           | HN. The story of SQLite, how it came to be developed, how
           | it's tested, how it really works under the hood, funky ways
           | to use it on the server for storage (fly.io, Cloudflare do
           | this). All of these have become more common in the last year.
        
         | jestarray wrote:
         | how is sqlite a fad? its used pretty much everywhere, is
         | simple, and yet surprisingly scalable to some
        
           | posharma wrote:
           | SQLite is not a fad. It's just that it never gained so much
           | attention here before.
        
             | cdcarter wrote:
             | Google Trends [0] indicates that sqlite is about as popular
             | as usual. There's been some recent growth in the last year,
             | but that's after a bit of a dip. A regression of the time
             | series would be fairly flat.
             | 
             | I suspect popularity here comes in waves through a Katamari
             | Damacy effect. People start reading about a topic, and
             | start posting, thus more people read, research, post,
             | etc... until a saturation point, a cooling off period, and
             | then a rebuild.
             | 
             | [0]: obviously not the same as whats happening on HN, I'd
             | love to see someone pull these numbers from the HN API!
        
               | TheRealPomax wrote:
               | They were talking about it being a fad here on HN, in
               | terms of submitted (and upvoted to FP) posts per
               | week/month/year. It's always been a topic getting the
               | incidental FP attention, but the number of times
               | something relating to SQLite makes it to the front page
               | here seems to have increased considerably in recent
               | months.
        
         | lukevp wrote:
         | I think as single computers get more and more power, and
         | engineers work in these super slow, eventually consistent cloud
         | infrastructures, there's a visceral appeal of an architecture
         | that can run on my laptop and also scale to a reasonable user
         | base. The only limiting factor for most applications at this
         | point that would prevent a single machine from serving them is
         | having HA/DR. So a lot of effort is being made at the
         | replication side of SQLite so it can be part of this new stack.
         | I am seeing it emerge as a reaction to the crazy amount of
         | microservices and distributed computing just to solve small
         | problems that could fit on a machine. And the inability to work
         | locally as well.
        
         | sangnoir wrote:
         | SQLite (like curl and libcurl) have been pervasive and awesome
         | for a long time and you'd always find posts about both on HN
         | since forever, because of their ubiquity.
         | 
         | SQLite has been appearing a lot more often on HN because of a
         | different more recent fad: edge computing.
        
         | jskulski wrote:
         | Haha yeah I feel like SQLite is a very _told_ story on HN.
         | Great story and tech for real, inspirational but also very
         | trendy right now!
        
           | posharma wrote:
           | Yeah, exactly. May be I should've used the word trendy as
           | some ppl take offense to the word fad for SQLite.
        
         | geysersam wrote:
         | It might be more than a year actually. I'm pretty sure it was
         | trendy 1.5 years ago. Remember I read a really cool blog post
         | around then where someone used range requests to query SQLite
         | statically. It used WASM + SQLite and hyped hard here.
        
         | Seattle3503 wrote:
         | For me, it is always the best place to start when I need a DB
         | for personal projects. The serverless nature makes it very easy
         | to use. It is easy to fall in love with the convience it
         | provides, because lets you focus on the fun parts of your
         | project.
        
         | lliamander wrote:
         | I think most programmers would like to be remembered for having
         | written some really cool, useful piece of software - ideally by
         | themselves or with a small team of people, where they have
         | creative control over the requirements and design.
         | 
         | SQLite (and it's founder, Richard Hipp) are an inspirational
         | example of such success.
        
         | ravi-delia wrote:
         | I feel like it's just that it's one of the coolest pieces of
         | software that every developer- web, backend, systems, or
         | otherwise, has at least seen before. It runs in the browser, it
         | runs on your phone, and unlike other databases (all of which
         | are also very cool) it is completely within the domain of even
         | a single inexperienced developer. It's just a little file on
         | disk, just a library for accessing it, there's no magic!
         | Especially if you're used to taking a database for granted it's
         | just an extremely cool thing to think about.
        
           | ethbr0 wrote:
           | The brilliance of SQLite is the brilliance of interfaces.
           | 
           | Value (in the abstract, not just $ sense) accrued around SQL.
           | 
           | At some point, so much value accrued that people were using
           | it for things it wasn't designed to do.
           | 
           | SQLite provided a solution for "people who want to use a
           | database, but don't look like traditional database
           | operators." Turns out there's a lot of those.
           | 
           | That this large userbase existed was a brilliant observation,
           | combined with brilliant execution in shepherding and evolving
           | SQLite since.
           | 
           | And none of the above would've been possible if the SQL
           | interface hadn't been standardized and adopted over the last
           | few decades*.
           | 
           | * Turns out, SQL's 50th anniversary will be 2024
        
         | therealdrag0 wrote:
         | SQLite has excited HN for years
        
         | vagrantJin wrote:
         | I doubt ChatGPT is a fad. Feels like the second coming of the
         | gun and it will have an outsized impact.
        
           | jacob019 wrote:
           | The underlying tech anyway.
        
       | [deleted]
        
       | jacob019 wrote:
       | Should I be using file extension .db or .sqlite3 ?
        
         | simonw wrote:
         | My experience is that it doesn't matter at all, but .db appears
         | to me a lot more common than .sqlite3
        
       | panda_fish wrote:
       | Thanks for that amazing trip down memory lane...
       | 
       | > Wow, I've got an SQL database running on my Palm Pilot.
       | 
       | Weirdly, my thesis in 2000 was to write an SQL parser for Palm
       | (Palm V - still have it). I remember there being such a massive
       | gap in the market for a pervasive standards-compliant data
       | storage solution. I used javacc, which is still around I think -
       | I can't imagine it covered 1% of the features of SQLite though.
       | Bravo!
        
       | rompic wrote:
       | My favourite random sqlite story: The company I once worked for
       | used an outdated version of sqlite (3.8.6) in one of their
       | products. The databases used got bigger and bigger and in a very
       | big project one of the "already known to be slow"-queries took
       | more than an hour on my laptop making the tool unusable.
       | 
       | On a quiet day, I was able to save the temporary table used as
       | part of the process and run the problematic query against it in
       | an isolated fashion.
       | 
       | The query returned an extremely high number of results and when I
       | discovered this I questioned my SQL-fu, my sanity and my trust in
       | computers.
       | 
       | I found that we were hit by a bug that was fixed 6 years before I
       | discovered it (https://sqlite.org/src/info/6f2222d550f5b0ee7ed).
       | Sqlite's query planner assumed that a field with a not null
       | constraint can never be null, which isn't the case for the right
       | hand table in a left join.
       | 
       | I fixed it by adding a not null check in the query and then later
       | by updating the library. After that the 1 hour query ran in ~700
       | ms.
       | 
       | This faster run time also helped with smaller projects and in the
       | end allowed extending our test suite considerably.
       | 
       | Tldr: Keep your dependencies up to date.
        
       | EGreg wrote:
       | Now what about the untold _true_ story of SQLite?
        
         | dmitriid wrote:
         | A Netflix documentary series
        
       | [deleted]
        
       | fijiaarone wrote:
       | I'm interested in looking at fossil SCM - not the built in web
       | server, wiki, and issue tracking, but the auto sync and
       | simplified interface appeal for personal projects, and maybe an
       | embedded versioned file based CMS.
        
         | maxk42 wrote:
         | I've used git since it was first released to the public and
         | recently I feel like I know less about it than ever before. I
         | switched to fossil for personal projects around a year ago and
         | haven't regretted it one bit!
        
       | thunderbong wrote:
       | Previously -
       | 
       | https://news.ycombinator.com/item?id=27718701 (2 years ago, 95
       | comments)
        
       | counttheforks wrote:
       | [flagged]
        
         | bobobob420 wrote:
         | marked as duplicate
         | https://news.ycombinator.com/item?id=34504085
         | https://news.ycombinator.com/item?id=34511831
         | 
         | These were not taken down. Anyways who cares what the BBC has
         | to say about India. They are not exactly a known unbiased
         | source when it comes to coverage of India and never have been.
         | 
         | I slightly agree with some of your thoughts on speaking your
         | mind freely on this board however its probably for the best.
         | Dang knows this place turning into a cesspool is not useful. I
         | will say to other hacker news commentators please do not
         | downvote if you disagree with a comment. Discourse is important
         | but I do agree anonymous discourse is much more nuanced
         | 
         | I do not understand how this post is number one. Are you saying
         | Hacker News can promote certain posts to the top and it is not
         | organically curated?
        
           | counttheforks wrote:
           | A dupe of 3 and 4 day old posts, which got nearly 200
           | comments? No. It is not reasonable to just remove that. This
           | is censorship. Especially when you consider that dang tends
           | to instead just post a comment saying: "Previously discussed
           | at ..."
           | 
           | > I do not understand how this post is number one. Are you
           | saying Hacker News can promote certain posts to the top and
           | it is not organically curated?
           | 
           | Absolutely, yes. The mods can influence the ranking directly
           | and they have openly talked about this. If you search
           | https://hn.algolia.com/ for "penalized", "demoted" or
           | "promoted" you will find discussions regarding this.
           | 
           | ---
           | 
           | Since dang banned me from posting new replies:
           | 
           | > Maybe you haven't been here long enough to have read the
           | FAQ or know that generally all dupes from 3-4 days ago are
           | marked, and only allowed after at least several months?
           | 
           | I've been here for many years. I just make a new account
           | periodically. Maybe you haven't been paying enough attention
           | to see the difference in behavior that dang exhibits when he
           | wants to censor something? Notice how he's usually all over
           | the comments explaining why something was removed as dupe,
           | but now there is nothing except him linking to the older
           | posts (https://news.ycombinator.com/item?id=34559614),
           | waiting for an hour or two, and then silently deleting the
           | new one.
           | 
           | Not to mention that the subject of the post is completely
           | different. The latest post is about the documentary being
           | removed from archive.org at the request of the BBC. None of
           | the earlier posts cover this.
        
             | dahart wrote:
             | Maybe you haven't been here long enough to have read the
             | FAQ or know that generally all dupes from 3-4 days ago are
             | marked, and only allowed after at least several months?
             | 
             | https://news.ycombinator.com/newsfaq.html
             | 
             | "If a story has not had significant attention in the last
             | year or so, a small number of reposts is ok. Otherwise we
             | bury reposts as duplicates."
        
             | dahart wrote:
             | I'm confused by your comment and edits. You both asked for
             | and complained about being given a reason why the article
             | is a dupe. Which is it? The comment explains why using
             | those two links. Why has this particular article gotten you
             | so riled up?
             | 
             | I come here in part for the moderation, it increases the
             | quality of discussions. It's not surprising that if you
             | keep making a lot of noise and trying to generate off-topic
             | outrage, it might be met with some active moderation,
             | that's exactly what I would expect, and I'm sorry it's
             | bothering you so much, but FWIW I don't mind that it's
             | happening, and it does seem reasonable and in-bounds from
             | my perspective. BTW to me it seems like you might be making
             | some incorrect assumptions about what's happening and why;
             | the news topic does seem to be covered adequately with the
             | first one or two articles.
        
               | counttheforks wrote:
               | > You both asked for and complained about being given a
               | reason why the article is a dupe. Which is it?
               | 
               | Dang posted the links to the previous stories, didn't
               | mark the post as a dupe or say anything about it being a
               | dupe. Because it's not. Then two hours later it silently
               | got deleted.
        
               | dahart wrote:
               | It is marked as dupe, and not deleted, we can still see
               | it. Why does it matter if there was a time delay between
               | the two things you saw? What does that have to do with
               | anything?
        
           | croes wrote:
           | It's not about what the BBC says but what India does to
           | prevent others seeing what the BBC says.
        
             | codetrotter wrote:
             | Elaborate
        
       | putna wrote:
       | wow, amazing story.
       | 
       | > it's the old joke of, you get 95% of the functionality with the
       | first 95% of your budget, and the last 5% on the second 95% of
       | your budget.
       | 
       | repeats every time
        
         | aidenn0 wrote:
         | I think that's overly optimistic. It implies you can have a
         | complete project with less than double your budget.
        
           | pflanze wrote:
           | I thought that the implication was that the budget grew by a
           | factor of ~20, not 2. As in, the first 95% finished in 95% of
           | the original budget, the rest finished in 95% of the final
           | budget, so the first 95% took 5% of the final budget.
           | 
           | Hmm, maybe I didn't detect the sarcasm in your reply.
        
       | breck wrote:
       | One thing not emphasized enough in this interview: SQLite is
       | public domain. So was the web, which also won against its better
       | funded licensed competitors at the time.
       | 
       | The lesson is clear: if you want to win in the long run, you need
       | not just great skills and tech, but you have to go public domain.
       | Or to put it bluntly: #LicensesAreForLosers.
       | 
       | https://breckyunits.com/how-the-public-domain-can-win.html
        
         | simonw wrote:
         | I think I've heard SQLite creator Richard D Hipp say in the
         | past that he thinks using the public domain declaration rather
         | than a regular open source license was a mistake.
         | 
         | Public domain raises all sorts of challenges for potential
         | adopters of the software that aren't an issue with a more
         | deliberately designed license.
         | 
         | UPDATE: I misremembered this. He does talk about some of the
         | surprise challenges in this interview, but does not go as far
         | as saying that he regretted it:
         | https://changelog.com/podcast/201#transcript-215
        
         | buttspelunker wrote:
         | Are you talking about CERN's decision to put their web client
         | and server source code into the public domain in 1993? It seems
         | like a stretch - to me - to attribute the success of the web to
         | that decision.
        
           | andylynch wrote:
           | It was probably a big factor. Gopher was a real competitor
           | initially but the University of Minnesota which owned the IP
           | started trying to charge license fees in Feb '93- I suspect
           | CERN's decision was a response to this as well as MOSAIC's
           | similar efforts around the same time.
        
           | breck wrote:
           | You couldn't predict that CERN's protocol would win, but you
           | COULD predict that a public domain protocol would win.
           | 
           | Proof:
           | 
           | For every pair {protocolX,protocolY} where
           | functionality(protocolX) = functionality(protocolY) &&
           | isPublicDomain(protocolX) == true &&
           | isPublicDomain(protocolY) == false, then
           | speedAndUtility(protocolX) >> speedAndUtility(protocolY).
           | 
           | https://breckyunits.com/how-the-public-domain-can-win.html
        
       | revskill wrote:
       | Interview question: Given a 2GB csv file, write a script to do
       | calculation with it.
       | 
       | Me: Put all in SQLite and write a SQL query.
        
         | cauthon wrote:
         | how/where do y'all suggest hosting a shared SQLite db?
        
           | simonw wrote:
           | That's exactly what my https://datasette.io project aims to
           | solve!
           | 
           | Recent examples:
           | 
           | - https://simonwillison.net/2023/Jan/27/exploring-musiccaps/
           | 
           | - https://simonwillison.net/2022/Aug/21/scotrail/
           | 
           | - https://simonwillison.net/2022/Sep/5/laion-aesthetics-
           | weekno...
        
         | dfinninger wrote:
         | Yeah, there are actually a number of scripts I have at my
         | company that use Python's baked in SQLite3 module to do
         | something complicated very easily.
         | 
         | There's even a lib for Parquet making analysis on a small
         | number of problematic files quite easy.
        
         | pletnes wrote:
         | Have a go with duckdb next time - you can query csv files
         | without loading them first.
        
           | simonw wrote:
           | You can do that with SQLite too:
           | https://til.simonwillison.net/sqlite/one-line-csv-operations
           | 
           | (DuckDB is a lot more ergonomic for that kind of thing though
           | - it's really fantastic tech)
        
             | sam_lowry_ wrote:
             | Unlike sqlite, DuckDB is a very complex and much less
             | polished.
        
           | deaddodo wrote:
           | The spirit/answer being sought of the question is to not have
           | to load all of the data in memory first.
        
       | papito wrote:
       | The origins of SQLite are pretty wild. This was one of my
       | favorite episodes.
        
       ___________________________________________________________________
       (page generated 2023-01-28 23:00 UTC)