[HN Gopher] Fly.io buys Litestream
       ___________________________________________________________________
        
       Fly.io buys Litestream
        
       Author : dpeck
       Score  : 383 points
       Date   : 2022-05-09 19:35 UTC (3 hours ago)
        
 (HTM) web link (fly.io)
 (TXT) w3m dump (fly.io)
        
       | swaraj wrote:
       | Looks v cool, but I feel like I'm missing a big part of the
       | story, how do 2 app 'servers/process' connect to same
       | sqlite/litestream db?
       | 
       | Do you 'init' (restore) the db from each app process? When one
       | app makes a write, is it instantly reflected on the other app's
       | local sqlite?
        
         | judofyr wrote:
         | Each server would have one copy of the SQLite database. Only
         | one of the servers would support writes -- and those writes will
         | be replicated to the other server. Reads in the other server
         | will be transactionally safe, but might be slightly out of
         | date.
        
           | swaraj wrote:
           | This is my main q: are the writes replicated in real-time? Do
           | the apps that just need read access have to repeatedly call
           | 'restore'?
        
             | tptacek wrote:
             | https://litestream.io/getting-started/#continuous-
             | replicatio...
        
         | thruflo wrote:
         | Also how does the WAL page based replication maintain
         | consistency / handle concurrent updates?
        
           | infogulch wrote:
           | It doesn't, this gives you a read-only replica only.
        
         | johnrrk wrote:
         | I also investigated SQLite and it's not clear how we can use it
         | with multiple servers.
         | 
         | The WAL documentation [1] says "The wal-index greatly improves
         | the performance of readers, but the use of shared memory means
         | that all readers must exist on the same machine. This is why
         | the write-ahead log implementation will not work on a network
         | filesystem."
         | 
         | So it seems that we can't have 2 Node.js servers accessing the
         | same SQLite file on a shared volume.
         | 
         | I'm not sure how to do zero downtime deployment (like starting
         | server 2, checking it works, and shutting down server 1, seems
         | risky since we'll have 2 servers accessing the same SQLite file
         | temporarily)
         | 
         | [1] https://sqlite.org/wal.html
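
A minimal sketch of the setup the replies describe, where each server keeps its own local SQLite file in WAL mode instead of sharing one file over a network volume. The file name `app.db` and the helper `open_wal_database` are hypothetical, and this only shows the local side; it does not run Litestream.

```python
import os
import sqlite3
import tempfile


def open_wal_database(path):
    """Open a local SQLite file and switch it to write-ahead logging."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode that is actually in effect,
    # so the caller can check that the switch succeeded.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode


# Hypothetical per-server database file on a local (non-network) disk.
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "app.db")

conn, mode = open_wal_database(db_path)
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES ('hello')")
conn.commit()

print("journal mode:", mode)
```

If the reported mode is not `wal`, the file is probably on a filesystem that cannot support the shared-memory wal-index quoted above.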
        
           | tptacek wrote:
           | The point of Litestream is that you don't have multiple
           | servers accessing the same SQLite file. They all have their
           | own SQLite databases. Of course, you only write to one of
           | them, but that's a common constraint for database clusters.
        
       | mwcampbell wrote:
       | Congratulations to Ben on getting a well-funded player like Fly
       | to buy into this vision. I'm looking forward to seeing a
       | complete, ready-to-deploy sample app, when the upcoming
       | Litestream enhancements are ready.
       | 
       | I know that Fly also likes Elixir and Phoenix; they hired Chris
       | McCord, after all. So would it make sense for Phoenix
       | applications deployed in production on Fly to use SQLite and
       | Litestream? Is support for SQLite in the Elixir ecosystem,
       | particularly Ecto, good enough for this?
        
         | warmwaffles wrote:
         | > Is support for SQLite in the Elixir ecosystem, particularly
         | Ecto, good enough for this?
         | 
         | Why yes it is. I maintain the `exqlite` and `ecto_sqlite3`
         | libraries and it was just integrated in with `kino_db` which is
         | used by `livebook`.
         | 
         | https://github.com/elixir-sqlite/exqlite
        
       | swlkr wrote:
       | The reduction in complexity from using sqlite + litestream as a
       | server side database is great to see!
        
       | netcraft wrote:
       | This is similar to what I hoped websql had eventually grown into.
       | sqlite in the browser, but let me sync it up and down with a
       | server. Every user gets their own database, the first time to the
       | app they "install" the control and system data, then their data,
       | then writes are synced to the server. If it became standard, it
       | could be super easy - conflict resolution notwithstanding.
        
         | bambax wrote:
         | You can make webapps using exactly this approach, with json in
         | localstorage as the client db, and occasional, asynchronous,
         | writes to the server. I'm now building a simple webapp exactly
         | like this, and the server db is sqlite. So far it works
         | perfectly fine.
        
       | tyingq wrote:
       | Dqlite is also interesting, and in a similar space. It seems to
       | have evolved from the LXC/LXD team wanting a replacement for
       | Etcd. It's Sqlite with raft replication and also a networked
       | client protocol.
       | 
       | https://dqlite.io/docs/architecture
        
         | tptacek wrote:
         | There's also rqlite. There's definitely a place for this kind
         | of stuff. But we already use a bunch of stuff that does
         | distributed consensus in our stack, and the experience has left
         | us wary of it, especially for global distribution. We almost
         | used rqlite for a statekeeping feature internally, but today
         | we'd certainly just use sqlite+litestream for the same kinds of
         | features, just because it's easier to reason about and to deal
         | with operationally when there's problems.
         | 
         | https://fly.io/blog/a-foolish-consistency/
        
           | otoolep wrote:
           | rqlite author here. Anything else you can tell me about why
           | you decided against it? Just simpler, as you say, to avoid a
           | distributed system when you can (something I understand).
        
             | tptacek wrote:
             | We like rqlite a lot. There's some comments in your issue
             | tracker from Jerome about it at the time. The decision
             | wasn't against rqlite as a piece of software so much as it
             | was us deliberately deciding not to introduce more Raft
             | into our architecture; any place there is Raft, we're
             | concerned we'll essentially need to train our whole on-call
             | rotation on how to handle issues.
             | 
             | The annoying thing about global consensus is that the
             | operational problems tend to be global as well; we had an
             | outage last night (correlated disk failure on 3 different
             | machines!) in Chicago, and it slowed down deploys all the
             | way to Sydney, essentially because of invariants maintained
             | by a global Raft consensus and fed in part from
             | malfunctioning machines.
             | 
             | I think rqlite would make a lot of sense for us for
             | applications where we run multiple regional clusters; it's
             | just that our problems today tend to be global. We're not
             | just looking for opportunities to rip Raft out of our
             | stack; we're also trying to build APIs that regionalize
             | nicely. In nicely-regionalized, contained settings, rqlite
             | might work a treat for us.
        
       | RcouF1uZ4gsC wrote:
       | I love Litestream! It is so simple and it just works!
       | 
       | Congratulations, Ben, on making a great product and on the sale!
       | 
       | One thing I have had in the back of my mind, but have not had the
       | time to pursue is using SQLite replication to make something
       | similar to CloudFlare's durable objects but more open.
       | 
       | A "durable object" would be an SQLite database and some program
       | that processes requests and accesses the SQLite database. There
       | would be a runtime that transparently replicates the (database,
       | program) pair where they are needed and routes to them.
       | 
       | That way, I can just start out locally developing my program with
       | an SQLite database, and then run a command and have it available
       | globally. At the same time, since it is just accessing an SQLite
       | database, there would be much less risk of lockin.
        
       | krts- wrote:
       | A great project with awesome implications. Well deserved, and the
       | fly.io team are very pragmatic.
       | 
       | This will be even more _brilliant_ than it already is when fly.io
       | can get some slick sidecar/multi-process stuff.
       | 
       | I ended up back with Postgres after my misconfigs left me a bit
       | burned with S3 costs and data stuff. But I think a master VM
       | backed by persistent storage on fly with read replicas as
       | required is maybe the next step: I love the simplicity of SQLite.
        
       | foodstances wrote:
       | Just curious, is there any financial compensation/support going
       | to Richard Hipp with all of this money changing hands?
       | 
       | When I see these startups making a business that is so heavily
       | based on open-source software (like Tailscale on top of
       | Wireguard), I have to wonder what these companies do to actually
       | support the author(s) of the software that so much of their
       | company is based on.
        
         | mrkurt wrote:
         | Yes. We (Fly.io) are buying a sqlite support agreement. We also
         | send money WireGuard's way. I'm pretty sure Tailscale does too.
         | 
         | We have also given OSS authors advisor equity. A couple of
         | folks wrote libraries that were important to keeping us going,
         | and we've granted them shares the same way some startups would
         | to MBA advisors.
        
           | foodstances wrote:
           | That's great to hear, thank you!
        
         | qbasic_forever wrote:
         | I agree Richard Hipp should be compensated but he explicitly
         | licenses and releases SQLite under a public domain license:
         | https://www.sqlite.org/copyright.html Not Apache, not MIT, not
         | GPL... public domain. You can do almost anything with it and
         | not be beholden to any demands. You can tell people you built
         | your business on SQLite... or not. It's public domain.
         | 
         | That said SQLite has a business model of selling support and
         | premium features like encryption:
         | https://www.sqlite.org/prosupport.html
        
           | foodstances wrote:
           | Sure, but Apache, MIT, and GPL licenses don't require payment
           | to the author either. That's why it's up to the company to
           | decide to offer compensation without being required to, and
           | why I'm curious which companies actually do it.
           | 
           | It's like when RedHat went public and offered pre-IPO stock
           | to open source developers.
        
       | otoolep wrote:
       | Congratulations to Ben! This project has been like a rocket ship.
        
         | benbjohnson wrote:
         | Thanks, Philip!
        
       | no_wizard wrote:
       | This is a great and interesting offering! I think this fits well
       | with fly.io and their model of computing.
       | 
       | I now wish that I had engaged with this idea that was very
       | similar to litestream that I had about a year and a half ago. I
       | always thought SQLite just needed a distribution layer to be
       | extremely effective as a distributed database of sorts. Its flat
       | file architecture means it's easy to provision, restore and
       | back up. SQLite also has incremental snapshotting and
       | reproducible WAL logs that can be used to do incremental backups,
       | restores, writes etc. It just needs a "frontend" to handle those
       | bits. Latency has gotten to the point where you can replicate a
       | database by its continued snapshots (which is, on a high level,
       | what litestream appears to be doing) being propagated out to
       | object / blob storage. You could even achieve brute force
       | consensus with this approach if you ran it in a truly distributed
       | way (though RAFT is probably more efficient).
       | 
       | Reason I didn't do this? I thought to myself - why in the world
       | in 2020 would someone choose to use SQLite at scale instead of
       | something like Firebase, Spanner, Fauna, or even Postgres? So
       | after I did an initial prototype (long gone, never pushed it to
       | GitHub) I just felt like...there was no appetite for it.
       | 
       | Now I regret it!
       | 
       | Just a long winded way of saying, congrats! This is awesome!
       | Thanks for doing exactly what I wanted to do but didn't have the
       | guts to follow through with.
        
         | epilys wrote:
         | I implemented exactly this setup, in Rust, last year for a
         | client. Distributed WAL with write locks on a RAFT scheme.
         | Custom VFS in Rust for sqlite3 to handle the IO. I asked the
         | client to opensource it but it's probably not gonna happen...
         | It's definitely doable though.
        
           | ComputerGuru wrote:
           | Did you write your own rust raft implementation or reuse
           | something already available?
        
             | epilys wrote:
             | Reused a well known library that uses raft. I don't know if
             | I should mention any more details since it was a private
             | project.
        
         | Serow225 wrote:
         | there's some stuff out there:
         | 
         | - https://github.com/rqlite/rqlite
         | - https://github.com/chiselstrike/chiselstore
         | - https://dqlite.io/
         | 
         | I'm sure there's more, those are just the ones I remember.
        
       | mrcwinn wrote:
       | I have really enjoyed using Fly. Great service and support.
        
       | scwoodal wrote:
       | > According to the conventional wisdom, SQLite has a place in
       | this architecture: as a place to run unit tests.
       | 
       | Be careful with this approach. Frameworks like Django have DB
       | engine specific features[1]. When you start using them in your
       | application you can no longer use a different DB (SQLite) to run
       | your unit tests.
       | 
       | [1]
       | https://docs.djangoproject.com/en/4.0/ref/contrib/postgres/f...
        
       | seanwilson wrote:
       | SQLite uses dynamic types? Is this an issue in practice,
       | especially for large apps? Don't you lose guarantees about your
       | data which makes it messy to handle on the backend?
       | 
       | Context from https://www.sqlite.org/datatype3.html: "SQLite uses
       | a more general dynamic type system. In SQLite, the datatype of a
       | value is associated with the value itself, not with its
       | container. The dynamic type system of SQLite is backwards
       | compatible with the more common static type systems of other
       | database engines in the sense that SQL statements that work on
       | statically typed databases work the same way in SQLite. However,
       | the dynamic typing in SQLite allows it to do things which are not
       | possible in traditional rigidly typed databases. Flexible typing
       | is a feature of SQLite, not a bug."
        
         | aliswe wrote:
         | This sounds like schemalessness to me? serious "question".
        
           | jamie_ca wrote:
           | Not schemaless, but typeless. SQLite will let you declare a
           | column to be an integer and then dump a string into it, but
           | you're still defining a table with specific columns.
           | 
           | It's like the opposite problem Mysql has when you try to
           | write data larger than the field definition - Mysql will
           | truncate, Sqlite will store the data you gave it.
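
The behaviour described above can be checked with Python's built-in `sqlite3` module. This is a small sketch with a hypothetical `items` table: the column is declared `INTEGER`, yet a string that does not look like a number is stored as text, while a numeric-looking string is converted by the column's type affinity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: qty is declared INTEGER, but that is only an affinity.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, qty INTEGER)")

conn.execute("INSERT INTO items (qty) VALUES (?)", (42,))         # a real integer
conn.execute("INSERT INTO items (qty) VALUES (?)", ("123",))      # numeric-looking text
conn.execute("INSERT INTO items (qty) VALUES (?)", ("a dozen",))  # plain text

# typeof() reports the storage class of each stored value.
stored_types = [
    row[0] for row in conn.execute("SELECT typeof(qty) FROM items ORDER BY id")
]
print(stored_types)
```

The table still has a fixed set of columns, which is the "not schemaless, but typeless" distinction made above.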
        
             | seanwilson wrote:
             | Typeless is the default though? Why wouldn't you want the
             | types to be reliable when you're reading/writing from the
             | backend in the general case?
        
         | ripley12 wrote:
         | You can use SQLite in strict mode if you prefer.
         | https://www.sqlite.org/stricttables.html
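
A short sketch of what strict mode changes, again using Python's `sqlite3` module and a hypothetical `items` table. STRICT tables need SQLite 3.37.0 or newer, so the helper (a made-up name, `strict_rejects_text`) returns `None` when the linked SQLite library is older than that.

```python
import sqlite3


def strict_rejects_text():
    """Return True if a STRICT table refuses non-numeric text in an INTEGER
    column, False if it accepts it, or None if this SQLite build is too old
    to support STRICT tables."""
    if sqlite3.sqlite_version_info < (3, 37, 0):
        return None
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, qty INTEGER) STRICT"
    )
    try:
        conn.execute("INSERT INTO items (qty) VALUES (?)", ("a dozen",))
    except sqlite3.Error:
        # The insert fails with a datatype constraint error.
        return True
    return False


print(strict_rejects_text())
```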
        
       | rco8786 wrote:
       | All of the action around SQLite recently is very exciting!
        
       | bob1029 wrote:
       | > SQLite isn't just on the same machine as your application, but
       | actually built into your application process. When you put your
       | data right next to your application, you can see per-query
       | latency drop to 10-20 microseconds. That's micro, with a m. A
       | 50-100x improvement over an intra-region Postgres query.
       | 
       | This is the #1 reason my exuberant technical mind likes that we
       | use SQLite for all the things. Latency is the exact reason you
       | would have a problem scaling any large system in the first place.
       | Forcing it all into one cache-coherent domain is a really good
       | way to begin eliminating entire universes of bugs.
       | 
       | Do we all appreciate just how much more throughput you can get in
       | the case described above? A 100x latency improvement doesn't
       | translate _directly_ into the same # of transactions per second,
       | but it's pretty damn close if your I/O subsystem is up to the
       | task.
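
One rough way to see in-process query latency for yourself is to time point lookups with Python's `sqlite3` module. This is only a sketch: it uses an in-memory database and a hypothetical `users` table, it includes Python interpreter overhead, and the numbers it prints will vary by machine. It is not a reproduction of the figures quoted from the article.

```python
import sqlite3
import time

ROW_COUNT = 10_000

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [(f"user{i}",) for i in range(ROW_COUNT)],
)
conn.commit()


def mean_lookup_latency_us(conn, n_queries=5_000):
    """Average wall-clock time of a primary-key lookup, in microseconds."""
    start = time.perf_counter()
    for i in range(n_queries):
        conn.execute(
            "SELECT name FROM users WHERE id = ?", (i % ROW_COUNT + 1,)
        ).fetchone()
    elapsed = time.perf_counter() - start
    return elapsed / n_queries * 1e6


latency_us = mean_lookup_latency_us(conn)
print(f"mean point-lookup latency: {latency_us:.1f} microseconds")
```

Comparing this against the same loop run over a network connection to a database server is what gives the ratio discussed above.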
        
         | throwaway894345 wrote:
         | If you're pushing the database up into the application layer,
         | do you have to route all write operations through a single
         | "master" application instance? If not, is there some multi-
         | master scheme, and if so, is it cheaper to propagate state all
         | the time than it is to have the application write to a master
         | database instance over a network? Moreover, how does it affect
         | the operations of your application? Are you still as
         | comfortable bouncing an application instance as you would
         | otherwise be?
        
         | closeparen wrote:
         | This is a large part of what Rich Hickey emphasizes about
         | Datomic, too. We're so used to the database being "over there"
         | but it's actually very nice to have it locally. Datomic solves
         | this in the context of a distributed database by having the
         | read-only replicas local to client applications while the
         | transaction-running parts are remote.
        
           | abraxas wrote:
           | Only trouble with that particular implementation is that the
           | Datomic Transactor is a single threaded single process that
           | serializes every transaction going through it. As long as you
           | don't need to scale writes it works like a charm. However,
           | the workloads I somehow always end up working with are write
           | heavy or at best 50/50 between read and write.
        
         | vmception wrote:
         | > SQLite isn't just on the same machine as your application,
         | but actually built into your application process.
         | 
         | How is that different than what's commonly happening? Android
         | and iOS do this... right? ... but it's still accessing the
         | filesystem to use it.
         | 
         | Am I missing something or is what they are describing just
         | completely commonplace that is only interesting to people that
         | use microservices and never knew what was normal?
        
           | tlb wrote:
           | It's normal (and HN does something similar, working from in-
           | process data) for systems that don't have to scale beyond one
           | server. If you need multiple servers you have to do
           | something, such as Litestream.
        
           | mrkurt wrote:
           | This is how client apps use sqlite, yes. Single instance
           | client apps. Litestream is one method of making sqlite work
           | for server side apps. The hard part on the server is solving
           | for multiple processes/vms/containers writing to one sqlite
           | db.
        
             | nicoburns wrote:
             | > the hard part on the server is solving for multiple
             | processes/vms/containers writing to one sqlite db.
             | 
             | I feel like if you have multiple apps writing to the
             | database then you shouldn't be using SQLite. That's where
             | Postgres etc completely earn their place in the stack.
             | Where litestream is really valuable is when you have a
             | single writer, but you want point-in-time backups like you
             | can get with postgres.
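
For comparison, SQLite itself ships an online backup API, which Python exposes as `Connection.backup`. The sketch below takes a single consistent snapshot of a hypothetical `events` table. It is not how Litestream works and it does not give continuous point-in-time recovery; it only shows the one-shot snapshot that a single-writer app can take without extra tooling.

```python
import sqlite3

# The single writer's database (in memory here to keep the sketch small).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")
source.execute("INSERT INTO events (name) VALUES ('signup')")
source.commit()

# Copy the current state of the database into a second connection.
snapshot = sqlite3.connect(":memory:")
source.backup(snapshot)

# Writes made after the backup do not appear in the snapshot.
source.execute("INSERT INTO events (name) VALUES ('login')")
source.commit()

source_count = source.execute("SELECT COUNT(*) FROM events").fetchone()[0]
snapshot_count = snapshot.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(source_count, snapshot_count)
```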
        
             | vmception wrote:
             | interesting, such a weird way to describe it then. but I
             | guess some people are more familiar with that problem.
        
         | funstuff007 wrote:
         | This is exactly the reason I am so skeptical of the cloud. I
         | don't care how easy it is to stand up VMs, containers, k8s,
         | etc. What I need to know is how hard is it to lug my data to my
         | application and vice versa. My feelings on this are so strong
         | as I work mostly on database read-heavy applications.
        
         | WJW wrote:
         | How do any writes end up on other horizontally scaled machines
         | though? To me the whole point of a database on another machine
         | is that it is the single point of truth that many horizontally
         | scaled servers can write to and read each others' updates from.
         | If you don't need that, you might as well read the entire
         | dataset into memory and be done with it.
         | 
         | I know TFA says that you can "soon" automagically replicate
         | your sqlite db to another server, but it only allows writes on
           | a single server and all others will be readers. Now you need to
         | think about how to move all write traffic to a single app
         | server. All writes to that server will still take several
         | milliseconds (possibly more, since S3 is eventually consistent)
         | to propagate around all replicas.
         | 
         | In short, 100x latency improvement for reads is great but a bit
         | of a red herring since if you have read-only traffic you don't
         | need sqlite replication. If you do have write traffic, then
         | routing it through S3 will definitely not give you a 100x
         | latency improvement over Postgres or MySQL anymore. Litestream
         | is definitely on my radar, but as a continuous backup system
         | for small apps ("small" meaning it runs and will always run on
         | a single box) rather than a wholesale replacement of
         | traditional client-server databases.
         | 
         | PS: Congrats Ben!
        
           | jolux wrote:
           | S3 is strongly consistent now:
           | https://aws.amazon.com/s3/consistency/
        
           | bob1029 wrote:
           | What if, due to ridiculous latency reductions, your business
           | no longer requires more than 1 machine to function at scale?
           | 
           | I'm talking more about sqlite itself than any given product
           | around it at this point, but I still think it's an
           | interesting thought experiment in this context.
        
             | WJW wrote:
             | I'll point out that the ridiculous latency reductions don't
             | apply to replicating the writes to S3 and/or any replica
             | servers, that still takes as long as it would to any other
             | server across a network. The latency reductions are _only_
             | for pure read traffic. Also, every company I ever worked at
             | had a policy to run at least two instances of a service in
             | case of hardware failure. (Is it reasonable to
             | extrapolate this policy to a company which might want to
             | run on a single sqlite instance? I don't know, but just as
             | a datapoint I don't think any business should strive to run
             | on a single instance.)
             | 
             | This write latency _might_ be fine, although more than one
             | backend app I know renewed the expiry time of a user
             | session on every hit and would thus do at least one DB
             | write per HTTP call. I don't think this is optimal, but it
             | does happen and simply going "well don't do write traffic
             | then" does not always line up with how apps are actually
             | built. Replicated sqlite over litestream is very cool, but
             | it is definitely something you need to build your app
             | around, and also definitely something that costs you one
             | of your innovation tokens.
        
               | tptacek wrote:
               | There's no magic here (that there is no magic is part of
               | the point). You have the same phenomenon in n-tier
               | Postgres deployments: to be highly available, you need
               | multiple instances; you're going to have a write leader,
               | because you're not realistically going to want to run a Raft
               | consensus for every write; etc.
               | 
               | The point of the post is just that if you can get rid of
               | most of the big operational problems with using server-
               | side SQLite in a distributed application --- most
               | notably, failing over and snapshotting --- then SQLite
               | can occupy a much more interesting role in your stack
               | than it's conventionally been assigned. SQLite has some
               | very attractive properties that have been largely ignored
               | because people assume they won't be able to scale it out
               | and manage it. Well, you can scale it out and manage it.
               | Now you've got an extremely simple database layer that's
               | easy to reason about, doesn't require you to run a
               | database server (or even a cache server) next to all your
               | app instances, and happens to be extraordinarily fast.
               | 
               | Maybe it doesn't make sense for your app? There are
               | probably lots of apps that really want Postgres and not
               | SQLite. But the architecture we're proposing is one
               | people historically haven't even considered. Now, they
               | should.
               | 
               | I'm not sure "litestream replicate <file>" really costs a
               | whole innovation token. It's just SQLite. You should get
               | an innovation rebate for using it. :)
        
             | toolz wrote:
             | I have to imagine having your service highly available
             | (i.e. you need a failover machine) is far more likely to be
             | the reason to need multiple machines than exhausting the
             | resources on some commodity tier machine.
        
           | ok_dad wrote:
           | With Postgres, you might have one server, or one cluster of
           | servers that are coordinated, and then inside there you have
           | tables with users and the users' data with foreign keys tying
           | them together.
           | 
           | With SQLite, you would instead have one database (one file)
           | per user as close to the user as possible that has all of the
           | user's data and you would just read/write to that database.
           | If your application needs to aggregate multiple users' data,
           | then you use something like Litestream to routinely back it
           | up to S3, then when you need to aggregate data you can just
           | access it all there and use a distributed system to do the
           | aggregation on the SQLite database files.
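
A minimal sketch of the one-file-per-user layout described above, using Python's `sqlite3` module. The tenant names, the `orders` table and the helper functions are all hypothetical, and the "aggregation" is just a loop over local files standing in for whatever distributed job would read the backed-up copies.

```python
import os
import sqlite3
import tempfile


def tenant_db_path(root, tenant_id):
    """Each tenant gets its own SQLite file under a common directory."""
    return os.path.join(root, f"{tenant_id}.db")


def record_order(root, tenant_id, amount_cents):
    conn = sqlite3.connect(tenant_db_path(root, tenant_id))
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "CREATE TABLE IF NOT EXISTS orders ("
                "id INTEGER PRIMARY KEY, amount_cents INTEGER NOT NULL)"
            )
            conn.execute(
                "INSERT INTO orders (amount_cents) VALUES (?)", (amount_cents,)
            )
    finally:
        conn.close()


def total_across_tenants(root):
    """Aggregate by opening every tenant file in turn and summing."""
    total = 0
    for name in sorted(os.listdir(root)):
        if not name.endswith(".db"):
            continue
        conn = sqlite3.connect(os.path.join(root, name))
        try:
            row = conn.execute(
                "SELECT COALESCE(SUM(amount_cents), 0) FROM orders"
            ).fetchone()
            total += row[0]
        finally:
            conn.close()
    return total


root = tempfile.mkdtemp()
record_order(root, "alice", 1200)
record_order(root, "alice", 300)
record_order(root, "bob", 500)
grand_total = total_across_tenants(root)
print(grand_total)
```

Each file is read in its own transaction, so the total is not a single consistent snapshot across tenants. Writes within one tenant's file remain fully transactional.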
        
             | danappelxx wrote:
             | Hold on, doesn't one-database-per-user totally forfeit all
             | ACID guarantees? You can't do cross-database transactions
             | (to my knowledge), which means you can end up with
             | corrupted data during aggregations. What am I missing?
        
               | mwcampbell wrote:
               | One database per tenant only makes sense in multi-tenant
               | applications that don't have any cross-tenant actions. I
               | imagine there are many B2B applications that fall into
               | this category.
        
           | nicoburns wrote:
           | > If you don't need that, you might as well read the entire
           | dataset into memory and be done with it.
           | 
           | Over in-memory data structures, SQLite gives you:
           | 
           | - Persistence
           | 
           | - Crash tolerance
           | 
           | - Extremely powerful declarative querying capabilities
           | 
           | > if you have read-only traffic you don't need sqlite
           | replication.
           | 
           | I agree with you that the main use-case here is backup and
           | data durability for small apps. Which is a pretty big deal, as
           | a database server is often the most expensive part of running
           | a small app. That said, there are definitely systems where
           | latency of returning a snapshot of the data is important, but
           | which snapshot isn't (if updates take a while to percolate
           | that's fine).
        
           | mrkurt wrote:
           | Litestream does a couple of things. It started as a way to
           | continuously back sqlite files up to s3. Then Ben added read
           | replicas - you can configure Litestream to replicate from a
           | "primary" litestream server. It's still limited to a single
           | writer, but there's no s3 in play. You get async replication
           | to other VMs: https://github.com/fly-apps/litestream-base
           | 
           | We have a feature for redirecting HTTP requests that perform
           | writes to a single VM. This makes Litestream + replicas
           | workable for most fullstack apps:
           | https://fly.io/blog/globally-distributed-postgres/
           | 
           | It's not a perfect setup, though. You have to take the writer
           | down to do a deploy. The next big Litestream release should
           | solve that, and is part of what's teased in the post.
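
The write-redirection idea can be sketched as a small routing decision. This is an illustration only, not Fly.io's implementation: it assumes that the HTTP method is a good enough signal for "this request writes", and the names `route_request`, `RoutingDecision` and the instance labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: these methods may write to the database; GET/HEAD do not.
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


@dataclass(frozen=True)
class RoutingDecision:
    handle_locally: bool
    forward_to: Optional[str] = None


def route_request(method, this_instance, primary_instance):
    """Serve reads from the local replica; send writes to the single writer."""
    is_write = method.upper() in WRITE_METHODS
    if is_write and this_instance != primary_instance:
        return RoutingDecision(handle_locally=False, forward_to=primary_instance)
    return RoutingDecision(handle_locally=True)


# A replica serves a read itself but hands a write to the primary.
read_decision = route_request("GET", "replica-syd", "primary-ord")
write_decision = route_request("POST", "replica-syd", "primary-ord")
primary_write = route_request("POST", "primary-ord", "primary-ord")
print(read_decision, write_decision, primary_write)
```

Reads served this way can be slightly behind the primary, since replication to the other VMs is asynchronous.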
        
             | throwoutway wrote:
             | > We have a feature for redirecting HTTP requests that
             | perform writes to a single VM. This makes Litestream +
             | replicas workable for most fullstack apps:
             | https://fly.io/blog/globally-distributed-postgres/
             | 
             | Thereby making it a constraint and (without failover) a
             | single point of failure? What's the upper limit here?
        
               | tptacek wrote:
               | This constraint is common to most n-tier architectures
               | (with Postgres or MySQL) as well. Obviously, part of
               | what's interesting about Litestream is that it simplifies
               | fail-over with SQLite.
        
         | a-dub wrote:
         | if you can tolerate eventual consistency and have the disk/ram
         | on the application vms, then sure, keeping the data and the
         | indices close to the code has the added benefit of keeping
         | request latency down.
         | 
         | downside of course is the complexity added in synchronization,
         | which is what they're tackling here.
         | 
         | personally i like the idea of per-tenant databases with
         | something like this to scale out for each tenant. it encourages
         | architectures that are more conducive for e2ee or procedures
         | that allow for better guarantees around customer privacy than
         | big central databases with a customer id column.
        
         | judofyr wrote:
         | > Latency is the exact reason you would have a problem scaling
         | any large system in the first place.
         | 
         | Let's not forget why we started using a separate database
         | server in the first place...
         | 
         | A web server does quite a lot of things: Parsing/formatting
         | HTTP/JSON/HTML, restructuring data, calculating stuff. This is
         | typically very separate from the data loading aspect and as you
         | get more requests you'll have to put more CPU in order to keep
         | up (regardless of the language).
         | 
         | By separating the web server from the database server you
         | introduce more latency in favor of enabling scalability. Now
         | you can spin up hundreds of web servers which all talk to a
         | single database server. This is a typical strategy for
         | scalability: _decouple_ the logic and _scale up individually_.
         | 
         | If you couple them together it's more difficult to scale. First
         | of all, in order to spin up a server you need a full version of
         | the database. Good luck autoscaling on-demand! Also, now every
         | write will have to be replicated to _all_ the readers. That's
         | a lot more bandwidth.
         | 
         | There are _definitely_ use cases for Litestream, but it's far
         | from a replacement for your typical Node + PostgreSQL stack. I
         | can see it being useful as a lower-level component: You can use
         | Litestream to build your "own" database server with customized
         | logic which you can talk to using an internal protocol (gRPC?)
         | from your web servers.
        
           | nicoburns wrote:
           | > There are definitely use cases for Litestream, but it's far
           | from a replacement for your typical Node + PostgreSQL stack
           | 
           | If you're using a language like Node.js then horizontal scaling
           | makes a lot of sense, but I've been working with Rust a lot
           | recently. And Rust is so efficient that you typically end up
           | in a place where a single application server can easily
           | saturate the database. At that point moving them both onto
           | the same box can start to make sense.
           | 
           | This is especially true for low-traffic apps. I could
           | probably run most of my Rust apps on a VM with 128MB RAM (or
           | even less) and not even a whole CPU core and still get
           | excellent performance. In that context, sticking a SQLite
           | database that backs up to object storage on the same box
           | becomes very attractive from a cost perspective.
        
             | judofyr wrote:
             | This is "vertical scaling" and that is indeed a very valid
             | approach! You just have to be aware that vertical scaling
             | has some fundamental limits and it's going to suck big time
             | if it comes as a surprise to you.
        
               | mwcampbell wrote:
               | Considering that more powerful machines continue to
               | become more affordable, it's a safe bet that most of us
               | will never hit those limits.
        
               | Karrot_Kream wrote:
               | Not sure about that. It would be smarter to just failure
               | test your apps. Once you cross some threshold, you scale.
               | Lots of companies build formulas costing out their cloud
               | spend based on infra needs and failure tests.
        
               | judofyr wrote:
               | Alternatively, instead of just betting on it, you could
               | do a benchmark, figure out the limits of your system and
               | check if your current implementation is capable of
               | handling the future needs.
        
             | [deleted]
        
           | tptacek wrote:
           | I don't think anyone's seriously arguing that the n-tier
           | database architecture is, like, intrinsically bankrupt. Most
           | applications are going to continue to be built with Postgres.
           | We like Postgres; we have a Postgres offering; we're friends
           | with Postgres-providing services; our product uses Postgres.
           | 
           | The point the post is making is that we think people would be
           | surprised how far SQLite can get a typical application.
           | There's a clear win for it in the early phases of an
           | application: managing a database server is operationally (and
           | capitally) expensive, and, importantly, it tends to pin you
           | to a centralized model where it really only makes sense for
           | your application to run in Ashburn --- every request is
           | getting backhauled there anyways.
           | 
           | As the post notes, there's a whole ecosystem of bandaids ---
           | err, tiers --- that mitigate this problem; it's one reason
           | you might sink a lot of engineering work into a horizontally-
           | scaling sharded cache tier, for instance.
           | 
           | The alternative the post proposes is: just use SQLite. Almost
           | all of that complexity melts away, to the point where even
           | your database access code in your app gets simpler (N+1 isn't
           | a game-over problem when each query takes microseconds). Use
           | Litestream and read-only replicas to scale read out
           | horizontally; scale the write leader vertically.
           | 
           | Eventually you'll need to make a decision: scale "out" of
           | SQLite into Postgres (or CockroachDB or whatever), or start
           | investing engineering dollars into making SQLite scale (for
           | instance: by using multiple databases, which is a SQLite
           | feature people sleep on). But the bet this post is making is
           | that the actual value of "eventually" is "surprisingly far
           | into the future", "far enough that it might not make sense to
           | prematurely optimize for it", especially early on when all
           | your resources, cognitively and financially and temporally,
           | are scarce.
           | 
           | We might be very wrong about this! There isn't an interesting
           | blog post (or technical bet) to make about "I'm all in on the
           | n-tier architecture of app servers and database servers".
           | We're just asking people to think about the approach, not
           | saying you're crazy if you don't adopt it.
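The "multiple databases" feature mentioned above is, presumably, SQLite's `ATTACH DATABASE`. A minimal sketch using Python's standard `sqlite3` module follows; the file names, table names, and the schema alias `orders_db` are invented for illustration.

```python
import os
import sqlite3
import tempfile

# Two separate database files; the names are made up for this example.
tmp = tempfile.mkdtemp()
users_path = os.path.join(tmp, "users.db")
orders_path = os.path.join(tmp, "orders.db")

# Populate the second file through its own connection.
orders = sqlite3.connect(orders_path)
orders.execute("CREATE TABLE orders (user_id INTEGER, total REAL)")
orders.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (1, 20.0), (2, 5.0)]
)
orders.commit()
orders.close()

conn = sqlite3.connect(users_path)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob")])
conn.commit()

# ATTACH makes the second file visible under the schema name "orders_db",
# so one query can join tables that live in different files.
conn.execute("ATTACH DATABASE ? AS orders_db", (orders_path,))
rows = conn.execute(
    "SELECT u.name, SUM(o.total) FROM users u "
    "JOIN orders_db.orders o ON o.user_id = u.id "
    "GROUP BY u.id ORDER BY u.id"
).fetchall()
conn.close()
```

Since each file is its own database, this is one way to split data across files while still querying it from a single connection.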
        
           | ithrow wrote:
           | As they say, "you are not twitter" ;)
           | 
           | Access to monstrous machines is easy today and you have very
           | fast runtimes like Go and the JVM that can leverage this
           | hardware.
        
       | plesiv wrote:
       | I absolutely love this. I think so called n-tier architecture as
       | a pattern should be aggressively battled in the attempt to reduce
       | the n. Software is so much more reliable when the communication
       | between different computational modules of the system is
       | function calls as opposed to IPC calls. Why does everything that
       | computes something or provides some data need to be a process? It
       | doesn't.
       | 
       | Postgresql and every other server/process should have first class
       | support for a single CLI command that: spins up the DB that
       | slurps up the config and the data storage, takes the SQL command
       | provided through the CLI arguments, runs it, returns results and
       | terminates. Effectively, every server/process software should be
       | a library first, since it's easy to make a server out of a
       | library and the reverse is anything but.
        
         | jjeaff wrote:
         | If you want to maintain much of the data in memory, wouldn't
         | that require a process?
        
           | plesiv wrote:
           | Sure. If you need your software to be a process I think you
           | should build it to be both: a library first and a process
           | second. Libraries are so much easier to use, test and reason
           | about.
        
       | beck5 wrote:
       | I have found it easy to overload SQLite with too many write
       | operations (20+ Concurrently), is this typical behaviour referred
       | to in the post, or a write heavy workload?
        
         | Scarbutt wrote:
         | How big are the writes? are you storing blobs?
        
         | benbjohnson wrote:
         | It can depend on a lot of factors such as the journaling mode
         | you're using as well as your hardware. SQLite has a single-
         | writer-at-a-time restriction so it's important to manage the size
         | of your writes. I typically see very good write throughput
         | using WAL mode and synchronous=normal on modern SSDs.
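For reference, the settings mentioned above can be applied like this with Python's standard `sqlite3` module. This is only a sketch: the file path and the `events` table are made up, and batching rows into one transaction is shown as one possible way to keep individual writes short, not as a recommendation from the comment itself.

```python
import os
import sqlite3
import tempfile

# WAL needs a real file; an in-memory database would stay in "memory" mode.
db_path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(db_path)

# The journal_mode pragma returns the mode actually in effect.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.execute("PRAGMA synchronous=NORMAL")
sync_level = conn.execute("PRAGMA synchronous").fetchone()[0]  # 1 means NORMAL

conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

# Batch many rows into one short transaction so the single writer lock
# is taken once rather than once per row.
with conn:
    conn.executemany(
        "INSERT INTO events (payload) VALUES (?)",
        [(f"event-{i}",) for i in range(1000)],
    )

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
conn.close()
```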
        
       | NeutralForest wrote:
       | There's something I don't understand, it says that the "data is
       | next to the application", what does it mean? Where is it stored and
       | how is it accessed by the application?
        
         | tptacek wrote:
         | The data lives in a file the application reads/writes directly
         | (and in a cache that the sqlite libraries can park inside the
         | application itself). The point is that you're not calling out
         | over the network to a "database server"; your app server is the
         | database server.
        
         | ledauphin wrote:
         | it means the data is stored in a file on the local drive of a
         | computer that is also running the application.
         | 
         | it also means that it is the application itself (via the SQLite
         | library) that reads and modifies that database file. There is
         | no separate database process.
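To make the two answers above concrete, here is a small Python sketch using the standard `sqlite3` module. The path and the `notes` table are invented for illustration; the point is only that the database is an ordinary local file opened inside the application process.

```python
import os
import sqlite3
import tempfile

# The whole database is one ordinary file on local disk (path is made up).
db_path = os.path.join(tempfile.mkdtemp(), "app.db")

# "Connecting" just opens that file inside this process: no socket, no server.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
with conn:
    conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))

# Reads are library calls that go straight to the file (and its page cache).
body = conn.execute("SELECT body FROM notes WHERE id = 1").fetchone()[0]
conn.close()

file_exists = os.path.isfile(db_path)  # the data now lives in this file
```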
        
       | anyfactor wrote:
       | Story time!
       | 
       | A client told me that they will use a DigitalOcean droplet for a
       | web app. Because the database was very small I chose to use
       | SQLite3.
       | 
       | After delivery the client said their devops guy wasn't available
       | and they would like to deploy to Heroku. Heroku, being an ephemeral
       | cloud service, couldn't handle the same-directory SQLite3 db I had
       | there. The only solution was to use their Postgres database
       | service.
       | 
       | For some reason, it was infuriating that I had to use a database
       | like that to store a few thousand rows of data. Moreover, I would
       | have to rewrite a ton of stuff to accommodate the change to
       | Postgres.
       | 
       | I ended up using firestore.
       | 
       | ---
       | 
       | I think something like this could have saved me a ton of hassle
       | that day.
        
         | luhn wrote:
         | It was too much work to migrate from SQLite to PostgreSQL, so
         | you migrated to... a NoSQL DB?
        
           | pjot wrote:
           | I think they're referring to the trade from managing one
           | system (DO + SQLite) to two (Heroku + pg), and choosing
           | Firestore instead as it's only one system to manage.
        
           | [deleted]
        
           | szundi wrote:
           | He wrote it was a "day" at the end. This guy is fast.
        
         | me_me_mu_mu wrote:
         | Please let me know if you've ever had to move data out of
         | firestore. I'm currently using firestore for some real time
         | requirements but the data is written to Postgres before the
         | relevant data for real time needs (client needs to show some
         | data updating constantly) is written to firestore.
         | 
         | Just curious if you've ever had to migrate data out of
         | firestore.
        
       | ilrwbwrkhv wrote:
       | For how much?
        
         | benbjohnson wrote:
         | Litestream author here. I've been on the fence about disclosing
         | the amount. I'm generally open about everything but I know some
         | people get weird about money stuff. I'm also autistic so I tend
         | to not navigate social norms very well. That all being said,
         | the project was acquired for $500k.
        
           | scottlamb wrote:
           | Thanks for sharing that. I've never really looked at open
           | source projects as acquisition targets. I see in another
           | comment that you're going to continue releasing it under the
           | Apache license. It's easy for me to see why fly.io would want
           | to hire you, with an agreed percentage (anywhere from
           | 0%-100%) of your time continuing to go into Litestream. If
           | you forgive the blunt question, what more do they get for the
           | $500k (acquisition cost / signing bonus)? (Part of me is
           | wondering if an open source project of mine, which various
           | startups have shown some degree of interest in, is holding a
           | significant payday I hadn't realized. Probably not, but it
           | seems more possible than a moment ago.)
        
             | tartakovsky wrote:
             | I would also be interested in understanding whether there
             | is a proper pricing model for such things. Wordle comes to
             | mind. Or a friend that has an iPad app that took 2 years to
             | build that is something novel but not released. Some
             | projects are open-source and some aren't. Some are acquired
             | for users and some are acqui-hired for continued
             | development. Any interesting advice or links here for folks
             | that don't want to be founders but want to make a solid
             | chunk of cash, have an expertise of value and love the
             | development work.
        
               | benbjohnson wrote:
               | There's not any real pricing model that I know of. I
               | think it comes down to a question of what value an
               | acquisition brings and that's always kinda fuzzy. If you
               | want specific numbers, the project was at ~5k GitHub
               | stars at the time of acquisition so I guess it's a
               | hundred bucks per star. :)
        
             | benbjohnson wrote:
             | Good question. I think the folks at Fly realize that they
             | get a lot of benefit from enabling open source projects
             | that work well on their platform. They have a somewhat
             | similar approach with the Phoenix project in that they
             | hired Chris McCord to work on it full-time.
             | 
             | Litestream has a lot of potential in being a lightweight,
             | fast, globally-distributed database and that aligns really
             | well with Fly. Continuing to release it as open source
             | means more folks can benefit from it and give feedback --
             | even if they don't use it on Fly.
        
             | [deleted]
        
       | mtlynch wrote:
       | Super cool! Congrats, Ben!
       | 
       | I've been building all of my projects for the last year with
       | SQLite + fly.io + Litestream. It's already such a great
       | experience, but I'm excited to see what develops now that
       | Litestream is part of fly.
        
       | learndeeply wrote:
       | Since both Fly.io and Litestream founders are here - why not
       | disclose the price?
        
         | benbjohnson wrote:
         | Litestream author here. I just posted it as a reply here:
         | https://news.ycombinator.com/item?id=31319556
        
       | tiffanyh wrote:
       | @dang, the actual title is "I'm All-In on Server-Side SQLite"
       | 
       | Maybe I missed it but where in the article does it say Fly
       | acquired Litestream?
       | 
       | EDIT: Ben Johnson says he just joined Fly. Nothing about Fly
       | "acquiring" Litestream.
       | 
       | https://mobile.twitter.com/benbjohnson/status/15237489883352...
        
         | gamblor956 wrote:
         | "Litestream has a new home at Fly.io, but it is and always will
         | be an open-source project"
         | 
         | Very bottom of the post. Technically, Litestream remains an
         | open-source project, so it's more accurate to say that Fly.io
         | acquired the brand IP and the owner of that IP.
        
         | lnsp wrote:
         | > Litestream has a new home at Fly.io, but it is and always
         | will be an open-source project. My plan for the next several
         | years is to keep making it more useful, no matter where your
         | application runs, and see just how far we can take the SQLite
         | model of how databases can work.
         | 
         | As far as I understood it, Fly.io hired the person working on
         | Litestream and pays them to keep working on Litestream.
        
           | tiffanyh wrote:
           | That's how I understood it and that's radically different
           | than how this HN post got titled.
           | 
           | Ben Johnson confirms how you framed it here:
           | 
           | https://mobile.twitter.com/benbjohnson/status/15237489883352...
        
             | tptacek wrote:
             | We wrote a different title for this blog post, and we did
             | in fact buy Litestream (to the extent that anyone can "buy"
             | a FOSS project, of course).
        
         | bussetta wrote:
         | The tweet[1] links the blog post and says Litestream is part of
         | fly.io now.
         | 
         | [1]https://twitter.com/flydotio/status/1523743433109692416
        
       | jrochkind1 wrote:
       | While the title is about a business acquisition, the article is
       | mostly about the technology itself -- replicating SQLite,
       | suggested as a superior option to a more traditional separate-
       | process rdbms, for real large-scale production workloads.
       | 
       | I'd be curious to hear reactions to/experiences with that
       | suggestion/technology, inside or outside the context of fly.io.
        
       | LunaSea wrote:
       | I wonder if we'll ever see an embedded version of PostgreSQL?
        
         | nicoburns wrote:
         | That's basically what SQLite is (notably, SQLite makes an
         | effort to be compatible with Postgres's SQL syntax). If you
         | mean based off the actual PostgreSQL codebase, then I highly
         | doubt it.
        
           | LunaSea wrote:
           | I doubt it as well.
           | 
           | That's sad though because SQLite is really missing a lot of
           | features that PostgreSQL has.
        
             | nicoburns wrote:
             | > That's sad though because SQLite is really missing a lot
             | of features that PostgreSQL has.
             | 
             | It is, but luckily it's not standing still. It's added JSON
             | support and window functions in recent years for example.
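As a quick illustration of one of the newer features mentioned above, here is a window function run through Python's standard `sqlite3` module. It assumes the bundled SQLite is version 3.25 or later, where window functions were introduced. The `sales` table is invented, and JSON functions are left out because their availability depends on how SQLite was built.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("eu", 10), ("eu", 30), ("us", 20), ("us", 5)],
)

# Running total per region, computed by a window function.
rows = conn.execute(
    "SELECT region, amount, "
    "SUM(amount) OVER (PARTITION BY region ORDER BY amount) AS running "
    "FROM sales ORDER BY region, amount"
).fetchall()
conn.close()
```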
        
       | melony wrote:
       | Note that the popular Node.js ORM Prisma does not support WAL.
       | 
       | https://github.com/prisma/prisma/issues/3303
        
         | tylergetsay wrote:
         | It also crashes if you try to write to the DB while it's open
         | https://github.com/prisma/prisma/issues/2955
        
       | quintes wrote:
       | What's the use case here, a single web app with inproc db?
       | 
       | More complex use cases?
       | 
       | I remember I could do this on Azure at one point in time with app
       | services, not sure if it's still a thing.. but heavy writes and
       | scaling of those types of apps would lead you to rethink this
       | approach right?
        
       | paulhodge wrote:
       | Wow Litestream sounds really interesting to me. I was just
       | starting on an architecture, that was either stupid or genius, of
       | using many SQLite databases on the server. Each user's account
       | gets their own SQLite file. So the service's horizontal scaling
       | is good (similar to the horizontal scaling of a document DB), and
       | it naturally mitigates data leaks/injections. Also opens up a few
       | neat tricks like the ability to do blue/green rollouts for schema
       | changes. Anyway Litestream seems pretty ideal for that, will be
       | checking it out!
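A minimal sketch of the one-file-per-account layout described above, in Python with the standard `sqlite3` module. The helper name `open_account_db`, the `todos` table, and the directory layout are assumptions made for illustration; a real deployment would use a persistent directory and likely replicate each file separately.

```python
import os
import re
import sqlite3
import tempfile

DATA_DIR = tempfile.mkdtemp()  # stand-in for a persistent data directory


def open_account_db(account_id):
    """Open (creating if needed) the SQLite file owned by one account."""
    # Restrict IDs to a safe character set so they cannot escape DATA_DIR.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", account_id):
        raise ValueError("invalid account id")
    conn = sqlite3.connect(os.path.join(DATA_DIR, f"{account_id}.db"))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS todos (id INTEGER PRIMARY KEY, title TEXT)"
    )
    return conn


alice = open_account_db("alice")
bob = open_account_db("bob")
with alice:
    alice.execute("INSERT INTO todos (title) VALUES (?)", ("buy milk",))

# Each account sees only its own file, so data cannot leak across tenants
# through a forgotten WHERE clause.
alice_count = alice.execute("SELECT COUNT(*) FROM todos").fetchone()[0]
bob_count = bob.execute("SELECT COUNT(*) FROM todos").fetchone()[0]
alice.close()
bob.close()
```

Because the schema lives in each file, a schema change could in principle be rolled out to a few accounts first, which is the blue/green trick the comment alludes to.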
        
         | freedomben wrote:
         | I actually did something very similar to this for an app that
         | produced _a lot_ of data. I wrote a small middleware that
         | automatically figured out which shard to use so the app logic
         | could pretend that it was all just one big db. The app
         | ultimately ended up in the can so it never needed to scale, but
         | I always wonder how it would have gone.
        
         | Scarbutt wrote:
         | _Each user's account gets their own SQLite file._
         | 
         | So now you need one database connection per user...
        
           | robertlagrant wrote:
           | If by connection you mean in-process database.
        
           | freedomben wrote:
           | Without knowing details about the app, it's hard to know if
           | that would matter. If a small number of concurrent users
           | would ever be using it, I would think it would be NBD.
        
           | tptacek wrote:
           | And? It's SQLite; it's a file handle and some cache, not a
           | connection pool.
        
           | mwcampbell wrote:
           | Depending on how you define "account", that can be quite
           | reasonable. In a B2B application, each business customer
           | could get their own SQLite database, and the number of SQLite
           | connections would likely be quite manageable, even though
           | some customers have many users.
        
         | mwcampbell wrote:
         | An architecture like yours has certainly been done before,
         | though AFAIK it never went mainstream. In particular, check out
         | this post from Glyph Lefkowitz of Twisted Python fame,
         | particularly the section about the (apparently dead) Mantissa
         | application server:
         | 
         | https://glyph.twistedmatrix.com/2008/06/this-word-scaling.ht...
        
         | [deleted]
        
       | ok_dad wrote:
       | I was just about to start using this for a project, I hope the
       | license won't change.
       | 
       | Congrats to the author though, no matter what! I wish everyone
       | could be so successful.
        
         | [deleted]
        
         | benbjohnson wrote:
         | Litestream author here. It'll continue to be open source under
         | an Apache 2 license.
        
       | wasd wrote:
       | Fly is putting together a pretty great team and interesting tech
       | stack. It's the service I see as a true disruptor to Heroku
       | because it's doing something novel (not just cheaper).
       | 
       | I'm still a little murky on the tradeoffs with Fly (and
       | litestream). @ben / @fly, you should write a tutorial on hosting
       | a todo app using rails with litestream and any expected hurdles
       | at different levels of scale (maybe comparing to Heroku).
        
       | pbowyer wrote:
       | Not surprised. Congratulations Ben!
        
       | endisneigh wrote:
       | What's an example of a popular app (more than 100K users) that
       | uses Litestream? Curious to see what this looks like in
       | production
        
         | [deleted]
        
         | jkaplowitz wrote:
         | Tailscale: https://tailscale.com/blog/database-for-2022/
         | 
         | I don't know their user count, but they are growing well and
         | just raised their Series B.
        
         | benbjohnson wrote:
         | Litestream author here. That's a good question. There's not
         | very good visibility into open source usage so it's hard to say
         | unless folks write blog posts about it. For example, I know
         | Tailscale runs part of their infrastructure with SQLite &
         | Litestream[1].
         | 
         | I wrote a database called BoltDB before and I have no idea how
         | widespread it is exactly. It's used in a lot of open source
         | projects like Consul & etcd but I don't know anything about
         | non-public usage.
         | 
         | [1]: https://tailscale.com/blog/database-for-2022/
        
           | gfd wrote:
           | For non-public usages, I remember Boltdb being named as one
           | of the root causes that took down Roblox for three days!
           | https://blog.roblox.com/2022/01/roblox-return-to-service-10-...
        
             | benbjohnson wrote:
             | Yep! That's usually how I find out usage inside companies.
             | :)
        
         | [deleted]
        
       | kall wrote:
       | I am as obsessed with sub 100ms responses as the people at
       | fly.io, so I think the one writer and many, many readers
       | architecture is smart and fits quite a few applications. When
       | litestream adds actual replication it will get really exciting.
       | 
       | > it won't work well on ephemeral, serverless platforms or when
       | using rolling deployments
       | 
       | That's... a lot of new applications these days.
        
         | mwcampbell wrote:
         | > it won't work well on ephemeral, serverless platforms or when
         | using rolling deployments
         | 
         | I assumed that was what Fly was hiring Ben to work on.
        
         | emptysea wrote:
         | Yeah the rolling deployments gotcha really stuck out to me. I
         | think most PaaS will provide that by default anyways because
         | who wants downtime during deploys?
        
           | mwcampbell wrote:
           | mrkurt specifically mentioned that a solution for that is in
           | the works. https://news.ycombinator.com/item?id=31319544
        
       | thdxr wrote:
       | in practice how do you make a single application node the writer?
       | 
       | do you now need your nodes to be clustered + electing a leader
       | and shipping writes there?
       | 
       | know fly.io did this with PG + Elixir but BEAM makes this type of
       | stuff pretty easy
        
       ___________________________________________________________________
       (page generated 2022-05-09 23:00 UTC)