[HN Gopher] Fly.io buys Litestream ___________________________________________________________________ Fly.io buys Litestream Author : dpeck Score : 383 points Date : 2022-05-09 19:35 UTC (3 hours ago) | swaraj wrote: | Looks v cool, but I feel like I'm missing a big part of the | story: how do 2 app 'servers/processes' connect to the same | sqlite/litestream db? | | Do you 'init' (restore) the db from each app process? When one | app makes a write, is it instantly reflected on the other app's | local sqlite? | judofyr wrote: | Each server would have one copy of the SQLite database. Only | one of the servers would support writes -- and those writes will | be replicated to the other server. Reads on the other server | will be transactionally safe, but might be slightly out of | date. | swaraj wrote: | This is my main q: are the writes replicated in real-time? Do | the apps that just need read access have to repeatedly call | 'restore'? | tptacek wrote: | https://litestream.io/getting-started/#continuous- | replicatio... | thruflo wrote: | Also, how does the WAL page based replication maintain | consistency / handle concurrent updates? | infogulch wrote: | It doesn't; this gives you a read-only replica only. | johnrrk wrote: | I also investigated SQLite and it's not clear how we can use it | with multiple servers. | | The WAL documentation [1] says "The wal-index greatly improves | the performance of readers, but the use of shared memory means | that all readers must exist on the same machine. This is why | the write-ahead log implementation will not work on a network | filesystem." | | So it seems that we can't have 2 Node.js servers accessing the | same SQLite file on a shared volume.
| | I'm not sure how to do zero-downtime deployment (like starting | server 2, checking it works, and shutting down server 1; it seems | risky since we'll have 2 servers accessing the same SQLite file | temporarily) | | [1] https://sqlite.org/wal.html | tptacek wrote: | The point of Litestream is that you don't have multiple | servers accessing the same SQLite file. They all have their | own SQLite databases. Of course, you only write to one of | them, but that's a common constraint for database clusters. | rwho wrote: | mwcampbell wrote: | Congratulations to Ben on getting a well-funded player like Fly | to buy into this vision. I'm looking forward to seeing a | complete, ready-to-deploy sample app when the upcoming | Litestream enhancements are ready. | | I know that Fly also likes Elixir and Phoenix; they hired Chris | McCord, after all. So would it make sense for Phoenix | applications deployed in production on Fly to use SQLite and | Litestream? Is support for SQLite in the Elixir ecosystem, | particularly Ecto, good enough for this? | warmwaffles wrote: | > Is support for SQLite in the Elixir ecosystem, particularly | Ecto, good enough for this? | | Why yes it is. I maintain the `exqlite` and `ecto_sqlite3` | libraries, and they were just integrated with `kino_db`, which is | used by `livebook`. | | https://github.com/elixir-sqlite/exqlite | swlkr wrote: | The reduction in complexity from using sqlite + litestream as a | server-side database is great to see! | netcraft wrote: | This is similar to what I hoped websql would eventually grow into: | sqlite in the browser, but let me sync it up and down with a | server. Every user gets their own database; the first time they | visit the app they "install" the control and system data, then | their data, then writes are synced to the server. If it became | standard, it could be super easy - conflict resolution | notwithstanding.
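The single-writer, replicate-to-object-storage setup discussed above maps onto Litestream's documented YAML config format. A minimal sketch; the bucket name and database path are placeholders:

```yaml
# litestream.yml -- paths and bucket are illustrative placeholders
dbs:
  - path: /data/app.db
    replicas:
      - url: s3://my-backup-bucket/app-db
```

With a config like this, `litestream replicate` runs alongside the app on the writer, and `litestream restore -o /data/app.db s3://my-backup-bucket/app-db` recreates the database on a fresh node.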
| bambax wrote: | You can make webapps using exactly this approach, with json in | localstorage as the client db, and occasional, asynchronous | writes to the server. I'm now building a simple webapp exactly | like this, and the server db is sqlite. So far it works | perfectly fine. | tyingq wrote: | Dqlite is also interesting, and in a similar space. It seems to | have evolved from the LXC/LXD team wanting a replacement for | Etcd. It's Sqlite with raft replication and also a networked | client protocol. | | https://dqlite.io/docs/architecture | tptacek wrote: | There's also rqlite. There's definitely a place for this kind | of stuff. But we already use a bunch of stuff that does | distributed consensus in our stack, and the experience has left | us wary of it, especially for global distribution. We almost | used rqlite for a statekeeping feature internally, but today | we'd certainly just use sqlite+litestream for the same kinds of | features, just because it's easier to reason about and to deal | with operationally when there are problems. | | https://fly.io/blog/a-foolish-consistency/ | otoolep wrote: | rqlite author here. Anything else you can tell me about why | you decided against it? Just simpler, as you say, to avoid a | distributed system when you can (something I understand). | tptacek wrote: | We like rqlite a lot. There are some comments in your issue | tracker from Jerome about it at the time. The decision | wasn't against rqlite as a piece of software so much as it | was us deliberately deciding not to introduce more Raft | into our architecture; any place there is Raft, we're | concerned we'll essentially need to train our whole on-call | rotation on how to handle issues. | | The annoying thing about global consensus is that the | operational problems tend to be global as well; we had an | outage last night (correlated disk failure on 3 different | machines!)
in Chicago, and it slowed down deploys all the | way to Sydney, essentially because of invariants maintained | by a global Raft consensus and fed in part from | malfunctioning machines. | | I think rqlite would make a lot of sense for us for | applications where we run multiple regional clusters; it's | just that our problems today tend to be global. We're not | just looking for opportunities to rip Raft out of our | stack; we're also trying to build APIs that regionalize | nicely. In nicely-regionalized, contained settings, rqlite | might work a treat for us. | RcouF1uZ4gsC wrote: | I love Litestream! It is so simple and it just works! | | Congratulations, Ben, on making a great product and on the sale! | | One thing I have had in the back of my mind, but have not had the | time to pursue, is using SQLite replication to make something | similar to CloudFlare's durable objects but more open. | | A "durable object" would be an SQLite database and some program | that processes requests and accesses the SQLite database. There | would be a runtime that transparently replicates the (database, | program) pairs where they are needed and routes to them. | | That way, I can just start out locally developing my program with | an SQLite database, and then run a command and have it available | globally. At the same time, since it is just accessing an SQLite | database, there would be much less risk of lock-in. | krts- wrote: | A great project with awesome implications. Well deserved, and the | fly.io team are very pragmatic. | | This will be even more _brilliant_ than it already is when fly.io | can get some slick sidecar/multi-process stuff. | | I ended up back with Postgres after my misconfigs left me a bit | burned with S3 costs and data stuff. But I think a master VM | backed by persistent storage on fly with read replicas as | required is maybe the next step: I love the simplicity of SQLite.
| foodstances wrote: | Just curious, is there any financial compensation/support going | to Richard Hipp with all of this money changing hands? | | When I see these startups making a business that is so heavily | based on open-source software (like Tailscale on top of | Wireguard), I have to wonder what these companies do to actually | support the author(s) of the software that so much of their | company is based on. | mrkurt wrote: | Yes. We (Fly.io) are buying a sqlite support agreement. We also | send money WireGuard's way. I'm pretty sure Tailscale does too. | | We have also given OSS authors advisor equity. A couple of | folks wrote libraries that were important to keeping us going, | and we've granted them shares the same way some startups would | to MBA advisors. | foodstances wrote: | That's great to hear, thank you! | qbasic_forever wrote: | I agree Richard Hipp should be compensated, but he explicitly | licensed and released SQLite under a public-domain dedication: | https://www.sqlite.org/copyright.html Not Apache, not MIT, not | GPL... public domain. You can do almost anything with it and | not be beholden to any demands. You can tell people you built | your business on SQLite... or not. It's public domain. | | That said, SQLite has a business model of selling support and | premium features like encryption: | https://www.sqlite.org/prosupport.html | foodstances wrote: | Sure, but Apache, MIT, and GPL licenses don't require payment | to the author either. That's why it's up to the company to | decide to offer compensation without being required to, and | why I'm curious which companies actually do it. | | It's like when RedHat went public and offered pre-IPO stock | to open source developers. | otoolep wrote: | Congratulations to Ben! This project has been like a rocket ship. | benbjohnson wrote: | Thanks, Philip! | no_wizard wrote: | This is a great and interesting offering! I think this fits well | with fly.io and their model of computing.
| | I now wish that I had engaged with an idea, very | similar to litestream, that I had about a year and a half ago. I | always thought SQLite just needed a distribution layer to be | extremely effective as a distributed database of sorts. Its flat- | file architecture means it's easy to provision, restore and | back up. SQLite also has incremental snapshotting and | reproducible WAL logs that can be used to do incremental backups, | restores, writes etc. It just needs a "frontend" to handle those | bits. Latency has gotten to the point where you can replicate a | database by its continued snapshots (which is, on a high level, | what litestream appears to be doing) being propagated out to | object / blob storage. You could even achieve brute-force | consensus with this approach if you ran it in a truly distributed | way (though RAFT is probably more efficient). | | Reason I didn't do this? I thought to myself - why in the world | in 2020 would someone choose to use SQLite at scale instead of | something like Firebase, Spanner, Fauna, or even Postgres? So | after I did an initial prototype (long gone, never pushed it to | GitHub) I just felt like...there was no appetite for it. | | Now I regret it! | | Just a long-winded way of saying: congrats! This is awesome! | Thanks for doing exactly what I wanted to do but didn't have the | guts to follow through with. | epilys wrote: | I implemented exactly this setup, in Rust, last year for a | client. Distributed WAL with write locks on a RAFT scheme. | Custom VFS in Rust for sqlite3 to handle the IO. I asked the | client to opensource it but it's probably not gonna happen... | It's definitely doable though. | ComputerGuru wrote: | Did you write your own Rust raft implementation or reuse | something already available? | epilys wrote: | Reused a well-known library that uses raft. I don't know if | I should mention any more details since it was a private | project.
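The snapshot primitive these comments lean on is built into SQLite itself: the online backup API copies a live database page by page while the source stays usable. (Litestream itself ships WAL frames incrementally rather than calling this API, but the sketch below, using Python's stdlib `sqlite3`, shows the building block.)

```python
import sqlite3

# Build a source database in memory.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
src.execute("INSERT INTO kv VALUES ('greeting', 'hello')")
src.commit()

# Snapshot it into a second database via SQLite's online backup API.
# The source connection remains readable and writable while pages copy.
dst = sqlite3.connect(":memory:")
src.backup(dst)

print(dst.execute("SELECT v FROM kv WHERE k = 'greeting'").fetchone()[0])
# Output: hello
```

The same `backup()` call works against a file-backed destination, which is the "copy the whole db somewhere durable" step a snapshot-shipping frontend would automate.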
| Serow225 wrote: | there's some stuff out there: | | - https://github.com/rqlite/rqlite - | https://github.com/chiselstrike/chiselstore - | https://dqlite.io/ | | I'm sure there's more, those are just the ones I remember. | mrcwinn wrote: | I have really enjoyed using Fly. Great service and support. | scwoodal wrote: | > According to the conventional wisdom, SQLite has a place in | this architecture: as a place to run unit tests. | | Be careful with this approach. Frameworks like Django have DB | engine specific features[1]. When you start using them in your | application you can no longer use a different DB (SQLite) to run | your unit tests. | | [1] | https://docs.djangoproject.com/en/4.0/ref/contrib/postgres/f... | seanwilson wrote: | SQLite uses dynamic types? Is this an issue in practice, | especially for large apps? Don't you lose guarantees about your | data which makes it messy to handle on the backend? | | Context from https://www.sqlite.org/datatype3.html: "SQLite uses | a more general dynamic type system. In SQLite, the datatype of a | value is associated with the value itself, not with its | container. The dynamic type system of SQLite is backwards | compatible with the more common static type systems of other | database engines in the sense that SQL statements that work on | statically typed databases work the same way in SQLite. However, | the dynamic typing in SQLite allows it to do things which are not | possible in traditional rigidly typed databases. Flexible typing | is a feature of SQLite, not a bug." | aliswe wrote: | This sounds like schemalessness to me? serious "question". | jamie_ca wrote: | Not schemaless, but typeless. SQLite will let you declare a | column to be an integer and then dump a string into it, but | you're still defining a table with specific columns. | | It's like the opposite problem Mysql has when you try to | write data larger than the field definition - Mysql will | truncate, Sqlite will store the data you gave it. 
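The typeless-but-not-schemaless behavior described above is easy to see with Python's stdlib `sqlite3`: a column declared INTEGER gives values integer *affinity*, so integer-looking text is coerced while anything else is stored as given.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (n INTEGER)")

# A value that can't be read as an integer is stored as-is (TEXT)...
con.execute("INSERT INTO t VALUES ('hello')")
# ...while integer-looking text is coerced by the column's INTEGER affinity.
con.execute("INSERT INTO t VALUES ('123')")

print(con.execute("SELECT n, typeof(n) FROM t").fetchall())
# Output: [('hello', 'text'), (123, 'integer')]
```

Since SQLite 3.37, declaring the table with the STRICT keyword makes the first insert fail with an error instead of being stored as text.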
| seanwilson wrote: | Typeless is the default though? Why wouldn't you want the | types to be reliable when you're reading/writing from the | backend in the general case? | ripley12 wrote: | You can use SQLite in strict mode if you prefer: | https://www.sqlite.org/stricttables.html | rco8786 wrote: | All of the action around SQLite recently is very exciting! | bob1029 wrote: | > SQLite isn't just on the same machine as your application, but | actually built into your application process. When you put your | data right next to your application, you can see per-query | latency drop to 10-20 microseconds. That's micro, with a m. A | 50-100x improvement over an intra-region Postgres query. | | This is the #1 reason my exuberant technical mind likes that we | use SQLite for all the things. Latency is the exact reason you | would have a problem scaling any large system in the first place. | Forcing it all into one cache-coherent domain is a really good | way to begin eliminating entire universes of bugs. | | Do we all appreciate just how much more throughput you can get in | the case described above? A 100x latency improvement doesn't | translate _directly_ into the same # of transactions per second, | but it's pretty damn close if your I/O subsystem is up to the | task. | throwaway894345 wrote: | If you're pushing the database up into the application layer, | do you have to route all write operations through a single | "master" application instance? If not, is there some multi- | master scheme, and if so, is it cheaper to propagate state all | the time than it is to have the application write to a master | database instance over a network? Moreover, how does it affect | the operations of your application? Are you still as | comfortable bouncing an application instance as you would | otherwise be? | closeparen wrote: | This is a large part of what Rich Hickey emphasizes about | Datomic, too.
We're so used to the database being "over there", | but it's actually very nice to have it locally. Datomic solves | this in the context of a distributed database by having the | read-only replicas local to client applications while the | transaction-running parts are remote. | abraxas wrote: | The only trouble with that particular implementation is that the | Datomic Transactor is a single-threaded single process that | serializes every transaction going through it. As long as you | don't need to scale writes it works like a charm. However, | the workloads I somehow always end up working with are write- | heavy or at best 50/50 between read and write. | vmception wrote: | > SQLite isn't just on the same machine as your application, | but actually built into your application process. | | How is that different from what's commonly happening? Android | and iOS do this... right? ... but it's still accessing the | filesystem to use it. | | Am I missing something, or is what they are describing just | completely commonplace, and only interesting to people who | use microservices and never knew what was normal? | tlb wrote: | It's normal (and HN does something similar, working from in- | process data) for systems that don't have to scale beyond one | server. If you need multiple servers you have to do | something, such as Litestream. | mrkurt wrote: | This is how client apps use sqlite, yes. Single-instance | client apps. Litestream is one method of making sqlite work | for server-side apps. The hard part on the server is solving | for multiple processes/vms/containers writing to one sqlite | db. | nicoburns wrote: | > the hard part on the server is solving for multiple | processes/vms/containers writing to one sqlite db. | | I feel like if you have multiple apps writing to the | database then you shouldn't be using SQLite. That's where | Postgres etc completely earn their place in the stack.
| Where litestream is really valuable is when you have a | single writer, but you want point-in-time backups like you | can get with postgres. | vmception wrote: | Interesting, such a weird way to describe it then. But I | guess some people are more familiar with that problem. | funstuff007 wrote: | This is exactly the reason I am so skeptical of the cloud. I | don't care how easy it is to stand up VMs, containers, k8s, | etc. What I need to know is how hard it is to lug my data to my | application and vice versa. My feelings on this are so strong | as I work mostly on database read-heavy applications. | WJW wrote: | How do any writes end up on other horizontally scaled machines | though? To me the whole point of a database on another machine | is that it is the single point of truth that many horizontally | scaled servers can write to and read each others' updates from. | If you don't need that, you might as well read the entire | dataset into memory and be done with it. | | I know TFA says that you can "soon" automagically replicate | your sqlite db to another server, but it only allows writes on | a single server and all others will be readers. Now you need to | think about how to move all write traffic to a single app | server. All writes to that server will still take several | milliseconds (possibly more, since S3 is eventually consistent) | to propagate around all replicas. | | In short, a 100x latency improvement for reads is great but a bit | of a red herring, since if you have read-only traffic you don't | need sqlite replication. If you do have write traffic, then | routing it through S3 will definitely not give you a 100x | latency improvement over Postgres or MySQL anymore. Litestream | is definitely on my radar, but as a continuous backup system | for small apps ("small" meaning it runs and will always run on | a single box) rather than a wholesale replacement of | traditional client-server databases. | | PS: Congrats Ben!
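The in-process read-latency claim upthread (10-20 microseconds per query) is easy to sanity-check. A rough sketch with Python's stdlib `sqlite3`; absolute numbers depend on hardware, and the interpreter adds overhead that a compiled language wouldn't have:

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [(f"user{i}",) for i in range(1000)],
)
con.commit()

# Time a batch of indexed point lookups; the per-query average is the
# number to compare against a network round-trip to a database server.
n = 10_000
start = time.perf_counter()
for i in range(n):
    con.execute("SELECT name FROM users WHERE id = ?", (i % 1000 + 1,)).fetchone()
elapsed = time.perf_counter() - start
print(f"avg per-query latency: {elapsed / n * 1e6:.1f} us")
```

Even with Python's overhead, the per-query cost lands well under the single-digit milliseconds typical of an intra-region network hop.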
| jolux wrote: | S3 is strongly consistent now: | https://aws.amazon.com/s3/consistency/ | bob1029 wrote: | What if, due to ridiculous latency reductions, your business | no longer requires more than 1 machine to function at scale? | | I'm talking more about sqlite itself than any given product | around it at this point, but I still think it's an | interesting thought experiment in this context. | WJW wrote: | I'll point out that the ridiculous latency reductions don't | apply to replicating the writes to S3 and/or any replica | servers; that still takes as long as it would to any other | server across a network. The latency reductions are _only_ | for pure read traffic. Also, every company I ever worked at | had a policy to run at least two instances of a service in | case of hardware failure. (Is it reasonable to | extrapolate this policy to a company which might want to | run on a single sqlite instance? I don't know, but just as | a datapoint I don't think any business should strive to run | on a single instance.) | | This write latency _might_ be fine, although more than one | backend app I know renewed the expiry time of a user | session on every hit and would thus do at least one DB | write per HTTP call. I don't think this is optimal, but it | does happen, and simply going "well don't do write traffic | then" does not always line up with how apps are actually | built. Replicated sqlite over litestream is very cool, but | definitely something you need to build your app around, and | also definitely something that costs you one of your innovation | tokens. | tptacek wrote: | There's no magic here (that there is no magic is part of | the point). You have the same phenomenon in n-tier | Postgres deployments: to be highly available, you need | multiple instances; you're going to have a write leader, | because you don't realistically want to run a Raft | consensus for every write; etc.
| | The point of the post is just that if you can get rid of | most of the big operational problems with using server- | side SQLite in a distributed application --- most | notably, failing over and snapshotting --- then SQLite | can occupy a much more interesting role in your stack | than it's conventionally been assigned. SQLite has some | very attractive properties that have been largely ignored | because people assume they won't be able to scale it out | and manage it. Well, you can scale it out and manage it. | Now you've got an extremely simple database layer that's | easy to reason about, doesn't require you to run a | database server (or even a cache server) next to all your | app instances, and happens to be extraordinarily fast. | | Maybe it doesn't make sense for your app? There are | probably lots of apps that really want Postgres and not | SQLite. But the architecture we're proposing is one | people historically haven't even considered. Now, they | should. | | I'm not sure "litestream replicate <file>" really costs a | whole innovation token. It's just SQLite. You should get | an innovation rebate for using it. :) | toolz wrote: | I have to imagine having your service highly available | (i.e. you need a failover machine) is far more likely to be | the reason to need multiple machines than exhausting the | resources on some commodity tier machine. | ok_dad wrote: | With Postgres, you might have one server, or one cluster of | servers that are coordinated, and then inside there you have | tables with users and the users' data with foreign keys tying | them together. | | With SQLite, you would instead have one database (one file) | per user as close to the user as possible that has all of the | user's data and you would just read/write to that database. 
| If your application needs to aggregate multiple users' data, | then you use something like Litestream to routinely back it | up to S3; then, when you need to aggregate data, you can just | access it all there and use a distributed system to do the | aggregation on the SQLite database files. | danappelxx wrote: | Hold on, doesn't one-database-per-user totally void all | ACID guarantees? You can't do cross-database transactions | (to my knowledge), which means you can end up with | corrupted data during aggregations. What am I missing? | mwcampbell wrote: | One database per tenant only makes sense in multi-tenant | applications that don't have any cross-tenant actions. I | imagine there are many B2B applications that fall into | this category. | nicoburns wrote: | > If you don't need that, you might as well read the entire | dataset into memory and be done with it. | | Over in-memory data structures, SQLite gives you: | | - Persistence | | - Crash tolerance | | - Extremely powerful declarative querying capabilities | | > if you have read-only traffic you don't need sqlite | replication. | | I agree with you that the main use-case here is backup and | data durability for small apps. Which is a pretty big deal, as | a database server is often the most expensive part of running | a small app. That said, there are definitely systems where | latency of returning a snapshot of the data is important, but | which snapshot isn't (if updates take a while to percolate | that's fine). | mrkurt wrote: | Litestream does a couple of things. It started as a way to | continuously back sqlite files up to s3. Then Ben added read | replicas - you can configure Litestream to replicate from a | "primary" litestream server. It's still limited to a single | writer, but there's no s3 in play. You get async replication | to other VMs: https://github.com/fly-apps/litestream-base | | We have a feature for redirecting HTTP requests that perform | writes to a single VM.
This makes Litestream + replicas | workable for most fullstack apps: | https://fly.io/blog/globally-distributed-postgres/ | | It's not a perfect setup, though. You have to take the writer | down to do a deploy. The next big Litestream release should | solve that, and is part of what's teased in the post. | throwoutway wrote: | > We have a feature for redirecting HTTP requests that | perform writes to a single VM. This makes Litestream + | replicas workable for most fullstack apps: | https://fly.io/blog/globally-distributed-postgres/ | | Thereby making it a constraint and (without failover) a | single point of failure? What's the upper limit here? | tptacek wrote: | This constraint is common to most n-tier architectures | (with Postgres or MySQL) as well. Obviously, part of | what's interesting about Litestream is that it simplifies | fail-over with SQLite. | a-dub wrote: | if you can tolerate eventual consistency and have the disk/ram | on the application vms, then sure, keeping the data and the | indices close to the code has the added benefit of keeping | request latency down. | | downside of course is the complexity added in synchronization, | which is what they're tackling here. | | personally i like the idea of per-tenant databases with | something like this to scale out for each tenant. it encourages | architectures that are more conducive to e2ee or procedures | that allow for better guarantees around customer privacy than | big central databases with a customer id column. | judofyr wrote: | > Latency is the exact reason you would have a problem scaling | any large system in the first place. | | Let's not forget why we started using separate database servers | in the first place... | | A web server does quite a lot of things: parsing/formatting | HTTP/JSON/HTML, restructuring data, calculating stuff.
This is | typically very separate from the data-loading aspect, and as you | get more requests you'll have to add more CPU in order to keep | up (regardless of the language). | | By separating the web server from the database server you | introduce more latency in favor of enabling scalability. Now | you can spin up hundreds of web servers which all talk to a | single database server. This is a typical strategy for | scalability: _decouple_ the logic and _scale up individually_. | | If you couple them together it's more difficult to scale. First | of all, in order to spin up a server you need a full copy of | the database. Good luck autoscaling on-demand! Also, now every | write will have to be replicated to _all_ the readers. That's | a lot more bandwidth. | | There are _definitely_ use cases for Litestream, but it's far | from a replacement for your typical Node + PostgreSQL stack. I | can see it being useful as a lower-level component: you can use | Litestream to build your "own" database server with customized | logic which you can talk to using an internal protocol (gRPC?) | from your web servers. | nicoburns wrote: | > There are definitely use cases for Litestream, but it's far | from a replacement for your typical Node + PostgreSQL stack | | If you're using a language like Node.js then horizontal scaling | makes a lot of sense, but I've been working with Rust a lot | recently. And Rust is so efficient that you typically end up | in a place where a single application server can easily | saturate the database. At that point moving them both onto | the same box can start to make sense. | | This is especially true for low-traffic apps. I could | probably run most of my Rust apps on a VM with 128MB RAM (or | even less) and not even a whole CPU core and still get | excellent performance. In that context, sticking a SQLite | database that backs up to object storage on the same box | becomes very attractive from a cost perspective.
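The single-writer, many-readers model the thread keeps returning to also exists inside one machine: in WAL mode (mentioned elsewhere in the thread), readers see a consistent snapshot and don't block the writer. A minimal sketch with Python's stdlib `sqlite3`; the table and pragmas are illustrative:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")

# Writer connection; autocommit mode so BEGIN/COMMIT are explicit.
w = sqlite3.connect(path, isolation_level=None)
w.execute("PRAGMA journal_mode=WAL")    # readers don't block the writer
w.execute("PRAGMA synchronous=NORMAL")  # common WAL-mode durability tradeoff
w.execute("CREATE TABLE hits (id INTEGER PRIMARY KEY, path TEXT)")
w.execute("INSERT INTO hits (path) VALUES ('/')")

# A separate reader connection, as another thread or process would have.
r = sqlite3.connect(path, isolation_level=None)

w.execute("BEGIN")
w.execute("INSERT INTO hits (path) VALUES ('/about')")
before = r.execute("SELECT count(*) FROM hits").fetchone()[0]  # 1: snapshot
w.execute("COMMIT")
after = r.execute("SELECT count(*) FROM hits").fetchone()[0]   # 2: sees commit
print(before, after)
# Output: 1 2
```

This is the in-process analogue of Litestream's cross-machine setup: exactly one writer, any number of readers that may briefly lag the latest commit.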
| judofyr wrote: | This is "vertical scaling" and that is indeed a very valid | approach! You just have to be aware that vertical scaling | has some fundamental limits, and it's going to suck big time | if it comes as a surprise to you. | mwcampbell wrote: | Considering that more powerful machines continue to | become more affordable, it's a safe bet that most of us | will never hit those limits. | Karrot_Kream wrote: | Not sure about that. It would be smarter to just failure- | test your apps. Once you cross some threshold, you scale. | Lots of companies build formulas costing out their cloud | spend based on infra needs and failure tests. | judofyr wrote: | Alternatively, instead of just betting on it, you could | do a benchmark, figure out the limits of your system and | check if your current implementation is capable of | handling the future needs. | [deleted] | tptacek wrote: | I don't think anyone's seriously arguing that the n-tier | database architecture is, like, intrinsically bankrupt. Most | applications are going to continue to be built with Postgres. | We like Postgres; we have a Postgres offering; we're friends | with Postgres-providing services; our product uses Postgres. | | The point the post is making is that we think people would be | surprised how far SQLite can get a typical application. | There's a clear win for it in the early phases of an | application: managing a database server is operationally (and | capitally) expensive, and, importantly, it tends to pin you | to a centralized model where it really only makes sense for | your application to run in Ashburn --- every request is | getting backhauled there anyways. | | As the post notes, there's a whole ecosystem of bandaids --- | err, tiers --- that mitigate this problem; it's one reason | you might sink a lot of engineering work into a horizontally- | scaling sharded cache tier, for instance. | | The alternative the post proposes is: just use SQLite.
Almost | all of that complexity melts away, to the point where even | your database access code in your app gets simpler (N+1 isn't | a game-over problem when each query takes microseconds). Use | Litestream and read-only replicas to scale reads out | horizontally; scale the write leader vertically. | | Eventually you'll need to make a decision: scale "out" of | SQLite into Postgres (or CockroachDB or whatever), or start | investing engineering dollars into making SQLite scale (for | instance: by using multiple databases, which is a SQLite | feature people sleep on). But the bet this post is making is | that the actual value of "eventually" is "surprisingly far | into the future", "far enough that it might not make sense to | prematurely optimize for it", especially early on when all | your resources, cognitive and financial and temporal, are | scarce. | | We might be very wrong about this! There isn't an interesting | blog post (or technical bet) to make about "I'm all in on the | n-tier architecture of app servers and database servers". | We're just asking people to think about the approach, not | saying you're crazy if you don't adopt it. | ithrow wrote: | As they say, "you are not twitter" ;) | | Access to monstrous machines is easy today and you have very | fast runtimes like Go and the JVM that can leverage this | hardware. | plesiv wrote: | I absolutely love this. I think the so-called n-tier architecture | as a pattern should be aggressively battled in the attempt to | reduce the n. Software is so much more reliable when the | communication between different computational modules of the | system is function calls as opposed to IPC calls. Why does | everything that computes something or provides some data need to | be a process? It doesn't.
| | PostgreSQL and every other server/process should have first class | support for a single CLI command that: spins up the DB that | slurps up the config and the data storage, takes the SQL command | provided through the CLI arguments, runs it, returns results and | terminates. Effectively, every server/process software should be | a library first, since it's easy to make a server out of a | library and the reverse is anything but. | jjeaff wrote: | If you want to maintain much of the data in memory, wouldn't | that require a process? | plesiv wrote: | Sure. If you need your software to be a process I think you | should build it to be both: a library first and a process | second. Libraries are so much easier to use, test and reason | about. | beck5 wrote: | I have found it easy to overload SQLite with too many write | operations (20+ concurrently), is this typical behaviour referred | to in the post, or a write heavy workload? | Scarbutt wrote: | How big are the writes? Are you storing blobs? | benbjohnson wrote: | It can depend on a lot of factors such as the journaling mode | you're using as well as your hardware. SQLite has a single- | writer-at-a-time restriction so it's important to manage the size | of your writes. I typically see very good write throughput | using WAL mode and synchronous=normal on modern SSDs. | NeutralForest wrote: | There's something I don't understand, it says that the "data is | next to the application", what does it mean? Where is it stored and | how is it accessed by the application? | tptacek wrote: | The data lives in a file the application reads/writes directly | (and in a cache that the sqlite libraries can park inside the | application itself). The point is that you're not calling out | over the network to a "database server"; your app server is the | database server. | ledauphin wrote: | it means the data is stored in a file on the local drive of a | computer that is also running the application. 
| | it also means that it is the application itself (via the SQLite | library) that reads and modifies that database file. There is | no separate database process. | anyfactor wrote: | Story time! | | A client told me that they would use a DigitalOcean droplet for a | web app. Because the database was very small I chose to use | SQLite3. | | After delivery the client said their devops guy wasn't available, | so they would like to deploy to Heroku. Heroku being an ephemeral | cloud service couldn't handle the same-directory SQLite3 db I had | there. The only solution was to use their Postgres database | service. | | For some reason, it was infuriating that I had to use a database | like that to store a few thousand rows of data. Moreover, I would | have to rewrite a ton of stuff to accommodate the change to | Postgres. | | I ended up using Firestore. | | --- | | I think something like this could have saved me a ton of hassle | that day. | luhn wrote: | It was too much work to migrate from SQLite to PostgreSQL, so | you migrated to... a NoSQL DB? | pjot wrote: | I think they're referring to the trade from managing one | system (DO + SQLite) to two (Heroku + pg) and | choosing Firestore instead as it's only one system to manage. | [deleted] | szundi wrote: | He wrote it was a "day" at the end. This guy is fast. | me_me_mu_mu wrote: | Please let me know if you've ever had to move data out of | Firestore. I'm currently using Firestore for some real-time | requirements but the data is written to Postgres before the | relevant data for real-time needs (client needs to show some | data updating constantly) is written to Firestore. | | Just curious if you've ever had to migrate data out of | Firestore. | ilrwbwrkhv wrote: | For how much? | benbjohnson wrote: | Litestream author here. I've been on the fence about disclosing | the amount. I'm generally open about everything but I know some | people get weird about money stuff. 
I'm also autistic so I tend | to not navigate social norms very well. That all being said, | the project was acquired for $500k. | scottlamb wrote: | Thanks for sharing that. I've never really looked at open | source projects as acquisition targets. I see in another | comment that you're going to continue releasing it under the | Apache license. It's easy for me to see why fly.io would want | to hire you, with an agreed percentage (anywhere from | 0%-100%) of your time continuing to go into Litestream. If | you forgive the blunt question, what more do they get for the | $500k (acquisition cost / signing bonus)? (Part of me is | wondering if an open source project of mine, which various | startups have shown some degree of interest in, is holding a | significant payday I hadn't realized. Probably not, but it | seems more possible than a moment ago.) | tartakovsky wrote: | I would also be interested in understanding whether there | is a proper pricing model for such things. Wordle comes to | mind. Or a friend that has an iPad app that took 2 years to | build that is something novel but not released. Some | projects are open-source and some aren't. Some are acquired | for users and some are acqui-hired for continued | development. Any interesting advice or links here for folks | that don't want to be founders but want to make a solid | chunk of cash, have an expertise of value and love the | development work? | benbjohnson wrote: | There's not any real pricing model that I know of. I | think it comes down to a question of what value an | acquisition brings and that's always kinda fuzzy. If you | want specific numbers, the project was at ~5k GitHub | stars at the time of acquisition so I guess it's a | hundred bucks per star. :) | benbjohnson wrote: | Good question. I think the folks at Fly realize that they | get a lot of benefit from enabling open source projects | that work well on their platform. 
They have a somewhat | similar approach with the Phoenix project in that they | hired Chris McCord to work on it full-time. | | Litestream has a lot of potential in being a lightweight, | fast, globally-distributed database and that aligns really | well with Fly. Continuing to release it as open source | means more folks can benefit from it and give feedback -- | even if they don't use it on Fly. | [deleted] | mtlynch wrote: | Super cool! Congrats, Ben! | | I've been building all of my projects for the last year with | SQLite + fly.io + Litestream. It's already such a great | experience, but I'm excited to see what develops now that | Litestream is part of fly. | learndeeply wrote: | Since both Fly.io and Litestream founders are here - why not | disclose the price? | benbjohnson wrote: | Litestream author here. I just posted it as a reply here: | https://news.ycombinator.com/item?id=31319556 | tiffanyh wrote: | @dang, the actual title is " I'm All-In on Server-Side SQLite" | | Maybe I missed it but where in the article does it say Fly | acquired Litestream? | | EDIT: Ben Johnson says he just joined Fly. Nothing about Fly | "acquiring" Litestream. | | https://mobile.twitter.com/benbjohnson/status/15237489883352... | gamblor956 wrote: | "Litestream has a new home at Fly.io, but it is and always will | be an open-source project" | | Very bottom of the post. Technically, Litestream remains an | open-source project, so it's more accurate to say that Fly.io | acquired the brand IP and the owner of that IP. | lnsp wrote: | > Litestream has a new home at Fly.io, but it is and always | will be an open-source project. My plan for the next several | years is to keep making it more useful, no matter where your | application runs, and see just how far we can take the SQLite | model of how databases can work. | | As far as I understood it, Fly.io hired the person working on | Litestream and pays them to keep working on Litestream. 
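For readers trying to picture what the thread is describing, the Litestream model is a single binary that sits next to the SQLite file: it restores the database from object storage on boot and continuously ships WAL pages back out while the app runs. A minimal deployment sketch, based on the commands in the Litestream docs (the bucket name and paths here are hypothetical):

```shell
# On boot: recover the latest copy of the database from the replica.
# Skip this step if /data/app.db already exists on a persistent volume.
litestream restore -o /data/app.db s3://my-bucket/app.db

# While the app runs: continuously replicate WAL pages to S3.
# (In practice this is usually driven by a litestream.yml config file
# listing each database and its replica URLs.)
litestream replicate /data/app.db s3://my-bucket/app.db
```

Note that this is one-way replication: the replica is a recovery copy and read source, not a second writer, which matches the single-writer constraint discussed elsewhere in the thread.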
| tiffanyh wrote: | That's how I understood it and that's radically different | than how this HN post got titled. | | Ben Johnson confirms how you framed it here: | | https://mobile.twitter.com/benbjohnson/status/15237489883352. | .. | tptacek wrote: | We wrote a different title for this blog post, and we did | in fact buy Litestream (to the extent that anyone can "buy" | a FOSS project, of course). | bussetta wrote: | The tweet[1] links the blog post and says Litestream is part of | fly.io now. | | [1]https://twitter.com/flydotio/status/1523743433109692416 | jrochkind1 wrote: | While the title is about a business acquisition, the article is | mostly about the technology itself -- replicating SQLite, | suggested as a superior option to a more traditional separate- | process rdbms, for real large-scale production workloads. | | I'd be curious to hear reactions to/experiences with that | suggestion/technology, inside or outside the context of fly.io. | LunaSea wrote: | I wonder if we'll ever see an embedded version of PostgreSQL? | nicoburns wrote: | That's basically what SQLite is (notably, SQLite makes an | effort to be compatible with Postgres's SQL syntax). If you | mean based off the actual PostgreSQL codebase, then I highly | doubt it. | LunaSea wrote: | I doubt it as well. | | That's sad though because SQLite is really missing a lot of | features that PostgreSQL has. | nicoburns wrote: | > That's sad though because SQLite is really missing a lot | of features that PostgreSQL has. | | It is, but luckily it's not standing still. It's added JSON | support and window functions in recent years for example. | melony wrote: | Note that the popular Node.js ORM Prisma does not support WAL. | | https://github.com/prisma/prisma/issues/3303 | tylergetsay wrote: | It also crashes if you try to write to the DB while it's open | https://github.com/prisma/prisma/issues/2955 | quintes wrote: | What's the use case here, a single web app with inproc db? 
| | More complex use cases? | | I remember I could do this on Azure at one point in time with app | services, not sure if it's still a thing... but heavy writes and | scaling of those types of apps would lead you to rethink this | approach right? | paulhodge wrote: | Wow Litestream sounds really interesting to me. I was just | starting on an architecture, that was either stupid or genius, of | using many SQLite databases on the server. Each user's account | gets their own SQLite file. So the service's horizontal scaling | is good (similar to the horizontal scaling of a document DB), and | it naturally mitigates data leaks/injections. Also opens up a few | neat tricks like the ability to do blue/green rollouts for schema | changes. Anyway Litestream seems pretty ideal for that, will be | checking it out! | freedomben wrote: | I actually did something very similar to this for an app that | produced _a lot_ of data. I wrote a small middleware that | automatically figured out which shard to use so the app logic | could pretend that it was all just one big db. The app | ultimately ended up in the can so it never needed to scale, but | I always wonder how it would have gone. | Scarbutt wrote: | _Each user's account gets their own SQLite file._ | | So now you need one database connection per user... | robertlagrant wrote: | If by connection you mean in-process database. | freedomben wrote: | Without knowing details about the app, it's hard to know if | that would matter. If a small number of concurrent users | would ever be using it, I would think it would be NBD. | tptacek wrote: | And? It's SQLite; it's a file handle and some cache, not a | connection pool. | mwcampbell wrote: | Depending on how you define "account", that can be quite | reasonable. In a B2B application, each business customer | could get their own SQLite database, and the number of SQLite | connections would likely be quite manageable, even though | some customers have many users. 
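The per-account routing paulhodge and freedomben describe is small enough to sketch concretely. Here is a minimal, hypothetical version using Python's stdlib `sqlite3` (directory layout and schema are invented for illustration); it also applies the WAL-mode and synchronous=NORMAL pragmas benbjohnson recommends above for write throughput:

```python
import sqlite3
from pathlib import Path

# Hypothetical layout: one SQLite file per account under ./accounts/.
DB_DIR = Path("accounts")

def open_account_db(account_id: str) -> sqlite3.Connection:
    """Open (creating if needed) the SQLite database for one account."""
    DB_DIR.mkdir(exist_ok=True)
    conn = sqlite3.connect(DB_DIR / f"{account_id}.db")
    # WAL mode allows concurrent readers alongside the single writer;
    # synchronous=NORMAL trades a little durability for write speed.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)"
    )
    return conn

# Each account's data lives in its own file, so a bad query in one
# account's request can never read another account's rows.
alice = open_account_db("alice")
alice.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
alice.commit()

bob = open_account_db("bob")
# bob's database is a separate, empty file.
```

As tptacek notes in the thread, each "connection" here is just a file handle plus cache, so holding one per active account is far cheaper than a pool of network connections to a database server.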
| mwcampbell wrote: | An architecture like yours has certainly been done before, | though AFAIK it never went mainstream. In particular, check out | this post from Glyph Lefkowitz of Twisted Python fame, | particularly the section about the (apparently dead) Mantissa | application server: | | https://glyph.twistedmatrix.com/2008/06/this-word-scaling.ht... | [deleted] | ok_dad wrote: | I was just about to start using this for a project, I hope the | license won't change. | | Congrats to the author though, no matter what! I wish everyone | could be so successful. | [deleted] | benbjohnson wrote: | Litestream author here. It'll continue to be open source under | an Apache 2 license. | wasd wrote: | Fly is putting together a pretty great team and interesting tech | stack. It's the service I see as a true disruptor to Heroku | because it's doing something novel (not just cheaper). | | I'm still a little murky on the tradeoffs with Fly (and | Litestream). @ben / @fly, you should write a tutorial on hosting | a todo app using Rails with Litestream and any expected hurdles | at different levels of scale (maybe comparing to Heroku). | pbowyer wrote: | Not surprised. Congratulations Ben! | endisneigh wrote: | What's an example of a popular app (more than 100K users) that | uses Litestream? Curious to see what this looks like in | production | [deleted] | jkaplowitz wrote: | Tailscale: https://tailscale.com/blog/database-for-2022/ | | I don't know their user count, but they are growing well and | just raised their Series B. | benbjohnson wrote: | Litestream author here. That's a good question. There's not | very good visibility into open source usage so it's hard to say | unless folks write blog posts about it. For example, I know | Tailscale runs part of their infrastructure with SQLite & | Litestream[1]. | | I wrote a database called BoltDB before and I have no idea how | widespread it is exactly. 
It's used in a lot of open source | projects like Consul & etcd but I don't know anything about | non-public usage. | | [1]: https://tailscale.com/blog/database-for-2022/ | gfd wrote: | For non-public usage, I remember BoltDB being named as one | of the root causes that took down Roblox for three days! | https://blog.roblox.com/2022/01/roblox-return-to- | service-10-... | benbjohnson wrote: | Yep! That's usually how I find out usage inside companies. | :) | [deleted] | kall wrote: | I am as obsessed with sub-100ms responses as the people at | fly.io, so I think the one-writer-and-many, many-readers | architecture is smart and fits quite a few applications. When | Litestream adds actual replication it will get really exciting. | | > it won't work well on ephemeral, serverless platforms or when | using rolling deployments | | That's... a lot of new applications these days. | mwcampbell wrote: | > it won't work well on ephemeral, serverless platforms or when | using rolling deployments | | I assumed that was what Fly was hiring Ben to work on. | emptysea wrote: | Yeah the rolling deployments gotcha really stuck out to me. I | think most PaaS will provide that by default anyway because | who wants downtime during deploys? | mwcampbell wrote: | mrkurt specifically mentioned that a solution for that is in | the works. https://news.ycombinator.com/item?id=31319544 | thdxr wrote: | in practice how do you make a single application node the writer? | | do you now need your nodes to be clustered + electing a leader | and shipping writes there? | | I know fly.io did this with PG + Elixir but BEAM makes this type of | stuff pretty easy ___________________________________________________________________ (page generated 2022-05-09 23:00 UTC)