[HN Gopher] D1: Improvements to performance and scalability
       ___________________________________________________________________
        
       D1: Improvements to performance and scalability
        
       Author : eallam
       Score  : 94 points
       Date   : 2023-05-19 13:08 UTC (9 hours ago)
        
 (HTM) web link (blog.cloudflare.com)
 (TXT) w3m dump (blog.cloudflare.com)
        
       | davnicwil wrote:
       | So this looks like basically a distributed SQLite with read
       | replication for free across cloudflare's edge. Is that right?
        
         | m3kw9 wrote:
         | So serverless meaning you don't manage the db servers, so they
         | may or may not put your stuff on the edge, cloudflare takes
         | care of maybe load balancing too?
        
           | jgrahamc wrote:
           | Yes, Cloudflare handles load balancing and neat tricks like
           | measuring the latency between your users, your code and
           | databases etc. and moving your code around the network to
           | make it run fast: https://blog.cloudflare.com/announcing-
           | workers-smart-placeme...
           | 
           | The vision of the Supercloud is that you give us your code
           | and we'll figure out where and how to execute it:
           | https://blog.cloudflare.com/welcome-to-the-supercloud-and-
           | de...
        
       | kneebonian wrote:
       | So I hate cloudflare with a burning passion, at some point
       | cloudflare decided me home IP was bad and has started flagging it
       | all over the place, which leads to vast swathes of the internet
       | now being inaccessible to me.
       | 
       | There is danger in centralized systems.
        
         | v0idzer0 wrote:
         | Change your IP? Most ISP change every modem reboot. Others you
         | can call and request a change
        
         | bombcar wrote:
         | You may be able to force your ISP to give you a new IP by
         | resetting your modem or leaving it off long enough, that may
         | help.
         | 
         | But once cloud flare hates you, you have to bow to them.
         | 
         | Try browsing the normal web over Tor sometime and see how bad
         | it can get.
        
         | depingus wrote:
         | When I started using Firefox with the Temporary Containers
         | extension I would get absolutely bombarded by Cloudflare's
         | captchas. To the point where if I clicked on a search result
         | that sent me to a captcha, I would just close the tab and click
         | the next result (which often resulted in another captcha, rinse
         | repeat). That's when I realized just how big Cloudflare had
         | gotten.
         | 
         | I still use Temporary Containers and lately, I've noticed a
         | sharp decline in these Cloudflare captchas. I don't know if its
         | because people are moving away from them, or Cloudflare just
         | found a better way to finger print me.
        
         | kentonv wrote:
         | Have you tried using Privacy Pass? It is a browser extension
         | which helps prove that your traffic is legitimate which should
         | reduce the incidence of challenges from Cloudflare, even if
         | your IP has bad rep. It uses advanced cryptography to do this
         | without creating any new tracking vectors.
         | 
         | (Disclosure/disclaimer: I work for Cloudflare but in a
         | different department; I'm not an expert on Privacy Pass.)
        
           | simiones wrote:
           | > It uses advanced cryptography to do this without creating
           | any new tracking vectors.
           | 
           | Sounds like something which is designed to create _hard to
           | detect_ tracking vectors.
        
             | kentonv wrote:
             | Note that Privacy Pass is not a Cloudflare product, it's an
             | open protocol which Cloudflare supports.
             | 
             | https://datatracker.ietf.org/doc/charter-ietf-privacypass/
        
       | threatofrain wrote:
       | I enjoy Cloudflare and have been using workers + D1 for a few
       | months, but warn that Cloudflare's definition of beta is far more
       | beta than what other companies mean. The API surface may change
       | repeatedly without corresponding documentation and their Discord
       | can be a bit sparse on help.
        
       | marcopicentini wrote:
       | Why and when should I use a serverless Postgresql instead of a
       | postresql hosted on a server?
        
         | rozenmd wrote:
         | this isn't postgres
        
       | fire wrote:
       | hmm, is this roughly equivalent to Neon[1], but sqlite based
       | rather than postgres?
       | 
       | 1: https://neon.tech
        
         | nikita wrote:
         | (neon CEO)
         | 
         | This is true. Neon however offers bottomless storage and D1 is
         | 100Mb currently going to 1Gb.
        
       | colesantiago wrote:
       | Can I connect directly to a D1 database with a SQLite URL? That
       | would be awesome.
        
       | ericstegemann wrote:
       | Eek 0.75 per GB of storage! AWS is 0.115 One of our DBs has 2.3TB
       | in it.
        
         | Akkuma wrote:
         | The costs as a whole has me worried. I'm not sure it'll be
         | better than just using https://turso.tech/pricing as that is
         | already a free tier of 8GB and it might cost less overall the
         | paid tier.
        
         | jFriedensreich wrote:
         | at least at the moment D1 is for a completely different use
         | case, d1 has currently a limit of 100mb and will now be
         | increased to 1gb.
        
         | yashap wrote:
         | Yeah, I'd imagine this will be cheaper than AWS RDS for low
         | storage use cases (lack of fixed monthly compute costs wins
         | out), but more expensive for high storage. Like quite cheap for
         | a 10-100 GB DB, quite expensive for a 1-10 TB GB.
         | 
         | Though they do say:
         | 
         | > when we enable global read replication, you won't have to pay
         | extra for it, nor will replication multiply your storage
         | consumption
         | 
         | With AWS you'll have at least one read replica for failover, so
         | $0.23/GB. And if you really want global read replicas, with AWS
         | you might end up with something like a primary in North America
         | and read replicas in South America, Europe and Asia. That would
         | work out to $0.46/GB, so closes the gap a bit.
        
       | nemothekid wrote:
       | This is Cloudflare's flavor of Fly.io?
        
         | pier25 wrote:
         | Fly doesn't really offer distributed data.
         | 
         | Edit:
         | 
         | It does!
        
           | avinassh wrote:
           | is this not same - https://fly.io/docs/litefs/ ? They mention
           | it as 'LiteFS - Distributed SQLite'
        
             | pier25 wrote:
             | Ah you're right!
             | 
             | I thought this was a wrapper for Litestream but apparently
             | it's a parallel project by the same author who Fly hired.
             | 
             | https://litestream.io/
             | 
             | https://github.com/benbjohnson/litestream/pull/411
        
       | trollied wrote:
       | Interesting, Time Travel works the same as Oracle Flashback. Hope
       | there aren't any patents to trip over.
        
       | pottertheotter wrote:
       | So when would someone use something like this? I learn better by
       | example if anyone has any.
       | 
       | And for reference, here's the original D1 announcement with some
       | additional info https://blog.cloudflare.com/introducing-d1/
        
         | leetrout wrote:
         | I use D1 through microfeed https://www.microfeed.org/
        
         | kentonv wrote:
         | You'd use it whenever you're building an application on
         | Cloudflare Workers and you need to store data. D1 provides you
         | with a SQL database. It's based on SQLite, so it's designed for
         | relatively small datasets but can serve them very quickly from
         | the edge.
         | 
         | Note there are several alternatives here, too. Workers Durable
         | Objects[0] provide a lower-level primitive for building
         | advanced distributed systems. But D1 is easier to use for
         | typical use cases. For blob storage you might use R2[1]. And
         | for large databases Workers can easily integrate with several
         | serverless database providers.[2]
         | 
         | [0] https://blog.cloudflare.com/introducing-workers-durable-
         | obje... [1] https://www.cloudflare.com/products/r2/ [2]
         | https://blog.cloudflare.com/announcing-database-integrations...
        
           | Akkuma wrote:
           | Is it a non-goal to be long term usable for larger databases?
           | That would force the usage of something like turso your
           | closest direct comparison as a possible migration strategy or
           | relying on "Smart Placement" (which from my point of view
           | reduces the benefit of global edge) for other serverless non-
           | global dbs.
        
             | jgrahamc wrote:
             | I mean, our object storage is called R2, our first database
             | offering is called D1, and if we were to offer a fully
             | Distributed Database then D2 seems like a good name.
        
               | Akkuma wrote:
               | And you can follow up D2 with D2: Lord of Distributed.
        
             | kentonv wrote:
             | Personally, I'm a firm believer that most "web app" use
             | cases are better served by many small databases (e.g. per-
             | user or per-document) rather than a single monolithic
             | databases. This is especially true when serving users all
             | around the world -- per-user databases can be located near
             | each user (both for speed and to comply with data locality
             | laws).
             | 
             | What I'd like to enable here is a progression where you
             | start out prototyping your app with a single D1 database,
             | which is easy to use and reasonably fast. Then as you grow
             | we provide tools to let you transition to many D1 databases
             | sharded in a way that makes sense (e.g. per-user). Apps
             | that want even more control can move to using full-on
             | Durable Objects (which will soon support a SQLite database
             | per-object).
             | 
             | That said, there are certainly many use cases out there
             | where simple monoliths make sense, especially non-
             | interactive data crunching. I'm not sure yet if D1 will
             | ever be the right choice for those, but the Workers
             | platform aims to provide many options.
        
               | Akkuma wrote:
               | Thanks for the insight, I greatly appreciate it! This
               | definitely is a reasonable idea for many things and I'm
               | looking forward to seeing something similar to the
               | sharding mechanism in the future.
               | 
               | I've only started to think about this and I'm thinking
               | the hardest part will be dealing with cross-cutting
               | concerns (in a non-auto sharded world manually creating
               | multiple database) and trying to find a way to keep each
               | database isolated without extra burden compared to using
               | a hosted Postgres.
               | 
               | As an aside, that lan optimized house was a gaming dream.
               | Hope your new house is as awesome.
        
               | Wallacy wrote:
               | " Apps that want even more control can move to using
               | full-on Durable Objects (which will soon support a SQLite
               | database per-object)."
               | 
               | Can you elaborate this little bit more? Im using DO today
               | and i have a bad time sharding my data (works, but i hate
               | it);
               | 
               | So i will have the option to use the standard store
               | or/and SQLite?
               | 
               | If so, i dont can keep with my DO (because i have control
               | of everything) and use SQLite for things that is bigger
               | than what the value store supports.
        
               | kentonv wrote:
               | Sorry, I don't quite understand what you're asking.
               | 
               | In the future each DO will have a private SQLite
               | database. The key/value store will actually be redirected
               | to store into a special table in this database, but
               | probably new apps will just use the database and not the
               | KV store.
               | 
               | Separately from that, I would like to develop tools that
               | make sharding Durable Objects (and D1 databases) easier.
               | Today it's a pain to do manually. This is independent
               | from the underlying storage model, though.
        
             | fyzix wrote:
             | Cloudflare are 285 pops. Surely you dont need your db
             | replication them all of them. A few locations per continent
             | should suffice.
             | 
             | For comparison, fly.io, turso's provider, has 34 locations
             | and well-documented reliability issues.
        
               | Akkuma wrote:
               | That is a fair point. A few centralized locations will
               | likely be more than sufficient for most use cases.
        
       | Mystery-Machine wrote:
       | "up to 37x faster" is NOT before: 37.81ms, now: 1.82ms
       | 
       | You can't just round down 1.82ms to 1ms.
        
         | elithrar wrote:
         | Fixing this. We ran a few benchmarks (and some were much faster
         | than 37x), but this was a more typical case. Not our goal to
         | inflate numbers.
        
       | ccorda wrote:
       | Kenton Varda (tech lead) has some more notes in this twitter
       | thread:
       | https://twitter.com/KentonVarda/status/1659551757796515846
        
         | rektide wrote:
         | > _Our new engine is based on intercepting SQLite 's disk
         | writes and doing clever stuff with them. It was so easy because
         | the file format is not just well designed but amazingly well-
         | documented._
         | 
         | Quite the hobby project for a lot of people too! Other folks
         | doing this:
         | 
         | Rqlite https://hn.algolia.com/?query=Rqlite&sort=byDate ,
         | Dqlite https://hn.algolia.com/?query=dqlite&sort=byDate ,
         | Litestream https://hn.algolia.com/?query=Litestream&sort=byDate
         | / LiteFS https://hn.algolia.com/?query=LiteFS&sort=byDate,
         | marmot, mvsqlite
         | https://hn.algolia.com/?query=mvsqlite&sort=byDate
        
           | otoolep wrote:
           | I've been doing it for almost 10 years. :-)
           | 
           | https://www.philipotoole.com/9-years-of-open-source-
           | database...
        
         | aranke wrote:
         | Curious if you prototyped DuckDB before deciding to invest
         | further into SQLite.
         | 
         | DuckDB works great as an in-memory database (it's also the
         | default mode).
        
           | kentonv wrote:
           | We didn't, no.
           | 
           | I'm sure there's a lot of really cool local-first databases
           | out there, but SQLite has the benefit of being incredibly
           | widely battle-tested, with literally billions of
           | installations worldwide. It has received thorough security
           | research and fuzzing (it's part of Chrome's attack surface
           | after all). And there's tons of resources online to help
           | people understand how to use it. Although I'm sure there are
           | alternatives that serve certain use cases better it's hard to
           | imagine anything coming close for ours.
           | 
           | That said, the storage engine we've built is not that heavily
           | dependent on SQLite specifically. Any database that uses a
           | write-ahead log like SQLite does should be possible to adapt
           | to it in the future. So maybe we'll eventually open it up to
           | a variety of choices, or even let you bring your own as a
           | Wasm module.
        
             | kentonv wrote:
             | Oh, I've been informed that DuckDB uses SQLite under the
             | hood, so maybe compiling DuckDB to Wasm and running it on
             | top of this will be possible, we'll see.
        
               | aranke wrote:
               | From https://news.ycombinator.com/item?id=23290512:
               | 
               | > DuckDB is indeed a free columnar database system, but
               | it is not entirely built on top of SQLite. It exposes the
               | same front-end and uses components of SQLite (the shell
               | and testing infrastructure), but the execution
               | engine/storage code is new.
        
           | rektide wrote:
           | I expect most workloads are more OLTP/transactional than
           | OLAP/analytical.
        
             | elithrar wrote:
             | Correct. DuckDB is really interesting technology, but it's
             | not a direct successor to SQLite for transactional
             | workloads. It's also very new: there's a LOT of new code in
             | DuckDB on top of the (heavily fuzzed) SQLite parts.
             | 
             | (I use it personally, but it's not the same thing as what
             | we're building with D1)
        
       | pier25 wrote:
       | Has the DX of Workers improved?
       | 
       | I think Deno is lightyears ahead.
        
         | kentonv wrote:
         | Hard to say without knowing what specific problem you have had
         | but we are improving all the time. Several improvements
         | announced just this week, take a look at the blog.
         | 
         | Among other things, we made Wrangler (Workers CLI tool) use the
         | open source workerd by default for local development, so local
         | dev should produce a much more precise simulation now (since
         | it's literally running the same code).
        
           | pier25 wrote:
           | And what about the DX of using Workers with Pages?
           | 
           | I tried to use that recently and it was a disaster. I wrote
           | about my experience here:
           | 
           | https://twitter.com/pierbover/status/1641474067013271552
           | 
           | I then opened these two issues:
           | 
           | https://github.com/cloudflare/workers-sdk/issues/2962
           | 
           | https://github.com/cloudflare/workers-sdk/issues/2964
           | 
           | I ended up moving the project over to Netlify + Edge
           | functions. I had it all working in like 5-10 mins as it
           | should. Took me two hours to figure out why Workers weren't
           | working in my Pages project, and could never get Workers
           | working properly with my Astro project.
           | 
           | I think you're working exclusively on the engine of Workers
           | which is really top notch, but Cloudflare really needs to
           | improve the outer layer which affects DX considerably.
        
             | kentonv wrote:
             | Sorry you experienced that. FWIW this announcement from
             | Wednesday should help address the problems you ran into:
             | 
             | https://blog.cloudflare.com/pages-and-workers-are-
             | converging...
        
         | KRAKRISMOTT wrote:
         | Still no compatibility with standard ORMs like Prisma either.
         | 
         | And the team has been aware of the issue for years now
         | 
         | https://github.com/cloudflare/workers-sdk/issues/2701
         | 
         | https://news.ycombinator.com/item?id=31341513
        
           | kentonv wrote:
           | Sorry that D1 has been slow out the gate. Now that we've
           | solved the basic technical issues we can really focus on
           | improving DX.
        
           | Akkuma wrote:
           | You can use drizzle-orm with Cloudflare and that is fully
           | compatible.
        
             | rozenmd wrote:
             | Drizzle is amazing.
        
               | Akkuma wrote:
               | This is my first time using it and I've been very pleased
               | with it so far. It keeps it simple, has solid typing &
               | schema building, and reminds me of LINQ. I'm also a thin
               | models kind of person and the fact this is just an object
               | without needing to build ORM classes is even better.
        
         | mscccc wrote:
         | I've been using workers on & off since it launched. Just tried
         | it again recently and the local dev experience with wrangler is
         | excellent now.
        
       ___________________________________________________________________
       (page generated 2023-05-19 23:00 UTC)