[HN Gopher] Globally Distributed Postgres
       ___________________________________________________________________
        
       Globally Distributed Postgres
        
       Author : woodrow
       Score  : 167 points
       Date   : 2021-06-30 17:08 UTC (5 hours ago)
        
 (HTM) web link (fly.io)
 (TXT) w3m dump (fly.io)
        
       | holistio wrote:
        | I first read the title as "Globally Distributed Progress" and I
        | was like "if an article with this title made it to the front page
        | of HN, it must surely be some interesting read".
       | 
       | I am now slightly disappointed.
        
         | mrkurt wrote:
         | It is pretty boring. But "the most boring possible way to build
         | globally distributed postgres" is interesting, I think.
        
         | unnouinceput wrote:
          | Me too. Having worked with PGSQL for the past 15 years, I was
          | expecting...something else. But what bothers me the most is
          | the quote they repeat twice: "HTTP is cheap". No, it's not. Try
          | having local servers in remote locations, like ski resorts or a
          | countryside fair, where internet connectivity is sketchy at
          | best, and then you'll realize how expensive a request really is.
        
           | mrkurt wrote:
           | HTTP is cheap! Latency is expensive, but all our HTTP
           | shenanigans happen between our proxy and user VMs. We're not
           | inflicting multiple requests on slow clients at a ski resort.
        
       | frafra wrote:
        | > It is much, much faster to ship the whole HTTP request where it
        | needs to be than to move the database away from an app instance.
       | 
        | Does that mean that the app server would just fail, and then your
        | network replays the same HTTP request (made by the client to the
        | edge) toward an instance running the app server inside the region
        | where the Postgres primary instance is running? If so, you should
        | then be able to figure out which client request provoked the SQL
        | error, and the app server would report some errors, right?
        | Otherwise, the app server should be able to tell you that it
        | failed because the Postgres server does not accept write
        | statements, which would require some minor adjustment to the app
        | server code, wouldn't it?
       | 
       | I am sorry if I missed something from your description, but I am
       | trying to figure out what the flow would look like. Thank you!
        
         | viraptor wrote:
         | > which would require some minor adjustment in the app server
         | code, isn't it?
         | 
         | Yes, that's what the code fragment does:
         | 
         | > if e.cause.is_a?(PG::ReadOnlySqlTransaction)
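The fragment quoted is Ruby (from Fly's fly-rails library). As a rough, hypothetical Python/WSGI-style analogue of the same pattern (the exception class, request/response shapes, and region name below are stand-ins, not a real driver API):

```python
# Sketch only: catch a "read-only transaction" database error and ask
# Fly's proxy to replay the whole HTTP request in the primary region.
# ReadOnlySqlTransaction stands in for the real driver exception
# (PG::ReadOnlySqlTransaction in Ruby's pg gem).
class ReadOnlySqlTransaction(Exception):
    pass

PRIMARY_REGION = "syd"  # hypothetical primary region


def replay_middleware(app):
    def wrapped(request):
        try:
            return app(request)
        except ReadOnlySqlTransaction:
            # The fly-replay response header tells Fly's proxy to re-run
            # this request in the region holding the writable primary.
            return {"status": 409,
                    "headers": {"fly-replay": f"region={PRIMARY_REGION}"},
                    "body": b""}
    return wrapped
```

The app code itself stays unchanged; only the outermost middleware layer knows about regions.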
        
           | frafra wrote:
            | Ok, so that's an example of how the app server code should
            | look, with no middleware involved. Thank you!
        
         | VWWHFSfQ wrote:
         | > It is much, much faster to ship the whole HTTP request where
         | it needs to be
         | 
         | If you're already routing at the HTTP level then why not just
         | route the request based on the HTTP method? (assuming the
         | handlers for those are properly implemented.)
         | 
         | Why not just route a POST request to a backend connected to a
         | writable database? And GET to a read-only database? I don't
         | understand how it's any kind of an optimization to be
         | continually bouncing queries out of the read-only replicas
         | because Postgres threw an error.
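For what it's worth, the verb-based routing being proposed is only a few lines if you trust your handlers; a hypothetical sketch (pool objects and names are made up):

```python
# Sketch of method-based routing: pick a database pool by HTTP verb.
# This is only sound if handlers have perfect "write hygiene" (GETs
# never write), which is exactly the assumption the thread questions.
READ_METHODS = {"GET", "HEAD", "OPTIONS"}


def pick_pool(method, primary_pool, replica_pool):
    # Safe verbs go to a local replica; anything else goes to the primary.
    return replica_pool if method.upper() in READ_METHODS else primary_pool
```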
        
           | frafra wrote:
           | I agree with you, but I just was trying to understand the
           | fly.io post. They also wrote:
           | 
           | "Most GETs are reads, but not all of them. The platform has
           | to work for all the requests, not just the orthodox ones."
           | 
           | So I think it all depends on which level of the application
           | you take the decision to gracefully fail and replay the
            | request elsewhere. In some cases, it is trivial: as soon as
            | you see a POST/PUT/DELETE, because you trust your app;
           | sometimes you may need to take the decision later, or at a
           | lower level.
           | 
           | In the simplest scenario, fly.io could just forward the
           | request to the right region, without even bothering the app
           | server to reply with an error, but that would work only if
           | GET requires no writes.
        
             | mrkurt wrote:
             | You got it.
             | 
             | We can give people a library that catches Postgres readonly
             | errors and make it a reasonably standard experience. We
             | can't ensure that peoples' apps have good write hygiene. We
             | _can_, though, educate people and tell them what to look
             | for when they're trying to optimize performance.
             | 
             | There's also the graphql problem (and really any kind of
             | non-rest RPC). It's somewhat rare that applications use
              | HTTP verbs appropriately; APIs tend to bypass HTTP methods.
        
           | tptacek wrote:
           | You obviously can't just do that, because ordinary
           | applications are full of GET requests that cause database
           | updates.
        
       | picardo wrote:
        | I tried this a few months ago. Many features were missing then. I
        | couldn't set up backups or connect directly with `psql`. It was
        | such a pain. I don't know if they've addressed those issues yet.
        | It'd be a hard sell for me to run a production app's database
        | from a Docker image.
        
         | tptacek wrote:
         | Just a note about directly managing Postgres here: we launched
         | Postgres like, a minute ago (a couple months in normal people
         | time). For a pretty long time, you couldn't do any real
         | persistent storage on a Fly instance; we were still mostly
         | targeting edge applications, where people slice components of
         | their apps off to run closer to the edge.
         | 
         | I think it's safe to say that while we still love edge apps and
         | are committed to them long term, we've switched gears towards
         | full-stack applications (that is, we've gotten a lot more
         | ambitious about what we want to be a good platform for). So
         | this stuff is shaping up pretty quickly.
         | 
         | Postgres (and volume support) happened at about the same time
         | as private networking did (Postgres was our first internal
         | application for private networks). In January, we started doing
         | user-mode WireGuard, so that `flyctl` runs its own TCP/IP stack
         | and speaks WireGuard directly, without involving the OS (so you
         | don't need root to use it). User-mode WireGuard gave us `flyctl
         | ssh console`, and is also the way we do direct `psql`. There's
         | more stuff in that vein coming; hopefully, it's gotten a lot
         | easier to manage Postgres as `flyctl` has gotten more capable.
         | 
         | Where we want to be is for it to be trivial to run something
         | like Postico.app to talk to a Fly Postgres database. You can do
         | that now, but it's not yet _totally_ trivial. Hopefully you can
         | sort of look at `flyctl` (it's open source) and see how that'll
         | work.
        
         | mrkurt wrote:
         | It's better, but still beta. We handle backups and you can
         | connect directly to psql. We also added migration support as
         | part of app releases. The global distribution feature, though,
         | works with any database, not just our postgres.
         | 
          | Between you and me, we tried really hard not to build a
          | Postgres service. We're committed now, and it's gonna be good.
          | For future DBs, we plan to get big enough that companies like
          | Timescale will ship their products on Fly infrastructure.
        
           | jfyne wrote:
           | This might be in the docs somewhere, but can you bring your
           | own postgres image to your managed service?
        
             | [deleted]
        
             | mrkurt wrote:
             | Our postgres clusters are just a Fly app:
             | https://github.com/fly-apps/postgres-ha
             | 
             | You could run your own PG by modifying that app. Right now
             | we're calling it "automated" and not managed, though. All
             | alerts about health and other issues go straight to
             | customers, we don't have DBAs that will touch these things
             | yet.
        
               | jfyne wrote:
               | Ah ok, interesting. Thanks
        
       | mixmastamyk wrote:
       | I know this is hacker news and all and you're trying to be fun,
       | but as I'm looking at platforms like fly for a future saas, the
       | first sentence (cool hack) rubbed me the wrong way. Would rather
       | hear "we found an elegant design" and are working hard to make it
       | bulletproof... etc.
       | 
       | Databases are like that, the place to get serious.
       | 
       | Sure I won't let it deter me from a proper eval, but I could
       | definitely see it scaring away suits.
        
         | [deleted]
        
         | mrkurt wrote:
         | We are at the stage where attracting developers and scaring
         | suits is probably good.
         | 
            | I'd have a hard time calling something I was partially
            | responsible for elegant, though, so this might just be our
            | self-deprecating nature rearing its head.
        
           | treis wrote:
           | >We are at the stage where attracting developers and scaring
           | suits is probably good.
           | 
           | As a developer the way it was written scared me off. It reads
           | like one of those recipe sites that game Google by adding a
           | nonsensical story to the recipe. It seems like it took half
           | the article to get to your "one trick". And ultimately you
           | didn't deliver what the title teased.
        
       | Thaxll wrote:
        | Running a master in us-west and having a read replica in Europe,
        | what could go wrong ...
        | 
        | My advice is: don't do the things in this blog post. Keep the DB
        | and the app in the same region; it will save you so many
        | problems. If you can't, use a DB that was designed for that, like
        | Spanner or Cockroach.
        
         | tptacek wrote:
         | All the things that can go wrong with globally distributed
         | databases. You can just run an HA cluster in a single region if
         | you're looking to keep it simple, just the same way you can run
         | a single-region app on Fly if you like.
         | 
         | We try to go out of our way to say that this stuff isn't a
         | perfect fit for every application or a "best" way to do things.
         | I think if a deployment like this is attractive for your
         | application, you probably know it. I'm certainly not going to
         | evangelize against people scaling up single-region databases!
        
         | simonw wrote:
         | Genuine question: what could go wrong?
         | 
          | I'm a big fan of "pinning" mechanisms where any time a user
          | performs a write to your database, you set a cookie that lasts
          | for a few seconds and causes any subsequent requests from that
          | user to go to the leader, not a replica.
         | 
         | This solves the stale-read-after-write problem and ensures
         | users will see any changes that they have personally made.
         | 
         | This does assume a mostly-reads, occasional-write application
         | (which defines most of the applications I've worked on in my
         | career).
         | 
         | So given that kind of application, what are the problems I'm
         | not considering that I should look out for with a read-replica
         | on a different continent?
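A minimal sketch of that pinning idea, with a made-up cookie name and TTL (a real implementation would sign the cookie and hook this into the framework's session layer):

```python
# Sketch: after any write, set a short-lived cookie; while it is
# present, route that user's requests to the primary so they always
# read their own writes. Cookie name and TTL are illustrative.
import time

PIN_COOKIE = "db-pinned-until"  # hypothetical cookie name
PIN_SECONDS = 5                 # pin window after a write


def pin_after_write(response_cookies, now=None):
    """Record that this user's next few requests must hit the primary."""
    now = time.time() if now is None else now
    response_cookies[PIN_COOKIE] = str(now + PIN_SECONDS)


def is_pinned(request_cookies, now=None):
    """True while the pin window is still open; route to primary if so."""
    now = time.time() if now is None else now
    until = request_cookies.get(PIN_COOKIE)
    return until is not None and float(until) > now
```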
        
           | dkhenry wrote:
           | Of the many things that can go wrong, the one I would be most
           | worried about would be replication lag causing the disk on
           | the primary to fill with WAL logs bringing down the primary,
           | or if you have limited the WAL logs, the replica not being
           | able to keep up and detaching from the primary causing stale
           | reads.
           | 
            | Also, long distances will make replication latency increase,
            | so unless _all_ requests are sent to the primary region you
            | will likely have a read-after-write stale data situation,
            | unless you go out of your way to ensure consistency.
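For reference, these are the PostgreSQL (13+) settings that govern the tradeoff described above; the values below are illustrative, not recommendations:

```ini
# Bounding the two failure modes: "primary disk fills with WAL" vs.
# "lagging replica detaches and serves stale reads / needs a rebuild".
max_wal_size = '2GB'              # checkpoint pressure, not a hard cap
wal_keep_size = '1GB'             # WAL retained for non-slot replicas
max_slot_wal_keep_size = '10GB'   # cap on WAL a lagging replication
                                  # slot can pin; exceeding it drops the
                                  # slot instead of filling the disk
```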
        
       | chatmasta wrote:
       | Love this. We're building a similar architecture at Splitgraph
       | [0] which we refer to as a "Data Delivery Network." The basic
       | idea is we implement a proxy that forwards queries to backend
       | data (either live data sources via an FDW, or versioned snapshots
       | via our Docker-inspired "layered querying").
       | 
        | Soon anyone will be able to connect data sources by adding their
        | read-only credentials on the web (already possible via CLI but
        | undocumented). The idea is to make exposing a database to the DDN
       | as simple as exposing a website to a CDN.
       | 
       | We've designed all this to be multi-tenant and horizontally
       | scalable, but we're not actually running it on a distributed
       | network yet. Personally, I've followed Fly for a long time and
       | always loved the angle you're taking. If any of you at Fly read
       | this and want to potentially collaborate on a solution in this
       | space, my email is in my profile.
       | 
       | [0] https://www.splitgraph.com
        
         | rozap wrote:
         | I work at Socrata, and I've skimmed splitgraph and it looks
         | pretty neat. How do you all deal with pushdowns or query
         | rewriting in general when the target datasource doesn't
         | implement the functionality your user writing real sql is
         | trying to use? Does it explode or take forever, because it
         | doesn't look like you're storing copies of everything in Real
         | Postgres?
         | 
          | We expose a pretty limited subset of SQL through our APIs, and
          | an even more limited form through public APIs, so I'm
          | interested in how you're providing joins to your users. "It
          | exists but is super slow" is a reasonable answer as well :)
        
           | chatmasta wrote:
           | Hey! Socrata is awesome :) The basic idea here is that you
           | can write a Splitgraph plugin for any type of data source
           | [0], so we like to use Socrata as a convenient example. Each
           | plugin includes a Postgres Foreign Data Wrapper (FDW) to
           | translate from SQL to whatever upstream query language (e.g.
           | Socrata FDW [1]).
           | 
           | By default we don't copy any data into Splitgraph (although
           | we do cache query results keyed by AST). When a query
           | arrives, we parse it for any table identifiers, and "mount" a
           | temporary foreign table for each, using the appropriate FDW.
           | Foreign tables are mostly transparent to the Postgres query
           | planner, so a JOIN will work (example [2]), and an FDW can
           | even optionally implement aggregate pushdown. But yes,
           | especially in the case of cross-dataset JOINS, there are some
           | pathological cases. When your use case outgrows live querying
           | (e.g. due to pathological JOIN queries), you can ingest any
           | data source into a Splitgraph image, a versioned snapshot of
           | data similar to a Docker image. [3]
           | 
           | Our focus right now is on the internal enterprise use case.
           | We use the Socrata plugin as a demo, but our customers are
           | more interested in plugins for data sources they use
           | internally, e.g. Postgres, Snowflake, Google Search Console,
           | CSV in S3, etc. We support all those too; you can mount them
           | all locally, and you can also mount them on Splitgraph.com
           | (or an internal deployment), but still thin docs / no web UI
           | yet.
           | 
           | [0] https://www.splitgraph.com/blog/foreign-data-wrappers
           | 
           | [1] https://github.com/splitgraph/splitgraph/blob/master/spli
           | tgr...
           | 
           | [2] https://www.splitgraph.com/workspace/ddn?layout=hsplit&qu
           | ery...
           | 
           | [3] https://www.splitgraph.com/docs/concepts/images
        
       | frafra wrote:
        | Also relevant:
        | https://fly.io/docs/getting-started/multi-region-databases/
        
       | simonw wrote:
       | The way this is implemented, with the ability for an application
       | server attached to a replica to say "error: this needs to perform
       | a write - hey CDN, replay this request against the region with
       | the database leader in it" is SO clever.
        
         | zozbot234 wrote:
         | There's probably a way to do this in native postgres by abusing
         | the FDW facility. No real need to involve the CDN or replay the
         | request.
        
           | mrkurt wrote:
           | We tried running something like pgpool's split write setup,
           | where the app or a proxy in front of the DB knows where to
           | send transactions:
           | https://www.pgpool.net/docs/latest/en/html/runtime-config-
           | lo...
           | 
           | It didn't work cross region because no one builds an app
           | expecting latency between their app server code and database.
           | What we'd see on write requests was something like:
           | 
           | 1. Query for data from read replica, perhaps for validation
           | (0ms)
           | 
           | 2. Do a write to the primary in a different region (20-400ms)
           | 
           | 3. Query primary for consistency (20-400ms)
           | 
           | 4. More queries against primary for consistency (20-400ms)
           | 
           | 5. Maybe another write (20-400ms)
           | 
           | 6. repeat
           | 
            | You can actually use our postgres this way if you want! But
            | it breaks for most apps. It is much, much faster to ship the
            | whole HTTP request where it needs to be than to move the
            | database away from an app instance.
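Back-of-the-envelope arithmetic for the flow above, assuming a flat 100ms cross-region round trip (the post's observed range is 20-400ms; the counts are illustrative):

```python
# Steps 2-5 above each cost a cross-region round trip when the app
# talks to a remote primary; replaying ships the request once instead,
# after which every query is local to the primary's region.
RTT_MS = 100                   # assumed cross-region round trip

split_write_ms = 4 * RTT_MS    # four remote queries per write request
replay_ms = 1 * RTT_MS         # one replay, then all queries are local

print(split_write_ms, replay_ms)  # 400 100
```

Chatty ORM code easily issues more than four round trips per request, so the real gap is usually wider than this.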
        
         | foobarbazetc wrote:
         | I think most database engineers who think about ACID/isolation
         | levels/etc would look at this and go "uhm... _jumps out a
         | window_ " but for a subset of users I guess it kind of works?
         | 
         | I don't really see how a transaction that performs reads and
         | writes can be replayed like this with any sort of guarantee
         | about anything, though.
         | 
         | And there's no opportunity to build any sort of conflict
         | resolution in there because the replay is automatic.
         | 
         | Maybe I missed something though?
        
           | simonw wrote:
           | The replay is only automatic if you opt for the version where
           | you catch write errors and turn them into replay requests.
           | 
           | My experience of building web apps, where most business logic
           | runs within the scope of a single HTTP request, suggests that
           | a quite impressive number of common cases could be served by
           | the replay-request-against-the-leader pattern.
        
             | tptacek wrote:
              | I mean: just to be clear: you _can't_ write to a read
             | replica. There's no way to introduce a conflict that way.
             | 
             | Also just for what it's worth: we agree with 'foobarbazetc.
             | There's a section about that in the post. If you're saying
             | there are important classes of applications this doesn't
             | work well for, that's true.
        
               | foobarbazetc wrote:
               | Just to be clear, I'm not trying to say it's bad or
               | whatever. It's super cool! I'm a big fan of fly.io.
               | 
               | I just don't see how something like this actually works
               | in a way where you can reason about transaction ordering
               | without global serialisation of requests against the
               | writable database, which I presume isn't happening behind
               | the scenes?
               | 
               | Basically, this would seem to work great if every request
               | did INSERTs only and you based no logic to run INSERTs on
               | anything you SELECT'ed from the read only replicas.
               | 
               | But you could have situations where, e.g., you DELETE or
               | UPDATE something in a request to one region, and it goes
               | to replay that against the writable region, and in
               | "replay gap" another request modifies the same rows or
               | objects, and based on various factors such as latency
               | etc, a DELETE ... WHERE or UPDATE ... WHERE clause might
               | no longer hold. Or you UPDATE the wrong objects, or
               | DELETE the wrong data, etc.
               | 
               | I did read the post, but I guess I need to re-read it. I
               | have written a globally distributed database before so
               | I'm always intrigued by how these work. :)
        
               | mrkurt wrote:
               | INSERT, DELETE, UPDATE will all fail in the readonly
               | regions. This is normal for Postgres read replicas, we're
               | not doing anything special here.
               | 
               | What we're doing works identically to a vanilla HTTP
               | based app. Requests that modify the DB always run against
               | the primary database. Requests that perform reads _then_
               | modify the DB always run against the primary database.
               | 
               | HTTP services all have an underlying eventual consistency
               | problem. If you are viewing a page with a database ID on
               | it, then click delete for that ID, the record could be
               | gone before the time the request hits the database
               | (because someone else might've clicked while you were
               | reading).
               | 
               | Does that help? I think we didn't describe this well
               | enough, it's way simpler under the covers than you might
               | expect.
        
               | [deleted]
        
               | JackC wrote:
               | Think of it like, if your read-only regions don't have
               | anything else stateful they're able to do before the
               | postgres write fails, then their existence is
               | indistinguishable from network latency. An app that
               | doesn't break because of latency won't break because of
               | this.
               | 
               | If the readonlies might do something else stateful and
               | not unwind it on postgres write error, then this approach
               | won't work. (But wouldn't they be buggy anyway? Postgres
               | writes aren't guaranteed not to raise errors.)
        
           | [deleted]
        
         | rad_gruchalski wrote:
         | And probably adds quite a lot of latency under load.
        
           | simonw wrote:
           | The demo at https://fly-global-rails.fly.dev/ lets you see
           | replay times.
           | 
            | I'm in SJC (San Jose / Silicon Valley) and
            | https://fly-global-rails.fly.dev/regions/syd shows that my
            | replay time to their region in Sydney Australia is between 25
            | and 45ms. That's pretty quick!
        
             | mrkurt wrote:
             | We need to optimize this. Right now, the edge proxy handles
             | the replay request. So if you hit SJC and a server in LAX
             | handles the request (which is probably what's happening for
             | you), you pay the SJC -> LAX latency penalty before the
             | replay happens.
             | 
             | There's no reason we can't replay from LAX, though, just
             | need to build it.
        
             | rad_gruchalski wrote:
             | The try / repeat pattern inevitably needs to do a full
             | round trip multiple times in different regions.
             | 
              | What would be awesome would be something like this but
              | built on top of the Envoy Postgres proxy or similar, where
              | it would know where to send the query based on the table +
              | column value / PK. But then one has rebuilt Yugabyte.
        
               | zozbot234 wrote:
               | > postgres proxy or similar where it would know where to
               | send the query to based on the table + column value / pk.
               | 
               | That sounds like sharding, which Postgres supports
               | already.
        
               | rad_gruchalski wrote:
               | It does, doesn't it? Every database can be sharded. But
               | you have to do it yourself. This is how it works in
               | Yugabyte: https://docs.yugabyte.com/latest/explore/multi-
               | region-deploy....
        
             | VWWHFSfQ wrote:
             | > my replay time to their region in Sydney Australia is
             | between 25 and 45ms. That's pretty quick!
             | 
             | What on earth.. Are people really OK with adding 25-45ms
             | latency because they can't be bothered to route their query
             | to a database that can actually service it?
        
               | mrkurt wrote:
               | So far, yes they're ok with that. We can also optimize
               | most of that number away, down to <10ms for almost every
               | app.
               | 
               | They're ok with it because a (valuable) write request
               | that's 10% slower typically comes after a read request
               | that's much, much faster. In this scenario:
               | 
               | > my replay time to their region in Sydney Australia is
               | between 25 and 45ms. That's pretty quick!
               | 
               | The readonly requests were probably <100ms for simonw.
               | The replayed request (that he initiated with a click) was
               | probably 500ms. When a request already takes 500ms, an
               | additional 50ms of latency is basically noise.
               | 
               | The alternative is (often) for them to refactor their
               | apps and then enforce write restrictions on code to make
               | sure random GETs don't write to the DB unnecessarily.
               | 
               | Our goal was to make this work without requiring hairy
               | code changes. People who are ready to do the work to
               | optimize their apps can make things faster and use what
               | we're doing now as a fallback, if they want.
        
             | viraptor wrote:
             | That's not just quick, that's suspiciously quick. Here's a
             | round-trip chart for Azure: https://docs.microsoft.com/en-
             | us/azure/networking/media/azur...
             | 
             | Us West - Australia round-trip for them is ~140ms. And
             | ~50ms seems to be the physical limit unless I messed up the
             | numbers somewhere?
             | 
             | Edit:
             | 
             | Syd-syd response time 138ms
             | 
             | Syd-lax response time 427ms, replay time 16ms
             | 
             | Ok, not sure what the replay time means in this case (extra
             | latency from the cancellation?), but it's not the total
             | cost and you really don't want to send those packets around
             | the world :-)
        
               | mrkurt wrote:
               | simonw was referencing the replay overhead number. His
               | request hits Los Angeles first, the rails app in lax says
               | "go to Sydney", and the request gets replayed in syd.
               | 
               | The entire request to Sydney for me (from Chicago) takes
               | 516ms, 15ms of that is "replay overhead".
        
         | yashap wrote:
         | Clever, but only useful if your "pre-DB-write" work is cheap.
         | For example, I work at a company where part of what we do is
         | matching riders to drivers. This can be very expensive, and a
         | typical flow is:
         | 
         | 1) Read current state of riders/drivers from DB (slightly
         | expensive)
         | 
         | 2) Solve a vehicle routing problem with the new rider request
         | added to the current state (can be VERY expensive)
         | 
         | 3) If there's a good new solution, commit the changes that the
         | VRP solutions suggest (this is the DB write, and it's only
         | slightly expensive)
         | 
         | The approach in this blog post would have us duplicating the
         | most expensive thing our app does (step 2 above), for most
         | requests - not good. Much more load on our system, and these
         | already slow writes would take ~twice as long.
         | 
         | I'd hope Fly also lets you configure the load balancer - i.e.
         | have a way to send certain requests to the "writer" nodes by
         | default, vs. in a retry.
        
           | simonw wrote:
           | Rather than using their suggested "catch writes and throw an
           | exception" mechanism, I would instead write my own fast
           | application logic to identify if something is likely to be a
           | write.
           | 
           | For most of the applications I build the HTTP verb is good
           | enough for this - so I would add a tiny piece of Django
           | middleware which looks for a POST to a non-primary region and
           | sends fly-replay straight away at that point.
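A hypothetical, framework-agnostic version of that middleware (`FLY_REGION` is the environment variable Fly sets on instances; the primary region name and request/response shapes here are made up):

```python
# Sketch: any mutating verb arriving at a non-primary region is
# replayed immediately, before the app runs at all. A real Django
# middleware would wrap get_response the same way.
import os

PRIMARY_REGION = "iad"  # hypothetical primary region


def early_replay_middleware(app, current_region=None):
    current_region = current_region or os.environ.get("FLY_REGION", "")

    def wrapped(request):
        if request["method"] != "GET" and current_region != PRIMARY_REGION:
            # Replay before doing any work in this region.
            return {"status": 409,
                    "headers": {"fly-replay": f"region={PRIMARY_REGION}"},
                    "body": b""}
        return app(request)
    return wrapped
```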
        
             | kenrose wrote:
             | Cries in GraphQL (where you generally POST, even for
             | queries).
        
               | simonw wrote:
               | GraphQL might work OK though, because it differentiates
               | queries from mutations - so you could have some early
               | logic that says "if this POST request includes a
               | mutation, replay against the leader - otherwise keep
               | running against the replica".
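A naive sketch of that GraphQL check (a real implementation should use a proper GraphQL parser; prefix-matching like this breaks on leading comments, named fragments, and similar):

```python
# Sketch: peek at a GraphQL-over-HTTP POST body and decide whether it
# needs the primary. Assumes the common {"query": "..."} JSON shape.
import json


def is_mutation(body_bytes):
    """Crude check: does the operation text start with 'mutation'?"""
    query = json.loads(body_bytes).get("query", "")
    return query.lstrip().startswith("mutation")
```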
        
               | mrkurt wrote:
               | It's true! We designed this specifically because it works
               | with GraphQL. The exception->replay works by default, and
               | it's easy to send early replay command on GraphQL
               | mutations.
        
             | chachra wrote:
              | Would one do it in the custom DB router itself instead?
        
             | mrkurt wrote:
             | This is a good way to do it. Catching errors lets us
             | reliably ship a library that makes this work for almost
             | everyone, but it's not right for all apps:
             | https://github.com/soupedup/fly-rails/blob/main/lib/fly-
             | rail...
        
             | VWWHFSfQ wrote:
             | We follow this pattern as well with a CloudFront ->
             | OpenResty/Nginx router for the different HTTP methods. I've
              | often wondered why CloudFront can't route to custom origins
             | based on request method. You have to do it with a
             | Lambda@Edge function, which is annoying.
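That Lambda@Edge workaround can be sketched as an origin-request handler in Python (a hypothetical illustration; the domain names are made up, and the event shape follows CloudFront's documented Lambda@Edge request structure):

```python
# Hypothetical origin-request handler: send non-read methods to a
# separate "writer" origin instead of the default one.
WRITE_ORIGIN = "writer.example.com"

def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    if request["method"] not in ("GET", "HEAD", "OPTIONS"):
        # Repoint the request at the writer origin and fix the Host
        # header to match, so the origin accepts the request.
        request["origin"]["custom"]["domainName"] = WRITE_ORIGIN
        request["headers"]["host"] = [{"key": "Host", "value": WRITE_ORIGIN}]
    return request
```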
        
           | [deleted]
        
       | monocasa wrote:
       | I've been kicking around an idea like this for a while. The train
       | of thought that brought me there was the recognition of a
       | distinct memory hierarchy in today's common distributed
       | applications that parallels the one in your computer.
       | 
       | So your computer has memory banks that (as a rough first
       | approximation) each get ~10x bigger, but with ~10x greater
       | latency. The neat part though is that global consistency for
       | writes happens with L1 and L2 working together, so pretty damn
        | high up in the hierarchy. In contrast, distributed applications
        | typically have at best a write-through cache, where global
        | consistency happens all the way down at the actual data store.
       | Exploring the idea of distributed MOESI where one client, because
       | they had the permissions to write something in the first place,
       | can then be the owner for that database row as it's still being
       | flushed out seems like a great basis for a distributed system
       | that might not even need the dedicated datastore at all anymore,
       | but a sea of clients participating in coherency and replication.
       | Albeit this has oodles of consistency and availability problems
       | that very well might kill the whole concept, like how MOESI would
       | absolutely fall over when trying to hotplug CPUs at arbitrary
       | times.
        
       | zomglings wrote:
       | This is a nice solution, although I prefer for my application to
       | be aware of read vs. read-write connections.
       | 
       | Question about fly.io - how strict is the docker image
       | requirement? How easy would it be to deploy a go application +
       | systemd service?
        
         | tptacek wrote:
         | We'll accept any container that a Docker registry will; you
         | don't have to use Docker to build your container instance
         | (`flyctl launch` will build an application from a buildpack).
         | You can run systemd in a container.
         | 
         | We don't use Docker to run containers; in fact, we don't run
         | "containers" at all. Instead, we transmogrify OCI containers
         | into root filesystems for Firecracker micro-vms. Your
         | "container" runs as a virtual machine with its own Linux
         | kernel. You can do whatever you'd like with it.
         | 
         | https://fly.io/blog/docker-without-docker/
        
       | mkl95 wrote:
       | It reminds me of Yugabyte, which is Postgres compatible
       | 
       | > YugabyteDB is the open source, distributed SQL database for
        | global, internet-scale applications with low query latency,
       | extreme resilience against failures. *
       | 
        | However Fly is nicer from a developer's point of view, because
        | it doesn't require you to learn a new query language; you can
        | write good old Postgres.
       | 
       | * https://www.yugabyte.com/
        
         | qaq wrote:
         | What new query language do you need to learn for yugabyte ?
        
           | mkl95 wrote:
           | YSQL. From their docs:
           | 
           | > Successive YugabyteDB releases honor PostgreSQL syntax and
           | semantics, although some features (for example those that are
           | specific to the PostgreSQL monolithic SQL database
           | architecture) might not be supported for distributed SQL. The
           | YSQL documentation specifies the supported syntax and
           | extensions.
           | 
            | It is PostgreSQL compatible, but you may run into missing
            | features, so you will have to find a way around that. The
            | documentation says that their latest release is based on
            | PostgreSQL 11.2, and PostgreSQL 14 will be released soon.
        
         | rad_gruchalski wrote:
         | Yugabyte with replica placement and tablespaces does this for
         | free without having to do any try / catch.
        
           | tptacek wrote:
           | All the next-gen distributed SQL databases are neat. We are
           | definitely not trying to compete with them; that's not at all
           | where we're heading (though we'd be happy to work with them
           | to bring their databases to our platform).
           | 
           | This is just a neat hack you can do with totally standard
           | Postgres. It's a thing other people do outside of Fly; we
           | just like how easy you can make it if you can program both
           | the CDN and the applications.
        
           | mrkurt wrote:
           | We observed two things from developers when we started
           | working on this:
           | 
           | 1. The first thing we did was give them CockroachDB. Modeling
           | data for geo distributed Cockroach is conceptually similar to
           | yugabyte. It's not easy, though, to do that modeling. For
           | read heavy apps most devs didn't want to do the work.
           | 
           | 2. We also experimented with cross regions writes at the DB
           | level (both with Postgres and Cockroach). The failure cases
           | on those were bad. Apps are pretty naive about queries, so
           | interleaving writes with >10ms of latency between reads
           | resulted in really slow requests. These were pretty frequent!
           | Most apps broke if the devs didn't carefully optimize that
           | sequence.
           | 
            | The "free" we're going for here is "free for most
            | developers". Changing a data model or optimizing how an app
            | talks to a database wasn't quite free enough for our
            | purposes, and neither was necessary for the types of
            | workloads most people deal with.
           | 
           | They _definitely_ need to use something that's not-just-
           | postgres to do multi region writes or distribute write heavy
           | workloads. That's why we wrote the whole section on
           | graduating to Cockroach. Graduating from Postgres to yugabyte
           | also works!
        
             | rad_gruchalski wrote:
              | Never touched Cockroach because of their license, but I
              | have hands-on experience with Yugabyte. Creating a
              | geo-distributed data layer is basically:
             | 
              | create a tablespace with replica placement, create the
              | table with that tablespace, and give the role explicit
              | access to a single tablespace
             | 
             | Yugabyte does everything else. It's all standard sql with a
             | couple of extra keywords.
        
               | mrkurt wrote:
               | We're talking about a different level of "easy" I think.
                | Easy: asking devs to add a gem to their app.
               | 
               | Hard: asking them to write migration code of any kind,
               | much less understand tablespaces and replica placement.
               | 
               | You're advocating something we spent a lot of time
               | trying! Boring Postgres read replicas worked for almost
               | everyone we worked with, and it works for the vast
               | majority of what full stack folks are already building.
               | Our infrastructure will work great for the ones who need
               | Cockroach/Yugabyte, too! That's specifically why we
               | covered it in our post. :)
        
               | rad_gruchalski wrote:
               | Ha, true, I get your point. After 20+ years and dozens of
               | technologies things sometimes seem easy :P
               | 
               | Don't get me wrong, I like what you do, I'm simply a jerk
               | sometimes!
        
               | mrkurt wrote:
               | I, too, do jerky things sometimes. ;)
        
               | rad_gruchalski wrote:
                | Hey, so I'm actually on the hunt for a multi-tenant psql
                | solution. So far I'm settled on YB but can try other
               | things out, would you be up to discussing my use case?
               | What would be the best way to get in touch?
        
               | zozbot234 wrote:
               | > Hard: asking them to write migration code of any kind,
               | much less understand tablespaces and replica placement.
               | 
               | By that standard, any kind of global consistency (or even
               | weakened consistency of various kinds, but maintaining
               | some desirable properties) will always be 'hard'.
        
               | dijit wrote:
               | Yes, this is a hard problem.
        
               | mrkurt wrote:
               | It will always be hard. Consistency + latency is a huge
               | problem.
               | 
               | We can give _most_ developers the benefit of globally
               | distributed postgres without causing consistency issues,
               | though. Read replicas are simple to think about and
               | reasonably easy to write standard code for.
        
       ___________________________________________________________________
       (page generated 2021-06-30 23:00 UTC)