[HN Gopher] Jepsen: PostgreSQL 12.3
       ___________________________________________________________________
        
       Jepsen: PostgreSQL 12.3
        
       Author : aphyr
       Score  : 540 points
       Date   : 2020-06-12 13:03 UTC (9 hours ago)
        
 (HTM) web link (jepsen.io)
 (TXT) w3m dump (jepsen.io)
        
       | threeseed wrote:
       | I am still wondering when we will see PostgreSQL being tested in
       | a HA form.
       | 
       | It's just extraordinary to me that it's 2020 and it still does
       | not have a built-in, supported set of features for supporting
       | this use case. Instead we have to rely on proprietary vendor
       | solutions or dig through the many obsolete or unsupported
       | options.
        
         | castorp wrote:
         | There is a built-in supported set of features for high
         | availability. What exactly are you missing?
        
           | devit wrote:
           | Stolon or an equivalent being officially blessed by the
           | PostgreSQL team and made part of the official distribution.
           | 
           | Also, same for a multi-master solution.
        
           | phaemon wrote:
           | The option to install postgres on three instances, specify
           | that they're in cluster "foo" and then it just works,
           | including automatically fixing any issues when one of the
           | instances drops out and rejoins.
           | 
           | That's what other DBs have but it seems to be missing from
           | postgres. If it now exists could you point me to the doc
           | explaining how to do this?
        
       | [deleted]
        
       | mekoka wrote:
       | Props to Jepsen for exposing this longtime bug. Props to the PG
       | team for identifying the culprit and their response. This report
       | just strengthens my faith in the project.
        
       | sandGorgon wrote:
       | > _PostgreSQL has an extensive suite of hand-picked examples,
       | called isolationtester, to verify concurrency safety. Moreover,
       | independent testing, like Martin Kleppmann's Hermitage has also
       | confirmed that PostgreSQL's serializable level prevents (at least
       | some!) G2 anomalies. Why, then, did we immediately find G2-item
       | with Jepsen? How has this bug persisted for so long?_
       | 
       | This is super interesting. Jepsen seems to be like Hypothesis for
       | race conditions: you specify the race condition to be triggered
       | and it generates tests to simulate it.
       | 
       | Yesterday, Gitlab acquired a fuzz testing company[1]. I wonder if
       | Jepsen was envisioned as a full CI integrated testing system
       | 
       | [1] https://m.calcalistech.com/Article.aspx?guid=3832552
        
         | aphyr wrote:
         | Yes. Jepsen and Hypothesis both descend from a long line of
         | property-based testing systems--most notably, Haskell &
         | Erlang's QuickCheck. Jepsen makes a number of unusual choices
         | specific to testing concurrent distributed systems: notably, we
         | don't do much shrinking (real-world systems are staggeringly
         | nondeterministic). Jepsen also includes tooling for automated
         | deploys, fault injection, a language for specifying complex
         | concurrent schedules, visualizations, storage, and an array of
         | sophisticated property checkers.
        
           | sandGorgon wrote:
           | Is Jepsen for testing - say the microservices for Uber?
           | 
           | Or is it specific to the people who build things like
           | databases, API frameworks, etc.?
        
             | aphyr wrote:
             | You can test pretty much any kind of concurrent system
             | using Jepsen: in-memory data structures, filesystems,
             | databases, queues, APIs, services, etc. Not all the tooling
             | is applicable to every situation, but it's pretty darn
             | general.
        
               | theptip wrote:
               | Do you know of anyone using Jepsen to torture their
               | microservices? This sounds like a really interesting
               | usecase.
        
       | wildchild wrote:
       | Postgresql is a bullshit database.
        
       | popotamonga wrote:
       | What does this really mean? I just migrated from mongo to Pg.
        
         | petergeoghegan wrote:
         | The default isolation level is read committed mode, whereas the
         | bug in question only affected applications that use
         | serializable mode. You have to ask for serializable mode
         | explicitly; if you're not, then you cannot possibly be affected
         | by the bug. (Perhaps you _should_ consider using a higher
         | isolation level, but that would be equally true with or without
         | this bug.)
        
         | nkozyra wrote:
         | It's an isolation issue but if you're coming from Mongo I'd
         | broadly guess it's not one you're going to trigger. Also, look
         | at their other analyses ... they're very detailed and upfront
         | about serialization isolation issues in a lot of huge
         | databases/datastores.
         | 
         | Noteworthy: "In most respects, PostgreSQL behaved as expected:
         | both read uncommitted and read committed prevent write skew and
         | aborted reads."
        
           | castorp wrote:
           | Postgres does not support read uncommitted
        
             | petergeoghegan wrote:
             | Technically it does. You can ask for read uncommitted mode,
             | though you'll just get read committed mode. This is correct
             | because you're getting the minimal guarantees that you
             | asked for. The SQL standard allows this.
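The mapping described in this subthread can be written down as a small lookup. This is only a sketch: the helper names `effective_level` and `begin_statement` are made up for illustration, and the table reflects the behavior described in the comments above and in the PostgreSQL documentation (read uncommitted is accepted but treated as read committed; read committed is the default).

```python
# Hypothetical helper names; only the level mapping and the SQL text
# reflect documented PostgreSQL behavior.
EFFECTIVE_LEVEL = {
    "read uncommitted": "read committed",  # accepted, but treated as read committed
    "read committed": "read committed",    # the default level
    "repeatable read": "repeatable read",  # implemented as snapshot isolation
    "serializable": "serializable",        # must be requested explicitly
}


def effective_level(requested: str) -> str:
    """Return the behavior PostgreSQL provides for a requested level."""
    return EFFECTIVE_LEVEL[requested.strip().lower()]


def begin_statement(requested: str) -> str:
    """Build a BEGIN statement that asks for the given isolation level."""
    level = requested.strip().lower()
    if level not in EFFECTIVE_LEVEL:
        raise ValueError(f"unknown isolation level: {requested!r}")
    return f"BEGIN ISOLATION LEVEL {level.upper()};"
```

An application that never issues something like `begin_statement("serializable")` (or an equivalent `SET TRANSACTION` or default setting) stays on read committed and is outside the scope of the bug.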
        
         | oauea wrote:
         | If you came from mongo that means everything will work far more
         | reliably than you're used to.
        
           | threeseed wrote:
           | This test only applies to a single instance of PostgreSQL.
           | 
           | If you're looking for HA or need to shard then its
           | reliability is in question since it's never been tested.
        
         | redwood wrote:
         | Was Jepsen a key contributor to your choice to migrate? Are you
         | using PG in a distributed/replicated/HA mode like mongo?
        
           | popotamonga wrote:
           | -Yes, but not the only one; it was a succession of problems
           | (why did I use mongo in the first place, on a transaction-heavy
           | callcenter database? Because the customer forced it, because
           | it was the only thing he knew)
           | 
           | -No, just a single huge instance, managed on Azure
        
         | snuxoll wrote:
         | There were edge cases in PostgreSQL's SERIALIZABLE isolation
         | level - which is supposed to ensure that concurrent
         | transactions behave as if they were committed sequentially.
         | 
         | Specifically - if running a transaction as SERIALIZABLE there
         | was a very small chance that you might not see rows inserted
         | by another transaction that committed before you in the order.
         | Many applications don't need this level of transaction
         | isolation - but for those that do it's somewhat scary to know
         | this was lurking under the bed.
         | 
         | Every implementation of a "bank" system where you keep track of
         | deposits and withdrawals is a use-case for SERIALIZABLE, and
         | this means a double-spend could happen because the next
         | transaction didn't see an account just had a transaction that
         | drained the balance, for example.
         | 
         | Props to Jepsen for finding this.
        
           | detaro wrote:
           | The common bank example as I understand it doesn't require
           | serializable, but only snapshot isolation: If two
           | transactions both drain the source balance, the one that
           | commits last will fail, because its snapshot doesn't match
           | the state anymore.
        
             | snuxoll wrote:
             | If you're UPDATEing a balance on some account table - yes.
             | If you're using a ledger and calculating balances (which
             | you SHOULD) then SERIALIZABLE is needed.
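The distinction drawn in this exchange (updating one balance row versus appending to a ledger) can be illustrated with a toy model of snapshot isolation. This is a sketch under stated assumptions, not PostgreSQL's implementation: `SnapshotStore`, `Conflict`, and the helper functions are hypothetical names, and the only rule modeled is "first committer wins" on keys that both transactions write.

```python
class Conflict(Exception):
    """Raised when a commit loses a first-committer-wins check."""


class SnapshotStore:
    """Toy key-value store: snapshot reads plus write-write conflict checks."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}  # per-key commit counter

    def begin(self):
        # A transaction works on a private copy of the data as of its start.
        return {"snap": dict(self.data), "seen": dict(self.version), "writes": {}}

    def commit(self, txn):
        # Abort if any key this transaction wrote has changed since its snapshot.
        for key in txn["writes"]:
            if self.version.get(key, 0) != txn["seen"].get(key, 0):
                raise Conflict(key)
        for key, value in txn["writes"].items():
            self.data[key] = value
            self.version[key] = self.version.get(key, 0) + 1


def withdraw_from_row(txn, amount):
    # UPDATE-style: read and overwrite a single balance row.
    if txn["snap"]["balance"] >= amount:
        txn["writes"]["balance"] = txn["snap"]["balance"] - amount


def withdraw_via_ledger(txn, entry_id, amount):
    # Ledger-style: compute the balance from all entries, then insert a new one.
    if sum(txn["snap"].values()) >= amount:
        txn["writes"][entry_id] = -amount


def run_row_example():
    store = SnapshotStore({"balance": 100})
    t1, t2 = store.begin(), store.begin()
    withdraw_from_row(t1, 100)
    withdraw_from_row(t2, 100)
    store.commit(t1)
    try:
        store.commit(t2)
        second_committed = True
    except Conflict:
        second_committed = False
    return store.data["balance"], second_committed


def run_ledger_example():
    store = SnapshotStore({"deposit-1": 100})
    t1, t2 = store.begin(), store.begin()
    withdraw_via_ledger(t1, "withdrawal-1", 100)
    withdraw_via_ledger(t2, "withdrawal-2", 100)
    store.commit(t1)
    store.commit(t2)  # no shared key was written, so nothing conflicts
    return sum(store.data.values())
```

In this model `run_row_example()` ends with a balance of 0 and the second commit rejected, while `run_ledger_example()` lets both withdrawals commit and leaves the computed balance at -100. That second outcome is write skew, the kind of anomaly a serializable level is supposed to rule out.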
        
           | greggyb wrote:
           | The bank example is useful, because it tends to elicit the
           | right thinking for people, but banking has a long history of
           | eventual consistency.
           | 
           | For the vast majority of the history of banking, local
           | branches (which is a very loose term here, e.g. a family
           | member of the guy you know in your hometown, rather than an
           | actual physical establishment) would operate on local
           | knowledge only. Consistency is achieved only through a
           | regular reconciliation process.
           | 
           | Even in more modern, digital times, banks depend on large
           | batch processes and reconciliation processes.
        
           | rossmohax wrote:
           | I'd say MOST non-trivial applications require SERIALIZABLE.
           | Every time an app does `BEGIN; SELECT WHERE; INSERT/UPDATE;
           | COMMIT` it needs `SERIALIZABLE`, because it is the only level
           | that catches cases where a concurrent transaction adds rows so
           | that SELECT WHERE changes its result set and therefore the
           | subsequent INSERT/UPDATE should be done with different
           | values.
        
         | mekoka wrote:
         | It means you will need to patch your pg in the next release
         | scheduled in August
         | https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
        
       | KingOfCoders wrote:
       | We laughed when this happened to MongoDB.
       | 
       | The difference though is the reaction from the vendor.
        
         | ldng wrote:
         | For me, MongoDB has a track record of boasting a lot
         | ("webscale") and hiding/denying mistakes.
         | 
         | PostgreSQL is quite the opposite on that front, confident yet
         | open to criticism and able to admit mistakes. Hell, I've even
         | seen them present their mistakes at conferences and ask for help.
        
           | blablabla123 wrote:
           | Yes, for instance not returning errors in some cases when
           | writes fail. I think this was until version 2 but to be fair
           | they fixed this kind of stuff and started to deal with this
           | differently later on. However their reputation never fully
           | recovered from this.
        
         | aphyr wrote:
         | Fun story: after the last report which called them out for not
         | talking about write loss by default, MongoDB updated their
         | Jepsen page to say that the analysis didn't observe lost
         | writes. I guess they assumed that people wouldn't... read the
         | abstract? Let alone the report?
        
       | micimize wrote:
       | This is my understanding of what a G2-Item Anti-dependency Cycle
       | is from the linked paper example:
       | 
       |     -- Given (roughly) the following transactions:
       | 
       |     -- Transaction 1 (SELECT, T1)
       |     with all_employees as (
       |       select sum(salary) as salaries
       |       from employees
       |     ),
       |     department as (
       |       select department, sum(salary) as salaries
       |       from employees group by department
       |     )
       |     select sum(all_employees.salaries) - sum(department.salaries);
       | 
       |     -- Transaction 2 (INSERT, T2)
       |     insert into employees (name, department, salary)
       |     values ('Tim', 'Sales', 70000);
       | 
       |     -- G2-Item is where the INSERT completes between all_employees
       |     -- and department, making the SELECT result inconsistent
       | 
       | This is called an "anti-dependency" issue because T2 clobbers the
       | data T1 depends on before it completes.
       | 
       | They say Elle found 6 such cases in 2 min, which I'm guessing is
       | a "very big number" of transactions, but can't figure out exactly
       | how big that number is based on the included logs/results.
       | 
       | Also, "Elle has found unexpected anomalies in every database
       | we've checked"
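In Adya's formalism, which the Jepsen report follows, G2-item is defined on a graph of transactions: it is a cycle of dependencies between transactions in which at least one edge is a read-write anti-dependency (one transaction read a version that another transaction later overwrote). The following is a minimal sketch of that check, assuming the dependency edges have already been inferred from a history. The function name `find_g2_item` and the edge encoding are made up for illustration, and this is not how Elle is implemented; it also does not separate out narrower anomalies such as cycles with exactly one anti-dependency edge.

```python
from collections import defaultdict


def find_g2_item(edges):
    """Return one (src, dst) anti-dependency edge that lies on a cycle, or None.

    `edges` is a list of (src, dst, kind) tuples, with kind one of
    "ww" (write-write), "wr" (write-read), or "rw" (read-write anti-dependency).
    """
    graph = defaultdict(set)
    for src, dst, _kind in edges:
        graph[src].add(dst)

    def reachable(start, goal):
        # Iterative depth-first search over all edge kinds.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph[node])
        return False

    # An rw edge src -> dst is on a cycle exactly when dst can reach src.
    for src, dst, kind in edges:
        if kind == "rw" and reachable(dst, src):
            return (src, dst)
    return None
```

For example, if T1 reads an item T2 later writes, and T2 reads an item T1 later writes, the edges are `[("T1", "T2", "rw"), ("T2", "T1", "rw")]` and the function reports `("T1", "T2")`. A history with edges `[("T1", "T2", "rw"), ("T2", "T3", "wr")]` has no cycle, so it returns `None`.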
        
         | aphyr wrote:
         | Yeah, it was relatively infrequent in that particular workload
         | --dramatically less than PostgreSQL's "repeatable read"
         | exhibited. These histories are roughly 15K successful
         | transactions long--see the stats field in results.edn. I'm
         | hesitant to make strong statements about frequency, because I
         | suspect this kind of thing depends strongly on workload, but I
         | would hazard a gueesssss that it's not super common.
        
       | [deleted]
        
       | [deleted]
        
       | brandur wrote:
       | Personally, this kind of thing actually gives me _more_
       | confidence in Postgres rather than less. The core team's
       | responsiveness to this bug report was incredibly impressive.
       | 
       | Around June 4th, the article's author comes in with a bug report
       | that basically says "I hammered Postgres with a whole bunch of
       | artificial load and made something happen" [1].
       | 
       | By the 8th, a preliminary patch is ready for review [2]. That
       | includes all the time to get the author's testing bootstrap up
       | and running, reproduce, diagnose the bug (which, lest we forget,
       | is the part of all of this that is actually hard), and assemble a
       | fix. It's worth noting that it's no one's job per se on the
       | Postgres project to fix this kind of thing -- the hope is that
       | someone will take interest, step up, and find a solution -- and
       | as unlikely as that sounds to work in most environments,
       | amazingly, it usually does for Postgres.
       | 
       | Of note to the hacker types here, Peter Geoghegan was able to
       | track the bug down through the use of rr [4] [5], which allowed
       | an entire problematic run to be captured, and then stepped
       | through forwards _and_ backwards (the latter being the key for
       | not having to run the simulation over and over again) until the
       | problematic code was identified and a fix could be developed.
       | 
       | ---
       | 
       | [1] https://www.postgresql.org/message-
       | id/CAH2-Wzm9kNAK0cbzGAvDt...
       | 
       | [2] https://www.postgresql.org/message-
       | id/CAH2-Wzk%2BFHVJvSS9VPP...
       | 
       | [3] https://www.postgresql.org/message-
       | id/CAH2-WznTb6-0fjW4WPzNQ...
       | 
       | [4] https://en.wikipedia.org/wiki/Rr_(debugging)
       | 
       | [5] https://www.postgresql.org/message-
       | id/CAH2-WznTb6-0fjW4WPzNQ...
        
         | jwr wrote:
         | Indeed -- it's great to see a vendor (team, in this case) that
         | doesn't try to downplay a Jepsen result, and instead fixes the
         | issues.
         | 
         | However, there is one more takeaway here. I've heard too many
         | times "just use Postgres", repeated as an unthinking mantra.
         | But there are no obvious solutions in the complex world of
         | databases. And this isn't even a multi-node scenario!
        
           | arcticfox wrote:
           | > there is one more takeaway here
           | 
           | I don't think the "just use Postgres" mantra takes any hits
           | at all from this. (If anything, I feel better about it).
           | 
           | I've used maybe a dozen (?) databases/stores over the years -
           | graph databases, NoSQL databases, KV stores, the most boring
           | old databases, the sexiest new databases - and my general
           | approach is now to just use Postgres unless it really, really
           | doesn't fit. Works great for me.
        
             | jwr wrote:
             | All the answers to my post are missing the point.
             | 
             | I'm happy Postgres works for you. It works for me, too, in
             | a number of setups. But one should never accept advice like
             | "just use Postgres" without thinking and careful
             | consideration. As the Jepsen article above shows.
        
               | Carpetsmoker wrote:
               | I have rarely seen people give "just use PostgreSQL" as
               | advice, but rather "just use PostgreSQL unless you have a
               | compelling reason not to". There's a pretty big
               | difference between the two.
        
           | pnathan wrote:
           | "Use Postgres until you have an engineering, data-driven
           | rationale not to" is my standard answer for non-blob data
           | storage when a project starts.
           | 
           | Why? Because when `n` is small (tables, rows, connections),
           | postgres works well enough, and if `n` should ever become
           | large, we'll have interest in funding the work to do a
           | migration, if that's appropriate - and we'll be able to
           | evaluate the system at scale with real data.
        
           | Scarbutt wrote:
           | "Just use Postgres" may have become a meme, but for good
           | reason, and it is well grounded IMO.
           | 
           | Many immature databases without wide use are better avoided,
           | though. We managed to break Datomic three times during
           | development: the first two bugs were fixed in a week; the
           | third took a month, and was described in their changelog as
           | "Fix: Prevent a rare scenario where retracting non-existent
           | entities could prevent future transactions from succeeding".
           | So yeah, we went back to "just use postgres". Who wants to go
           | through the nightmare of hitting those bugs in production,
           | and who knows how many more? Scary situation.
        
         | aphyr wrote:
         | Yeah, the PostgreSQL team really knocked it out of the park on
         | this one. It was a pleasure working with them. :)
        
           | MoOmer wrote:
           | To be fair, you have a great batting average in identifying
           | issues to allow for improvement. Thanks for your work
        
           | AtlasBarfed wrote:
           | With distributed and multicore being the path forward with
           | the end of Moores law, your work has been instrumental in
           | helping open source distributed systems improve.
           | 
           | Since distributed systems are so difficult and complicated,
           | it enables salespeople and zealots to both deny issues and
           | overstate capability.
           | 
           | Your work is a shining star in that darkness. Thank you.
        
         | gen220 wrote:
         | Thank you for this comment that gives credit where it's due,
         | this is a very impressive set of threads to read through.
         | 
         | And I agree. For me, one of the most important measures of the
         | reliability of a system is how that system responds to
         | information that it might be wrong. If the response is
         | defensiveness, evasiveness, or persuasive in any way, i.e. of
         | the "it's not _that_ bad " variety, run for the hills. This, on
         | the other hand is technical, validating, and prompt.
         | 
         | Every system has bugs, but depending on these cultural
         | features, not every system is capable of systematically
         | removing those bugs. With logs like these, the pg community
         | continues to prove capable. Kudos!
        
           | tetha wrote:
           | >If the response is defensiveness, evasiveness, or persuasive
           | in any way, i.e. of the "it's not that bad" variety, run for
           | the hills. This, on the other hand is technical, validating,
           | and prompt.
           | 
           | This resonates with me with teams inside the company as well.
           | 
           | We have a few teams that just deflect issues. Find any issue
           | in the bug report, be it an FQDN in a log search, and poof it
           | goes. Back to sender, don't care. Engineers in my team just
           | don't care to report bugs there anymore, regardless how
           | simple. Usually, it's faster and less frustrating to just
           | work around it or ignore it. You could be fighting windmills
           | for weeks, or just fudge around it.
           | 
           | Other teams, far more receptive with bugs.. engineers end up
           | curious and just poke around until they understand what's up.
           | And then you have bug reports like "Ok, if I create these 57
           | things over here, and toggle thing 32 to off, and then toggle
           | 2 things indexed by prime numbers on, then my database
           | connection fails. I've reproduced this from empty VMs. If 32
           | is on, I need to toggle two perfect squares on, but not 3".
           | And then a lot of things just get fixed.
        
           | shawn-butler wrote:
           | What are the storage requirements for using rr for intense or
           | longer debugging sessions?
        
             | gen220 wrote:
             | this paper describes rr, which for context was designed to
             | be used on commodity hardware:
             | https://arxiv.org/pdf/1610.02144.pdf
             | 
             | section 4.4 talks about disk requirements:
             | 
             | > Memory-mapped files are almost entirely just the
             | executables and libraries loaded by tracees. As long as the
             | original files don't change and are not removed, which is
             | usually true in practice, their clones take up no
             | additional space and require no data writes
             | 
             | > Like cloned files, cloned file blocks do not consume
             | space as long as the underlying data they're cloned from
             | persists.
             | 
             | they conclude the section with:
             | 
             | > In any case, in real-world usage trace storage has not
             | been a concern
             | 
             | I imagine that over "longer debugging sessions" the
             | metadata footprint would expand linearly, but probably with
             | a constant smaller than the logs for the average program.
        
             | petergeoghegan wrote:
             | The exact recording in question was about 125MB, and that
             | was after I materialized it using "rr pack".
             | 
             | I'd say that the storage overhead is unlikely to be a
             | concern in almost all cases. It's just something that you
             | need to keep an eye on.
        
         | bredren wrote:
         | Thanks for this summary. I take for granted that I have
         | Postgres, a powerful and reliable database that I get to use
         | for free in all my projects and work.
        
       | emilyst wrote:
       | Ah, now I know why you hopped on IRC finally last week. :)
        
       | reitanqild wrote:
       | By the way: where does the Jepsen name come from?
       | 
       | I have wondered more than once and my browsing and searching
       | skills are failing me on this one.
       | 
       | Edit: The closest link I can find is "Call me maybe" but I am not
       | able to find a causation or even a direct link or mention for
       | now.
        
         | jdwithit wrote:
         | IIRC it's a joke referencing the pop song "Call Me Maybe" by
         | Carly Rae Jepsen and the unreliability of many of the systems
         | he tests.
        
         | aphyr wrote:
         | For legal reasons, Jepsen, the series on distributed database
         | safety, has nothing to do with any other thing, place, person,
         | or concept.
        
         | cp9 wrote:
         | it's named after the Carly Rae Jepsen song "Call Me Maybe"
        
         | ivanfon wrote:
         | There's an old Jepsen post that used to be referencing that
         | song, but it looks like it's been modified/renamed now:
         | https://aphyr.com/posts/284-call-me-maybe-mongodb
         | 
         | (you can still see it in the url)
        
           | amyjess wrote:
           | When I first discovered aphyr's site, all of the test
           | articles began with "Call Me Maybe:" rather than "Jepsen:",
           | and then one day all the articles were renamed.
           | 
           | I've always suspected he changed it for legal reasons, and
           | his comment elsewhere in this thread pretty much confirms it.
        
         | perlgeek wrote:
         | I don't actually know, but I could imagine it's a tribute to
         | Carly Rae Jepsen and their song "Call me maybe".
         | 
         | I dimly recall that either Aphyr's blog or the jepsen blog was
         | called "call me maybe" in the earlier days.
        
           | [deleted]
        
           | twunde wrote:
           | Here's at least one reference to it:
           | https://www.informationweek.com/database/the-man-who-
           | torture... And it looks like earlier versions of the github
           | project looked more like this:
           | https://github.com/threadwaste/jepsen with references to the
           | Carly Rae Jepsen song in the project description in both the
           | README and in github.
           | 
           | Actually, it looks like the original talk (slides:
           | https://aphyr.com/media/jepsen-ricon-east.pdf) has multiple
           | references, and the original blog post has a slug referring
           | to the song:
           | https://aphyr.com/posts/281-call-me-maybe-carly-rae-jepsen-a...
        
         | takeda wrote:
         | Yes, it started as a hobby and turned into a business, but yes,
         | the song is the inspiration. It basically was testing
         | distributed systems with network partitioning (i.e. services
         | not calling back etc)
         | 
         | https://aphyr.com/posts/281-jepsen-on-the-perils-of-network-...
         | 
         | https://aphyr.com/posts/281-call-me-maybe-carly-rae-jepsen-a...
        
         | tta wrote:
         | Slides 12 and 13 here should help:
         | https://aphyr.com/media/talk.pdf
        
           | reitanqild wrote:
           | That is a 403 for me, and based on aphyr's answer above
           | that's OK with me.
        
       | pkilgore wrote:
       | > Neither process crashes, multiple tables, nor secondary-key
       | access is required to reproduce our findings in this report. The
       | technical justification for including them in this workload is
       | "for funsies".
       | 
       | Always read the footnotes!
        
       | ordx wrote:
       | Any plans to test any other NoSQL databases? I'm interested in
       | MarkLogic
        
       | redwood wrote:
       | It would be great to see Jepsen testing on distributed Postgres
       | as this is a single node issue they've found here. In prod don't
       | folks run HA?
        
         | [deleted]
        
         | aphyr wrote:
         | I started this analysis intending to do just that--it's been
         | difficult, however, to figure out which of the dozens of
         | replication/HA configurations to actually test. I settled on
         | Stolon, since it seemed to make the strongest safety claims.
         | However, I found bugs which turned out to be PostgreSQL's
         | fault, so I backed off to investigate those first.
        
           | qeternity wrote:
           | And herein lies the rub: HA Postgres is an extremely painful
           | proposition. Based on our non-scientific research, Patroni
           | seems to be the most battle tested solution, and as popular
           | if not more so than Stolon.
        
             | ksec wrote:
             | Is there a proposed roadmap for a basic / default solution
             | for HA Postgres? It seems MySQL has this well covered, while
             | Postgres continues to think it is not a core part of their
             | DB and relies on third parties. (Not suggesting that is
             | necessarily a bad thing.)
        
             | zozbot234 wrote:
             | "HA in Postgres" does not have a very well-defined meaning.
             | The Postgres documentation provides an overview of
             | different viable solutions:
             | https://www.postgresql.org/docs/12/different-replication-
             | sol... with features and drawbacks for each. But to call it
             | "extremely painful" seems to be a bit overstated.
        
             | aphyr wrote:
             | Patroni's documentation also seems to suggest that even
             | with the strongest settings, it can lose transactions;
             | Stolon makes stronger claims.
        
               | satyanash wrote:
               | > documentation also seems to suggest that even with the
               | strongest settings, it can lose transactions;
               | 
               | Can be reproduced even on a single node postgres. Just
               | hammer it with inserts and maintain a local counter for
               | inserts performed. Then, kill -9 the postgres process.
               | You'd expect your local counter to match the actual rows
               | inserted, but you'll find that your counter will always
               | be "less" than the actual rows inserted. Like any
               | "networked" system, it is possible to lose commit
               | acknowledgments even if the commit itself was successful.
               | 
               | So yes, you've not "lost" transactions per se. You've
               | "gained" them, but it is still a data issue in either
               | case.
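The effect described in this comment (a commit that succeeds while its acknowledgment is lost) can be shown with a deterministic toy. This is a sketch, not a reproduction against real PostgreSQL: `ToyServer` and `run_client` are hypothetical names, and the "crash" is simulated by raising `ConnectionError` after the row has already been stored.

```python
class ToyServer:
    """Stand-in for a database that commits a row, then may die before replying."""

    def __init__(self, crash_at_commit):
        self.rows = []
        self.crash_at_commit = crash_at_commit  # which commit loses its ack

    def insert(self, value):
        self.rows.append(value)  # the commit takes effect here
        if len(self.rows) == self.crash_at_commit:
            # Simulates the process being killed after commit, before the ack.
            raise ConnectionError("server killed before sending the ack")
        return "ok"


def run_client(server, attempts):
    """Insert values and count only the inserts that were acknowledged."""
    acked = 0
    for value in range(attempts):
        try:
            server.insert(value)
            acked += 1
        except ConnectionError:
            break  # the outcome of this insert is unknown to the client
    return acked
```

With `server = ToyServer(crash_at_commit=3)`, `run_client(server, 10)` counts 2 acknowledged inserts while `server.rows` holds 3 rows. The client cannot tell from the error alone whether the third insert committed, which is why an unacknowledged write has to be treated as indeterminate rather than failed.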
        
               | anarazel wrote:
               | The classical solution to that is to use 2PC. But often
               | it's not worth it...
        
               | feike wrote:
               | Patroni does have synchronous_mode_strict setting, which
               | may be what you're looking for:
               | 
               | This parameter prevents Patroni from switching off the
               | synchronous replication on the primary when no
               | synchronous standby candidates are available. As a
               | downside, the primary is not available for writes
               | (unless the Postgres transaction explicitly turns off
               | synchronous_mode), blocking all client write requests
               | until at least one synchronous replica comes up.
               | 
               | https://patroni.readthedocs.io/en/latest/replication_mode
               | s.h...
               | 
               | edit: seems I missed this discussion on twitter:
               | https://twitter.com/jepsen_io/status/1265626035380346881
        
               | aphyr wrote:
               | Er, again, the docs say "it is still possible to lose
               | transactions even when using synchronous_mode_strict".
               | I've talked about this with some of the Stolon folks on
               | Twitter, and we're not exactly sure how that manifests--
               | possibly an SI or SSI violation.
        
               | qeternity wrote:
               | Ah, I presumed you were talking about distributed failure
               | situations (split brain, etc) as opposed to the PG-level
               | replication (which most solutions orchestrate anyway).
        
             | mwcampbell wrote:
             | > HA Postgres is an extremely painful proposition.
             | 
             | Does anyone here know how Amazon RDS's HA setup,
             | particularly their multi-AZ option, works? That seems to be
             | a switch that the AWS customer can just turn on. Do they
             | have a proprietary implementation, even for non-Aurora
             | Postgres?
        
               | threeseed wrote:
               | They basically have built a proprietary, distributed
               | block store.
               | 
               | And on top of this they have layered PostgreSQL, MySQL,
               | MongoDB, Cassandra etc.
               | 
               | I doubt they will ever release the code for it since
               | it's very much a competitive advantage.
        
               | devit wrote:
               | So if the hardware running the database is suddenly
               | destroyed they try to start another instance really fast?
               | 
               | That seems inferior to having multiple sync replicas
               | ready to take over without having to start a process and
               | replay the WAL.
               | 
               | Also, such an HA block store seems very easy to replicate
               | (I'd guess there would be something open source
               | already), not much of a competitive advantage.
        
               | redis_mlc wrote:
               | - using block-level replication allows them to support
               | multiple databases in a common way
               | 
               | - block-level replication can be more reliable in the
               | long run operationally than some types of database
               | replication, especially MySQL back in the day
               | 
               | - block-level replication has more scalable support staff
               | available than hiring DBAs to fix database replication
               | problems
               | 
               | - programming for all the edge cases is something that is
               | a competitive advantage
               | 
               | - no licensing required for it
               | 
               | - you can probably guess which Open Source project it's
               | based on
               | 
               | Source: DBA, worked there.
        
               | aeyes wrote:
               | Here is pretty much the most detailed post about how it
               | works you'll be able to find in public:
               | https://aws.amazon.com/blogs/database/amazon-rds-under-
               | the-h...
               | 
               | They basically do replication at the storage layer. Each
               | write has to be acknowledged by both the primary and
               | secondary EBS volume.
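A toy sketch of the "both volumes must acknowledge" rule described above, in plain Python. This is an assumption-laden illustration, not how AWS implements it: `Volume` and `replicated_write` are hypothetical names, and real storage-layer replication also handles ordering, retries, and recovery.

```python
class Volume:
    """Hypothetical block volume; not a real EBS or AWS API."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.blocks = {}

    def write(self, block_id, data):
        # An unhealthy volume stores nothing and does not acknowledge.
        if not self.healthy:
            return False
        self.blocks[block_id] = data
        return True


def replicated_write(primary, secondary, block_id, data):
    """Report success only if BOTH volumes acknowledged the write."""
    ok_primary = primary.write(block_id, data)
    ok_secondary = secondary.write(block_id, data)
    return ok_primary and ok_secondary


primary = Volume("primary")
secondary = Volume("secondary")

first = replicated_write(primary, secondary, 0, b"page-0")

secondary.healthy = False  # simulate losing the secondary
second = replicated_write(primary, secondary, 1, b"page-1")
print(first, second)
```

Note that in this sketch the second write is reported as failed even though the primary stored the block, so a caller still has to treat an unacknowledged write as having an unknown outcome.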
        
             | LunaSea wrote:
             | It's one of the reasons for which NoSQL databases got a lot
             | of publicity during the early 2010s.
        
               | threeseed wrote:
               | And are still widely used today.
               | 
               | People like to criticise NoSQL databases like MongoDB etc
               | but at least they took on the challenge of making
               | clustering easy enough to use and safe enough to rely on.
               | Especially because it is such a complex and error-prone
               | challenge.
        
               | biggestdummy wrote:
               | Odd that you would point out MongoDB as your named
               | example, as it is pretty awful at sharding/clustering.
               | For HA, the better example would be Cassandra or
               | Scylla. Mongo's success is more tied to the ease of
               | development with a native JSON document DB, rather than
               | any claims to scalability. (Insert "Mongodb is webscale"
               | video here.)
        
               | threeseed wrote:
               | MongoDB was called out because of its ease of use. You
               | can create replica sets and shards in seconds. And for
               | many use cases it works great.
               | 
               | Cassandra is one of the best, if not the best, since it's
               | multi-master, but it's a little more complex to set up.
        
       | camgunz wrote:
       | Reading through the source of Elle:
       | 
       | > "I cannot begin to convey the confluence of despair and
       | laughter which I encountered over the course of three hours
       | attempting to debug this issue. We assert that all keys have the
       | same type, and that at most one integer type exists. If you put a
       | mix of, say, Ints and Longs into this checker, you WILL question
       | your fundamental beliefs about computers" [1].
       | 
       | I feel like Jepsen/Elle is a great argument for Clojure; reading
       | the source is actually kind of fun. Not what you'd expect for a
       | project like this.
       | 
       | [1]: https://github.com/jepsen-
       | io/elle/blob/master/src/elle/txn.c...
        
         | agambrahma wrote:
         | Wonder if this "manual type constraints"-style code is
         | pre-"spec"
        
           | aphyr wrote:
           | Normally I'm a core.typed person, but static type constraints
           | don't quite make sense here. We _want_ heterogeneity in some
           | cases (e.g. you want to be able to mix nils and ints), but
           | not in others (e.g. this short and int mixing, which _could_
           | be intentional, but also, might not be)
           | 
           | I've considered spec as well, but spec has a weird insistence
           | that a keyword has exactly one meaning in a given namespace,
           | which is emphatically _not_ the case in pretty much any code
           | I've tried to verify. Also its errors are... not exactly
           | helpful.
        
             | lemming wrote:
             | This is interesting to me as a Clojure person - you would
             | be approximately the first person I've seen using
             | core.typed since CircleCI's post in 2015 discussing why it
             | didn't work for them. Are you using more modern versions of
             | core.typed? What's the experience like these days?
        
               | aphyr wrote:
               | I don't use it often. In general, I've found the number
               | of bugs I catch with core.typed doesn't justify the time
               | investment in convincing things to typecheck--my tests
               | generally (not always, of course!) find type issues
               | first. I also tend to do a lot of weird performance-
               | oriented stateful stuff with java interop, which brings
               | me into untyped corners of the library.
               | 
               | That said, I've found core.typed helpful in managing
               | complex state transformations, especially in namespaces
               | which have, say, five or six similar representations of
               | the same logical thing. What do you do when a "node" is a
               | hostname, a logical identifier in Jepsen, an identifier
               | in the database itself, a UID, and a UID+signature pair?
               | Managing those names can be tricky, and having a type
               | system really helps.
        
           | camgunz wrote:
           | Elle is pretty new so I would guess not--unless it's been
           | lurking somewhere else. Dunno what aphyr's thoughts on spec
           | are, plus I'm an amateur clojurian so, I'm not sure what
           | community consensus is or if spec has drawbacks that make it
           | not a good fit.
        
       | [deleted]
        
       | arghwhat wrote:
       | It is very rare to see a Jepsen report that concludes with a note
       | that a project is being too humble about their consistency
       | promises.
       | 
       | Finding effectively only a single obscure and now fixed issue
       | where real-world consistency did not match the promised
       | consistency is pretty impressive.
        
         | rossmohax wrote:
         | > Finding effectively only a single obscure and now fixed issue
         | where real-world consistency did not match the promised
         | consistency is pretty impressive.
         | 
         | They also admitted that the testing framework cannot evaluate
         | more complex scenarios with subqueries, aggregates and
         | predicates. So it is possible that PG's consistency promises
         | are spot on, or maybe even overpromising.
        
           | willvarfar wrote:
           | Let's hope the tests grow in scope!
        
       | rolls-reus wrote:
       | So this does not affect SSI guarantees if the transactions
       | involved all operate on the same row? Is my understanding
       | correct? For instance can I update a counter with serializable
       | isolation and not run into this bug?
        
         | aphyr wrote:
         | I think so, yeah. You _could_ theoretically have a G2-item
         | anomaly on a single key, but in PostgreSQL's case, the usual
         | write-set conflict checking seems to prevent them.
        
       | feike wrote:
       | This postgresql mailing list thread allows you to read along with
       | the PostgreSQL developers and Jepsen, seems like a very useful
       | discussion: https://www.postgresql.org/message-
       | id/flat/db7b729d-0226-d16...
        
         | aeontech wrote:
         | This is just such a pleasure to read, even as someone who has
         | only surface awareness of database internals. Both for
         | the incredibly friendly and professional tone, and for the
         | obvious deep technical knowledge on both sides.
         | 
         | And that first email, my god, that should be the
         | titanium-and-gold-plated standard of a bug report.
        
           | bloopernova wrote:
           | > that first email, my god, that should be titanium-and-gold-
           | plated standard of a bug report.
           | 
           | It's a thing of beauty. It even includes versions of software
           | used!
           | 
           | My daily experience with bug reports is that they 50/50
           | won't even include a description, just a title. It's such a
           | cliche already, but "project name is broken" makes my blood
           | boil. What environment? What were you doing? Is this
           | production? How do I test this bug? (from an Ops perspective)
           | When did you notice this? Has anything changed recently to
           | possibly cause an error?
           | 
           | Arg, my blood pressure!
           | 
           | /offtopic, sorry.
        
       ___________________________________________________________________
       (page generated 2020-06-12 23:00 UTC)