[HN Gopher] Jepsen: PostgreSQL 12.3 ___________________________________________________________________ Jepsen: PostgreSQL 12.3 Author : aphyr Score : 540 points Date : 2020-06-12 13:03 UTC (9 hours ago) (HTM) web link (jepsen.io) (TXT) w3m dump (jepsen.io) | threeseed wrote: | I am still wondering when we will see PostgreSQL being tested in | an HA form. | | It's just extraordinary to me that it's 2020 and it still does | not have a built-in, supported set of features for supporting | this use case. Instead we have to rely on proprietary vendor | solutions or dig through the many obsolete or unsupported | options. | castorp wrote: | There is a built-in supported set of features for high | availability. What exactly are you missing? | devit wrote: | Stolon or an equivalent being officially blessed by the | PostgreSQL team and made part of the official distribution. | | Also, same for a multi-master solution. | phaemon wrote: | The option to install postgres on three instances, specify | that they're in cluster "foo" and then it just works, | including automatically fixing any issues when one of the | instances drops out and rejoins. | | That's what other DBs have but it seems to be missing from | postgres. If it now exists could you point me to the doc | explaining how to do this? | [deleted] | mekoka wrote: | Props to Jepsen for exposing this longtime bug. Props to the PG | team for identifying the culprit and their response. This report | just strengthens my faith in the project. | sandGorgon wrote: | > _PostgreSQL has an extensive suite of hand-picked examples, | called isolationtester, to verify concurrency safety. Moreover, | independent testing, like Martin Kleppmann's Hermitage has also | confirmed that PostgreSQL's serializable level prevents (at least | some!) G2 anomalies. Why, then, did we immediately find G2-item | with Jepsen? How has this bug persisted for so long?_ | | This is super interesting.
Jepsen seems to be like Hypothesis for | race conditions: you specify the race condition to be triggered | and it generates tests to simulate it. | | Yesterday, GitLab acquired a fuzz testing company[1]. I wonder if | Jepsen was envisioned as a full CI integrated testing system. | | [1] https://m.calcalistech.com/Article.aspx?guid=3832552 | aphyr wrote: | Yes. Jepsen and Hypothesis both descend from a long line of | property-based testing systems--most notably, Haskell & | Erlang's QuickCheck. Jepsen makes a number of unusual choices | specific to testing concurrent distributed systems: notably, we | don't do much shrinking (real-world systems are staggeringly | nondeterministic). Jepsen also includes tooling for automated | deploys, fault injection, a language for specifying complex | concurrent schedules, visualizations, storage, and an array of | sophisticated property checkers. | sandGorgon wrote: | Is Jepsen for testing - say the microservices for Uber? | | Or is it specific to the people who build things like | databases, API frameworks, etc.? | aphyr wrote: | You can test pretty much any kind of concurrent system | using Jepsen: in-memory data structures, filesystems, | databases, queues, APIs, services, etc. Not all the tooling | is applicable to every situation, but it's pretty darn | general. | theptip wrote: | Do you know of anyone using Jepsen to torture their | microservices? This sounds like a really interesting | use case. | wildchild wrote: | Postgresql is a bullshit database. | popotamonga wrote: | What does this really mean? I just migrated from mongo to Pg. | petergeoghegan wrote: | The default isolation level is read committed mode, whereas the | bug in question only affected applications that use | serializable mode. You have to ask for serializable mode | explicitly; if you're not, then you cannot possibly be affected | by the bug.
(Perhaps you _should_ consider using a higher | isolation level, but that would be equally true with or without | this bug.) | nkozyra wrote: | It's an isolation issue but if you're coming from Mongo I'd | broadly guess it's not one you're going to trigger. Also, look | at their other analyses ... they're very detailed and upfront | about serialization isolation issues in a lot of huge | databases/datastores. | | Noteworthy: "In most respects, PostgreSQL behaved as expected: | both read uncommitted and read committed prevent write skew and | aborted reads." | castorp wrote: | Postgres does not support read uncommitted | petergeoghegan wrote: | Technically it does. You can ask for read uncommitted mode, | though you'll just get read committed mode. This is correct | because you're getting the minimal guarantees that you | asked for. The SQL standard allows this. | oauea wrote: | If you came from mongo that means everything will work far more | reliably than you're used to. | threeseed wrote: | This test only applies to a single instance of PostgreSQL. | | If you're looking for HA or need to shard then its | reliability is in question since it's never been tested. | redwood wrote: | Was Jepsen a key contributor to your choice to migrate? Are you | using PG in a distributed/replicated/HA mode like mongo? | popotamonga wrote: | -Yes but not the only one, it was a succession of problems (why | did I use mongo in the first place, on a transaction heavy | callcenter database? Because the customer forced it because | it was the only thing he knew) | | -No, just a single huge instance, managed on Azure | snuxoll wrote: | There were edge cases in PostgreSQL's SERIALIZABLE isolation | level - which is supposed to ensure that concurrent | transactions behave as if they were committed sequentially.
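To make "behave as if they were committed sequentially" concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not how PostgreSQL implements SERIALIZABLE: it simply tries every serial order of some toy transactions and checks whether an observed final state matches one of them. The names `is_serializable`, `t1`, and `t2` are hypothetical and exist only for this illustration.

```python
from itertools import permutations

def is_serializable(initial, txns, observed):
    """Return True if `observed` equals the result of running `txns`
    one at a time, in at least one order, starting from `initial`."""
    for order in permutations(txns):
        state = dict(initial)
        for txn in order:
            txn(state)  # each toy transaction mutates the state in place
        if state == observed:
            return True
    return False

# Two toy transactions (hypothetical): each copies the other key's value.
def t1(state):
    state["x"] = state["y"]

def t2(state):
    state["y"] = state["x"]

initial = {"x": 1, "y": 2}

# Serial order t1 then t2 yields x=2, y=2, so this outcome is serializable.
serial_ok = is_serializable(initial, [t1, t2], {"x": 2, "y": 2})

# If both transactions read the initial state concurrently, the values swap
# (x=2, y=1). No serial order produces that, so it is not serializable.
swapped_ok = is_serializable(initial, [t1, t2], {"x": 2, "y": 1})
```

Brute-forcing every order only works for a handful of transactions; it is meant to show the definition, not a practical checker.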
| | Specifically - if running a transaction as SERIALIZABLE there | was a very small chance that you might not see rows inserted | by another transaction that committed before you in the order. | Many applications don't need this level of transaction | isolation - but for those that do it's somewhat scary to know | this was lurking under the bed. | | Every implementation of a "bank" system where you keep track of | deposits and withdrawals is a use-case for SERIALIZABLE, and | this means a double-spend could happen because the next | transaction didn't see that an account just had a transaction that | drained the balance, for example. | | Props to Jepsen for finding this. | detaro wrote: | The common bank example as I understand it doesn't require | serializable, but only snapshot isolation: If two | transactions both drain the source balance, the one that | commits last will fail, because its snapshot doesn't match | the state anymore. | snuxoll wrote: | If you're UPDATEing a balance on some account table - yes. | If you're using a ledger and calculating balances (which | you SHOULD) then SERIALIZABLE is needed. | greggyb wrote: | The bank example is useful, because it tends to elicit the | right thinking for people, but banking has a long history of | eventual consistency. | | For the vast majority of the history of banking, local | branches (which is a very loose term here, e.g. a family | member of the guy you know in your hometown, rather than an | actual physical establishment) would operate on local | knowledge only. Consistency is achieved only through a | regular reconciliation process. | | Even in more modern, digital times, banks depend on large | batch processes and reconciliation processes. | rossmohax wrote: | I'd say MOST non-trivial applications require SERIALIZABLE.
| Every time an app does `BEGIN; SELECT WHERE; INSERT/UPDATE; | COMMIT` it needs `SERIALIZABLE`, because it is the only level | catching cases where a concurrent transaction adds rows so | that SELECT WHERE changes its result set and therefore the | subsequent INSERT/UPDATE should be done with different | values. | mekoka wrote: | It means you will need to patch your pg in the next release | scheduled in August | https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit... | KingOfCoders wrote: | We laughed when this happened to MongoDB. | | The difference though is the reaction from the vendor. | ldng wrote: | For me, MongoDB has a track record of boasting a lot | ("webscale") and hiding/denying mistakes. | | PostgreSQL is quite the opposite on that front, confident yet | open to criticism and able to admit mistakes. Hell, I've even | seen them present their mistakes at conferences and ask for help. | blablabla123 wrote: | Yes, for instance not returning errors in some cases when | writes fail. I think this was until version 2 but to be fair | they fixed this kind of stuff and started to deal with this | differently later on. However their reputation never fully | recovered from this. | aphyr wrote: | Fun story: after the last report which called them out for not | talking about write loss by default, MongoDB updated their | Jepsen page to say that the analysis didn't observe lost | writes. I guess they assumed that people wouldn't... read the | abstract? Let alone the report?
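The ledger scenario raised earlier in the thread (read the ledger, then insert a withdrawal row) can be sketched as a toy Python simulation. This is an illustration under simplifying assumptions, not PostgreSQL's implementation: a "snapshot" is just a copy of a list, and all function names here are hypothetical. It shows why a snapshot-style schedule can double-spend while a serial schedule cannot.

```python
def balance(ledger):
    """A ledger is a list of signed amounts; the balance is their sum."""
    return sum(ledger)

def withdraw_from_snapshot(snapshot, amount):
    """Decide on a withdrawal using only the rows visible in `snapshot`.
    Returns the new ledger row to insert, or None if funds look insufficient."""
    if balance(snapshot) >= amount:
        return -amount
    return None

def run_concurrent(ledger, amounts):
    """Snapshot-style schedule: every transaction reads the same starting
    snapshot, then all inserts are applied. Inserting distinct new rows is
    not a write-write conflict, so in this toy model nothing aborts."""
    snapshot = list(ledger)
    rows = [withdraw_from_snapshot(snapshot, a) for a in amounts]
    return ledger + [r for r in rows if r is not None]

def run_serial(ledger, amounts):
    """Serial schedule: each transaction sees rows committed before it."""
    current = list(ledger)
    for a in amounts:
        row = withdraw_from_snapshot(current, a)
        if row is not None:
            current.append(row)
    return current

ledger = [100]  # a single deposit of 100
concurrent = run_concurrent(ledger, [80, 80])
serial = run_serial(ledger, [80, 80])
print(balance(concurrent), balance(serial))  # prints: -60 20
```

In the concurrent run both withdrawals see a balance of 100 and both succeed, leaving the account overdrawn; in the serial run the second withdrawal sees 20 and is refused. Preventing the first outcome is what a working SERIALIZABLE level is for.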
| micimize wrote: | This is my understanding of what a G2-Item Anti-dependency Cycle | is from the linked paper example:
|
|     -- Given (roughly) the following transactions:
|
|     -- Transaction 1 (SELECT, T1)
|     with all_employees as (
|       select sum(salary) as salaries from employees
|     ),
|     department as (
|       select department, sum(salary) as salaries
|       from employees
|       group by department
|     )
|     select (select salaries from all_employees)
|          - (select sum(salaries) from department) as difference;
|
|     -- Transaction 2 (INSERT, T2)
|     insert into employees (name, department, salary)
|     values ('Tim', 'Sales', 70000);
|
|     -- G2-Item is where the INSERT completes between all_employees
|     -- and department, making the SELECT result inconsistent
|
| This is called an "anti-dependency" issue because T2 clobbers the | data T1 depends on before it completes. | | They say Elle found 6 such cases in 2 min, which I'm guessing is | a "very big number" of transactions, but can't figure out exactly | how big that number is based on the included logs/results. | | Also, "Elle has found unexpected anomalies in every database | we've checked" | aphyr wrote: | Yeah, it was relatively infrequent in that particular workload | --dramatically less than PostgreSQL's "repeatable read" | exhibited. These histories are roughly 15K successful | transactions long--see the stats field in results.edn. I'm | hesitant to make strong statements about frequency, because I | suspect this kind of thing depends strongly on workload, but I | would hazard a gueesssss that it's not super common. | [deleted] | [deleted] | brandur wrote: | Personally, this kind of thing actually gives me _more_ | confidence in Postgres rather than less. The core team's | responsiveness to this bug report was incredibly impressive. | | Around June 4th, the article's author comes in with a bug report | that basically says "I hammered Postgres with a whole bunch of | artificial load and made something happen" [1]. | | By the 8th, a preliminary patch is ready for review [2].
That | includes all the time to get the author's testing bootstrap up | and running, reproduce, diagnose the bug (which, lest we forget, | is the part of all of this that is actually hard), and assemble a | fix. It's worth noting that it's no one's job per se on the | Postgres project to fix this kind of thing -- the hope is that | someone will take interest, step up, and find a solution -- and | as unlikely as that sounds to work in most environments, | amazingly, it usually does for Postgres. | | Of note to the hacker types here, Peter Geoghegan was able to | track the bug down through the use of rr [4] [5], which allowed | an entire problematic run to be captured, and then stepped | through forwards _and_ backwards (the latter being the key for | not having to run the simulation over and over again) until the | problematic code was identified and a fix could be developed. | | --- | | [1] https://www.postgresql.org/message- | id/CAH2-Wzm9kNAK0cbzGAvDt... | | [2] https://www.postgresql.org/message- | id/CAH2-Wzk%2BFHVJvSS9VPP... | | [3] https://www.postgresql.org/message- | id/CAH2-WznTb6-0fjW4WPzNQ... | | [4] https://en.wikipedia.org/wiki/Rr_(debugging) | | [5] https://www.postgresql.org/message- | id/CAH2-WznTb6-0fjW4WPzNQ... | jwr wrote: | Indeed -- it's great to see a vendor (team, in this case) that | doesn't try to downplay a Jepsen result, and instead fixes the | issues. | | However, there is one more takeaway here. I've heard too many | times "just use Postgres", repeated as an unthinking mantra. | But there are no obvious solutions in the complex world of | databases. And this isn't even a multi-node scenario! | arcticfox wrote: | > there is one more takeaway here | | I don't think the "just use Postgres" mantra takes any hits | at all from this. (If anything, I feel better about it). | | I've used maybe a dozen (?)
databases/stores over the years - | graph databases, NoSQL databases, KV stores, the most boring | old databases, the sexiest new databases - and my general | approach is now to just use Postgres unless it really, really | doesn't fit. Works great for me. | jwr wrote: | All the answers to my post are missing the point. | | I'm happy Postgres works for you. It works for me, too, in | a number of setups. But one should never accept advice like | "just use Postgres" without thinking and careful | consideration. As the Jepsen article above shows. | Carpetsmoker wrote: | I have rarely seen people give "just use PostgreSQL" as | advice, but rather "just use PostgreSQL unless you have a | compelling reason not to". There's a pretty big | difference between the two. | pnathan wrote: | "Use Postgres until you have an engineering - data driven | rationale not to" is my standard answer for non-blob data | storage when a project starts. | | Why? Because when `n` is small (tables, rows, connections), | postgres works well enough, and if `n` should ever become | large, we'll have interest in funding the work to do a | migration, if that's appropriate - and we'll be able to | evaluate the system at scale with real data. | Scarbutt wrote: | "Just use Postgres" may have become a meme but for good reason | and is well grounded IMO. | | Many immature databases with not much wide use are better | avoided though, we managed to break Datomic three times during | development, the first two bugs were fixed in a week, the | third took a month, which they called in their changelog | "Fix: Prevent a rare scenario where retracting non-existent | entities could prevent future transactions from succeeding" | so yeah, we went back to "just use postgres", who wants to go | through the nightmare of hitting those bugs in production and | who knows how many more? Scary situation. | aphyr wrote: | Yeah, the PostgreSQL team really knocked it out of the park on | this one.
It was a pleasure working with them. :) | MoOmer wrote: | To be fair, you have a great batting average in identifying | issues to allow for improvement. Thanks for your work. | AtlasBarfed wrote: | With distributed and multicore being the path forward with | the end of Moore's law, your work has been instrumental in | helping open source distributed systems improve. | | Since distributed systems are so difficult and complicated, | it enables salespeople and zealots to both deny issues and | overstate capability. | | Your work is a shining star in that darkness. Thank you. | gen220 wrote: | Thank you for this comment that gives credit where it's due, | this is a very impressive set of threads to read through. | | And I agree. For me, one of the most important measures of the | reliability of a system is how that system responds to | information that it might be wrong. If the response is | defensiveness, evasiveness, or persuasive in any way, i.e. of | the "it's not _that_ bad" variety, run for the hills. This, on | the other hand, is technical, validating, and prompt. | | Every system has bugs, but depending on these cultural | features, not every system is capable of systematically | removing those bugs. With logs like these, the pg community | continues to prove capable. Kudos! | tetha wrote: | >If the response is defensiveness, evasiveness, or persuasive | in any way, i.e. of the "it's not that bad" variety, run for | the hills. This, on the other hand is technical, validating, | and prompt. | | This resonates with me with teams inside the company as well. | | We have a few teams that just deflect issues. Find any issue | in the bug report, be it an FQDN in a log search, and poof it | goes. Back to sender, don't care. Engineers in my team just | don't care to report bugs there anymore, regardless of how | simple. Usually, it's faster and less frustrating to just | work around it or ignore it. You could be fighting windmills | for weeks, or just fudge around it.
| | Other teams, far more receptive with bugs.. engineers end up | curious and just poke around until they understand what's up. | And then you have bug reports like "Ok, if I create these 57 | things over here, and toggle thing 32 to off, and then toggle | 2 things indexed by prime numbers on, then my database | connection fails. I've reproduced this from empty VMs. If 32 | is on, I need to toggle two perfect squares on, but not 3". | And then a lot of things just get fixed. | shawn-butler wrote: | What are the storage requirements for using rr for intense or | longer debugging sessions? | gen220 wrote: | this paper describes rr, which for context was designed to | be used on commodity hardware: | https://arxiv.org/pdf/1610.02144.pdf | | section 4.4 talks about disk requirements: | | > Memory-mapped files are almost entirely just the | executables and libraries loaded by tracees. As long as the | original files don't change and are not removed, which is | usually true in practice, their clones take up no | additional space and require no data writes | | > Like cloned files, cloned file blocks do not consume | space as long as the underlying data they're cloned from | persists. | | they conclude the section with: | | > In any case, in real-world usage trace storage has not | been a concern | | I imagine that over "longer debugging sessions" the | metadata footprint would expand linearly, but probably with | a constant smaller than the logs for the average program. | petergeoghegan wrote: | The exact recording in question was about 125MB, and that | was after I materialized it using "rr pack". | | I'd say that the storage overhead is unlikely to be a | concern in almost all cases. It's just something that you | need to keep an eye on. | bredren wrote: | Thanks for this summary. I take for granted that I have a | Postgres, powerful And reliable database that I get to use for | free in all my projects and work. 
| emilyst wrote: | Ah, now I know why you hopped on IRC finally last week. :) | reitanqild wrote: | By the way: where does the Jepsen name come from? | | I have wondered more than once and my browsing and searching | skills are failing me on this one. | | Edit: The closest link I can find is "Call me maybe" but I am not | able to find a causation or even a direct link or mention for | now. | jdwithit wrote: | IIRC it's a joke referencing the pop song "Call Me Maybe" by | Carly Rae Jepsen and the unreliability of many of the systems | he tests. | aphyr wrote: | For legal reasons, Jepsen, the series on distributed database | safety, has nothing to do with any other thing, place, person, | or concept. | cp9 wrote: | it's named after the Carly Rae Jepsen song "Call Me Maybe" | ivanfon wrote: | There's an old Jepsen post that used to be referencing that | song, but it looks like it's been modified/renamed now: | https://aphyr.com/posts/284-call-me-maybe-mongodb | | (you can still see it in the url) | amyjess wrote: | When I first discovered aphyr's site, all of the test | articles began with "Call Me Maybe:" rather than "Jepsen:", | and then one day all the articles were renamed. | | I've always suspected he changed it for legal reasons, and | his comment elsewhere in this thread pretty much confirms it. | perlgeek wrote: | I don't actually know, but I could imagine it's a tribute to | Carly Rae Jepsen and their song "Call me maybe". | | I dimly recall that either Aphyr's blog or the jepsen blog was | called "call me maybe" in the earlier days. | [deleted] | twunde wrote: | Here's at least one reference to it: | https://www.informationweek.com/database/the-man-who- | torture... And it looks like earlier versions of the github | project looked more like this: | https://github.com/threadwaste/jepsen with references to the | Carly Rae Jepsen song in the project description in both the | README and on GitHub.
| | Actually, it looks like the original talk (Slides: | https://aphyr.com/media/jepsen-ricon-east.pdf has multiple | references) and the original blog post has a slug referring | to the song https://aphyr.com/posts/281-call-me-maybe-carly- | rae-jepsen-a... | takeda wrote: | Yes, it started as a hobby and turned into a business, but yes, | the song is the inspiration. It basically was testing | distributed systems with network partitioning (i.e. services | not calling back etc) | | https://aphyr.com/posts/281-jepsen-on-the-perils-of-network-... | | https://aphyr.com/posts/281-call-me-maybe-carly-rae-jepsen-a... | tta wrote: | Slides 12 and 13 here should help: | https://aphyr.com/media/talk.pdf | reitanqild wrote: | That is a 403 for me, and based on aphyr's answer above | that's OK with me. | pkilgore wrote: | > Neither process crashes, multiple tables, nor secondary-key | access is required to reproduce our findings in this report. The | technical justification for including them in this workload is | "for funsies". | | Always read the footnotes! | ordx wrote: | Any plans to test any other NoSQL databases? I'm interested in | MarkLogic | redwood wrote: | It would be great to see Jepsen testing on distributed Postgres | as this is a single node issue they've found here. In prod don't | folks run HA? | [deleted] | aphyr wrote: | I started this analysis intending to do just that--it's been | difficult, however, to figure out which of the dozens of | replication/HA configurations to actually test. I settled on | Stolon, since it seemed to make the strongest safety claims. | However, I found bugs which turned out to be PostgreSQL's | fault, so I backed off to investigate those first. | qeternity wrote: | And herein lies the rub: HA Postgres is an extremely painful | proposition. Based on our non-scientific research, Patroni | seems to be the most battle tested solution, and as popular | if not more so than Stolon. 
| ksec wrote: | Is there a proposed roadmap for a basic / default solution for | HA Postgres? It seems MySQL has this well covered and | Postgres continues to think it is not a core part of their | DB and relies on third parties. ( Not suggesting that is | necessarily a bad thing ) | zozbot234 wrote: | "HA in Postgres" does not have a very well-defined meaning. | The Postgres documentation provides an overview of | different viable solutions: | https://www.postgresql.org/docs/12/different-replication- | sol... with features and drawbacks for each. But to call it | "extremely painful" seems to be a bit overstated. | aphyr wrote: | Patroni's documentation also seems to suggest that even | with the strongest settings, it can lose transactions; | Stolon makes stronger claims. | satyanash wrote: | > documentation also seems to suggest that even with the | strongest settings, it can lose transactions; | | Can be reproduced even on a single node postgres. Just | hammer it with inserts and maintain a local counter for | inserts performed. Then, kill -9 the postgres process. | You'd expect your local counter to match the actual rows | inserted, but you'll find that your counter will always | be "less" than the actual rows inserted. Like any | "networked" system, it is possible to lose commit | acknowledgments even if the commit itself was successful. | | So yes, you've not "lost" transactions per se. You've | "gained" them, but it is still a data issue in either | case. | anarazel wrote: | The classical solution to that is to use 2PC. But often | it's not worth it... | feike wrote: | Patroni does have a synchronous_mode_strict setting, which | may be what you're looking for: | | This parameter prevents Patroni from switching off the | synchronous replication on the primary when no | synchronous standby candidates are available.
As a | downside, the primary is not available for writes | (unless the Postgres transaction explicitly turns off | synchronous_mode), blocking all client write requests | until at least one synchronous replica comes up. | | https://patroni.readthedocs.io/en/latest/replication_mode | s.h... | | edit: seems I missed this discussion on twitter: | https://twitter.com/jepsen_io/status/1265626035380346881 | aphyr wrote: | Er, again, the docs say "it is still possible to lose | transactions even when using synchronous_mode_strict". | I've talked about this with some of the Stolon folks on | Twitter, and we're not exactly sure how that manifests-- | possibly an SI or SSI violation. | qeternity wrote: | Ah, I presumed you were talking about distributed failure | situations (split brain, etc) as opposed to the PG-level | replication (which most solutions orchestrate anyway). | mwcampbell wrote: | > HA Postgres is an extremely painful proposition. | | Does anyone here know how Amazon RDS's HA setup, | particularly their multi-AZ option, works? That seems to be | a switch that the AWS customer can just turn on. Do they | have a proprietary implementation, even for non-Aurora | Postgres? | threeseed wrote: | They basically have built a proprietary, distributed | block store. | | And on top of this they have layered PostgreSQL, MySQL, | MongoDB, Cassandra etc. | | I doubt they will ever release the code for it since | it's very much a competitive advantage. | devit wrote: | So if the hardware running the database is suddenly | destroyed they try to start another instance really fast? | | That seems inferior to having multiple sync replicas | ready to take over without having to start a process and | replay the WAL. | | Also, such an HA block store seems very easy to replicate | ( I'd guess there would be something open source | already), not much of a competitive advantage.
| redis_mlc wrote: | - using block-level replication allows them to support | multiple databases in a common way | | - block-level replication can be more reliable in the | long run operationally than some types of database | replication, especially MySQL back in the day | | - block-level replication has more scalable support staff | available than hiring DBAs to fix database replication | problems | | - programming for all the edge cases is something that is | a competitive advantage | | - no licensing required for it | | - you can probably guess which Open Source project it's | based on | | Source: DBA, worked there. | aeyes wrote: | Here is pretty much the most detailed post about how it | works you'll be able to find in public: | https://aws.amazon.com/blogs/database/amazon-rds-under- | the-h... | | They basically do replication at the storage layer. Each | write has to be acknowledged by both the primary and | secondary EBS volume. | LunaSea wrote: | It's one of the reasons for which NoSQL databases got a lot | of publicity during the early 2010's. | threeseed wrote: | And are still widely used today. | | People like to criticise NoSQL databases like MongoDB etc | but at least they took on the challenge of making | clustering easy enough to use and safe enough to rely on. | Especially because it is such a complex and error prone | challenge. | biggestdummy wrote: | Odd that you would point out MongoDB as your named | example, as it is pretty awful at sharding/clustering. | For HA, the better example would be Cassandra or | Scylla. Mongo's success is more tied to the ease of | development with a native JSON document DB, rather than | any claims to scalability. (Insert "Mongodb is webscale" | video here.) | threeseed wrote: | MongoDB was called out because of its ease of use. You | can create replica sets and shards in seconds. And for | many use cases it works great.
| | Cassandra is one of if not the best since it's multi- | master but it's a little bit more complex to setup. | camgunz wrote: | Reading through the source of Elle: | | > "I cannot begin to convey the confluence of despair and | laughter which I encountered over the course of three hours | attempting to debug this issue. We assert that all keys have the | same type, and that at most one integer type exists. If you put a | mix of, say, Ints and Longs into this checker, you WILL question | your fundamental beliefs about computers" [1]. | | I feel like Jepsen/Elle is a great argument for Clojure, reading | the source is actually kind of fun. Not what you'd expect for a | project like this. | | [1]: https://github.com/jepsen- | io/elle/blob/master/src/elle/txn.c... | agambrahma wrote: | Wonder if this "manual type constraints"-style code is | pre-"spec" | aphyr wrote: | Normally I'm a core.typed person, but static type constraints | don't quite make sense here. We _want_ heterogeneity in some | cases (e.g. you want to be able to mix nils and ints), but | not in others (e.g. this short and int mixing, which _could_ | be intentional, but also, might not be) | | I've considered spec as well, but spec has a weird insistence | that a keyword has exactly one meaning in a given namespace, | which is emphatically _not_ the case in pretty much any code | I 've tried to verify. Also its errors are... not exactly | helpful. | lemming wrote: | This is interesting to me as a Clojure person - you would | be approximately the first person I've seen using | core.typed since CircleCI's post in 2015 discussing why it | didn't work for them. Are you using more modern versions of | core.typed? What's the experience like these days? | aphyr wrote: | I don't use it often. In general, I've found the number | of bugs I catch with core.typed doesn't justify the time | investment in convincing things to typecheck--my tests | generally (not always, of course!) find type issues | first. 
I also tend to do a lot of weird performance- | oriented stateful stuff with java interop, which brings | me into untyped corners of the library. | | That said, I've found core.typed helpful in managing | complex state transformations, especially in namespaces | which have, say, five or six similar representations of | the same logical thing. What do you do when a "node" is a | hostname, a logical identifier in Jepsen, an identifier | in the database itself, a UID, and a UID+signature pair? | Managing those names can be tricky, and having a type | system really helps. | camgunz wrote: | Elle is pretty new so I would guess not--unless it's been | lurking somewhere else. Dunno what aphyr's thoughts on spec | are, plus I'm an amateur clojurian so, I'm not sure what | community consensus is or if spec has drawbacks that make it | not a good fit. | [deleted] | arghwhat wrote: | It is very rare to see a Jepsen report that concludes with a note | that a project is being too humble about their consistency | promises. | | Finding effectively only a single obscure and now fixed issue | where real-world consistency did not match the promised | consistency is pretty impressive. | rossmohax wrote: | > Finding effectively only a single obscure and now fixed issue | where real-world consistency did not match the promised | consistency is pretty impressive. | | They also admitted, that testing framework cannot evaluate more | complex scenarios with subqueries, aggregates and predicates. | So it is possible, that PG consistency promises are spot on or | maybe even overpromising. | willvarfar wrote: | Let's hope the tests grow in scope! | rolls-reus wrote: | So this does not affect SSI guarantees if the transactions | involved all operate on the same row? Is my understanding | correct? For instance can I update a counter with serializable | isolation and not run into this bug? | aphyr wrote: | I think so, yeah. 
You _could_ theoretically have a G2-item | anomaly on a single key, but in PostgreSQL 's case, the usual | write-set conflict checking seems to prevent them. | feike wrote: | This postgresql mailing list thread allows you to read along with | the PostgreSQL developers and Jepsen, seems like a very useful | discussion: https://www.postgresql.org/message- | id/flat/db7b729d-0226-d16... | aeontech wrote: | This is just such a pleasure to read, even as someone that has | only surface awareness of database internals at all. Both for | the incredibly friendly and professional tone, and for the | obvious deep technical knowledge on both sides. | | And that first email, my god, that should be titanium-and-gold- | plated standard of a bug report. | bloopernova wrote: | > that first email, my god, that should be titanium-and-gold- | plated standard of a bug report. | | It's a thing of beauty. It even includes versions of software | used! | | My daily experience with bug reports are that they 50/50 | won't even include a description, just a title. It's such a | cliche already, but "project name is broken" makes my blood | boil. What environment? What were you doing? Is this | production? How do I test this bug? (from an Ops perspective) | When did you notice this? Has anything changed recently to | possibly cause an error? | | Arg, my blood pressure! | | /offtopic, sorry. ___________________________________________________________________ (page generated 2020-06-12 23:00 UTC)