[HN Gopher] Jepsen: MongoDB 4.2.6 ___________________________________________________________________ Jepsen: MongoDB 4.2.6 Author : aphyr Score : 526 points Date : 2020-05-24 11:42 UTC (11 hours ago) (HTM) web link (jepsen.io) (TXT) w3m dump (jepsen.io) | pier25 wrote: | > Normally we downweight follow-up posts | | So you manually moderate the content? | VonGuard wrote: | I mean, this was kind of an exception case, where there is a | big old technical war of words back and forth. Almost a "He | said She said" except here, He is an absolute expert, and She | is just some marketing dorks at Mongo. | | I, for one, welcome this by-hand moderation because it keeps | this issue alive, and allows Kyle to keep the discussion going. | | As I commented in a previous post, Kyle is the Chef Ramsey of | database testing, and here, he's in a position where some idiot | has just served him an undercooked hamburger. Bits will fly, | marketing people will be flayed alive, and Kyle will be the | only one left standing at the end. | | Without this by-hand moderation, we'd be missing out on the | second act of this intense thriller! | pier25 wrote: | I'm totally ok with the moderation/curation/whatever! | dang wrote: | Oh yes. HN has always been moderated/curated/whatever term you | prefer. Many past explanations can be found through these | links: | | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... | | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... | | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... | | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... | | (I've detached this subthread from | https://news.ycombinator.com/item?id=23294048 to prevent the | top comment from being too distracting.) | pier25 wrote: | Thanks for the links! | | It's totally fine with me, but I just wasn't aware of it. | [deleted] | DoreenMichele wrote: | They use a combination of algorithms and human intervention, to | generally good effect. 
| | No clue if this "downweighting" in this case is an algorithm or | a manual thing. I would assume algorithm for the downweighting | and human intervention for reversing it, but that's sort of a | guess or inference. | baq wrote: | of course they do? given the quality of discussion here it's a | hard requirement to preserve a high SNR. | rmdashrfstar wrote: | The main argument for using a document-oriented database: | https://martinfowler.com/bliki/AggregateOrientedDatabase.htm... | nevi-me wrote: | Friendly question: did you update anything on the findings since | https://news.ycombinator.com/item?id=23191439 ? | aphyr wrote: | Nope! Something weird happened to that post; it got a lot of | upvotes and some comments, but never made it to frontpage. | After the InfoQ article took off yesterday, an HN mod got in | touch and asked if I'd like to resubmit it. | lllr_finger wrote: | Mongo has been associated with everything from "perpetual | irritation" up to "major production issue" at all three of my | last companies. | | For as easy as it is to use jsonb in Postgres, or Redis, or | RocksDB/SQLite, or whatever else depending on your use case - I | can't find any reason to advocate its use these days. In my | anecdotal experience, the success stories never happen, and | nearly every developer I know has an unpleasant experience they | can share. | | Big thanks to aphyr and the Jepsen suite (and unrelated blog | posts like Hexing the Interview) for inspiring me to do thorough | engineering. | StavrosK wrote: | I find that using JSON for things you don't need to | query/validate (like big blobs you just want to store) and | breaking the rest out to columns works well enough. Plus, you | can always migrate the data out to a field anyway. | emerongi wrote: | Postgres 12 has generated columns, so you can throw your data | in a jsonb column and have Postgres pull data out of it into | separate columns for indexing, for example.
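The store-JSON-but-index-an-extracted-field idea discussed above can be sketched concretely. This is an illustration only: it uses SQLite's bundled JSON functions so it runs anywhere, while in Postgres the equivalent would be a jsonb column with a generated column or an index on an expression such as ((data->>'name')).

```python
import sqlite3

# Store whole documents as JSON, but index one extracted field so that
# lookups on it don't have to scan and re-parse every document.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (data TEXT)")  # the JSON document blob
conn.execute(
    "CREATE INDEX docs_by_name ON docs (json_extract(data, '$.name'))"
)
conn.execute("""INSERT INTO docs VALUES ('{"name": "ada", "age": 36}')""")

# The indexed expression can now back ordinary WHERE clauses:
row = conn.execute(
    "SELECT json_extract(data, '$.age') FROM docs "
    "WHERE json_extract(data, '$.name') = ?",
    ("ada",),
).fetchone()
print(row[0])  # the extracted age, as a number
```

The same trade-off applies in either engine: the document stays schemaless, while the handful of fields you actually query get real indexes.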
| magnushiie wrote: | Generated columns are not necessary for indexing in | Postgres, you can create an index on any expression based | on the record (supported for many versions now). | mtrycz2 wrote: | > I can't find any reason to advocate its use these days. | | Don't you know? It's web-scale. | rmdashrfstar wrote: | If I was a moderator on HN, I would instantly ban commenters | who continue to make these asinine posts. Is this Reddit, or | is HN striving to be Reddit? | reese_john wrote: | https://news.ycombinator.com/newsfaq.html | | __" Please don't post comments saying that HN is turning | into Reddit. It's a semi-noob illusion, as old as the | hills." __ | rmdashrfstar wrote: | Interesting taste of my own medicine. Will do, thanks for | the reminder! | ep103 wrote: | Is Postgres what most people would suggest as a MongoDB | replacement? | | Anyone have any suggestions for a true non-MongoDB jsonDocument | based noSql option? | jfkebwjsbx wrote: | The first question you must ask yourself is: do I really need | a document store? | | Because the answer is "no" in the overwhelming majority of | cases, especially if your product is mature. | zozbot234 wrote: | It depends on what you're using it for. Postgres is a very good | all-around choice these days (compared to when the whole | 'noSql' thing got started) and also supports document-based | scenarios quite well via JSON/JSONB columns and its support | for these datatypes in queries, updates, indexing etc. | Sharding and replication can also be set up via fairly | general mechanisms, as described in pgSQL documentation. (For | instance, the FDW facility is often used to set up sharding, | but it could also support e.g. aggregation.) | threeseed wrote: | Note that there is no Jepsen test for those | sharding/replication features. | threeseed wrote: | As has been mentioned above, PostgreSQL does not come out of | the box with a supported, tested clustering solution.
| | Given that this is a pretty popular part of MongoDB, it seems | like an important thing that people continuously fail to | mention. | tester756 wrote: | Why has this been here every day for the last 3 days? | judofyr wrote: | This is not directly related to this report or Jepsen, but since | you're here I've got to ask: Aphyr, are there any recent | papers/research in the realm of distributed databases which | you're excited about? | aphyr wrote: | Calvin and CRDTs aren't new, but I still think they're | dramatically underappreciated! Heidi Howard's recent work on | generalizing Paxos quorums is super intriguing, and from some | discussion with her, I think there are open possibilities in | making _leaderless_ single-round-trip consensus systems for | log-oriented FSMs, which is what pretty much everyone WANTS. | | I'm also excited about my own research with Elle, but we're | still working on getting that through peer review, haha. ;-) | thramp wrote: | > I think there are open possibilities in making leaderless | single-round-trip consensus systems for log-oriented FSMs, | which is what pretty much everyone WANTS. | | Woah, that's wild. Are there any pre-prints/papers/talks that | you can link to on this subject? I'd _love_ to read this. | | > I'm also excited about my own research with Elle, but we're | still working on getting that through peer review, haha. ;-) | | I read over bits of Elle; the documentation in it is | absolutely top-notch. You and Peter Alvaro knocked it out of | the park! | aphyr wrote: | _I think there are open possibilities in making leaderless | single-round-trip consensus systems for log-oriented FSMs, | which is what pretty much everyone WANTS._ | | This is based on her presentation and some dinner | conversation at HPTS 2019, so I don't know if there's | actually a paper I can point to. The gist of it is that Paxos | normally involves an arbitration phase where there are | conflicting proposals, which adds a second pair of message | delays.
But if you relax the consensus problem to agreement | on a _set_ of proposals, rather than a single proposal, you | don't need the arbitration phase. Instead of "who won", it | becomes "everyone wins". Then you can impose an order on | that set via, say, sorting, and iterate to get a replicated | log. | | _I read over bits of Elle; the documentation in it is | absolutely top-notch. You and Peter Alvaro knocked it out | of the park!_ | | Thank you! Could I... hang on, just let me grab reviewer #1 | quickly, I'd like them to hear this. ;-) | judofyr wrote: | > _This is based on her presentation and some dinner | conversation at HPTS 2019, so I don't know if there's | actually a paper I can point to. The gist of it is that | Paxos normally involves an arbitration phase where there | are conflicting proposals, which adds a second pair of | message delays. But if you relax the consensus problem to | agreement on a set of proposals, rather than a single | proposal, you don't need the arbitration phase. Instead | of "who won", it becomes "everyone wins". Then you can | impose an order on that set via, say, sorting, and | iterate to get a replicated log._ | | This sounds very similar to _atomic broadcast_ | (https://en.wikipedia.org/wiki/Atomic_broadcast) where | each node sends a single message and the process ensures | that all nodes agree on the same set of messages. Not | sure how it would fit with a log-oriented FSM, but it | certainly sounds interesting. | senderista wrote: | It's really pretty trivial to implement RSM given an | atomic broadcast protocol. But you can implement many | other things, like totally ordered ephemeral messaging | with arbitrary fanout, or a replicated durable log a la | Kafka.
Here's my current favorite atomic broadcast | protocol (from 2007 or so), which is leaderless, has | write throughput saturating network bandwidth, and read | throughput scaling linearly with cluster size: | | https://os.zhdk.cloud.switch.ch/tind-tmp- | epfl/394a62dd-278f-... | thramp wrote: | > This is based on her presentation and some dinner | conversation at HPTS 2019, so I don't know if there's | actually a paper I can point to. | | Thanks for the explanation! I just found | http://www.hpts.ws/papers/2019/howard.pdf; I'm reading | through it now :) | | > Thank you! Could I... hang on, just let me grab | reviewer #1 quickly, I'd like them to hear this. ;-) | | Do as you please with my praise! | zzzeek wrote: | How many more years do we have to keep evaluating, studying, and | reading about MongoDB's ongoing failures? It would appear this | product has been a great burden on the community for many years. | aphyr wrote: | I like to keep in mind that MongoDB's existing feature set is | maturing--occasional regressions may happen, but by and large | they're making progress. The problems in this analysis were in | a transaction system that's only been around for a couple | years, so it's had less time to have rough edges sanded off. | zzzeek wrote: | there are _so_ _many_ _great_ _databases_ out there. There's | no need for one that has been mediocre for years and | continues to make false claims. This is an issue of years of | super-aggressive marketing of an inferior product making it | hard on engineers. | aphyr wrote: | Hi folks! Author of the report here. If anyone has questions | about detecting transactional anomalies, what those anomalies are | in the first place, snapshot isolation, etc., I'm happy to answer | as best I can.
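The "agree on a set, then sort" relaxation described in the subthread above can be shown with a toy sketch. This is not Howard's actual protocol, only the shape of the idea: every proposal for a log slot is accepted ("everyone wins"), and a deterministic sort imposes the order, so no arbitration round is needed.

```python
# Toy illustration (not a real consensus implementation): instead of
# arbitrating between conflicting proposals for a log slot, accept the
# whole set and order it deterministically. Any replica that received the
# same proposals, in any order, derives the same log.
def decide_slot(proposals):
    """Proposals received from replicas for one log slot -> ordered values."""
    return sorted(set(proposals))  # "everyone wins", then a canonical order

def build_log(rounds):
    """Iterate slot decisions to build the replicated log."""
    log = []
    for proposals in rounds:
        log.extend(decide_slot(proposals))
    return log

# Two replicas that saw the same proposals in different orders still agree:
replica_1 = build_log([["a", "b"], ["c"]])
replica_2 = build_log([["b", "a"], ["c"]])
assert replica_1 == replica_2
```

The interesting part happens entirely in `decide_slot`: because no proposal has to "lose", there is no second round of messages to pick a winner.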
| rystsov wrote: | Hi Kyle, thanks for Elle :) I want to use Elle to check | long histories of transactions over a small set of keys with a | read-dominant workload. The paper recommends using lists over | registers, but when the history becomes long, on the one hand it | becomes too wasteful to read the register's history on each | request, and on the other hand Elle's input becomes very large. | E.g. when each read must return the whole register's history, | the size of the history grows O(n^2) compared to the case where | reads return just the head. | | So I'm curious how you would describe Elle's ability to find | violations using read-write registers with unique values vs. | append-only lists? | aphyr wrote: | _E.g. when each read must return the whole register's | history, the size of the history grows O(n^2) compared to the | case where reads return just the head._ | | If you look at Elle's transaction generators, you can cap the | size of any individual key, and use an uneven (e.g. | exponential) distribution of key choices to get various | frequencies. That way keys stay reasonably small (I use 1-10K | writes/key), some keys are updated frequently to catch race | conditions, and others last hundreds of seconds to catch | long-lasting errors. | | _So I'm curious how you would describe Elle's ability to find | violations using read-write registers with unique values vs. | append-only lists?_ | | RW registers are significantly weaker, though I don't know | how to quantify the difference. I've still caught errors with | registers, but the grounds for inferring anomalies are a.) | less powerful and b.) can only be applied in certain | circumstances--we talk about some of these details in the | paper. | eternalban wrote: | "3.4 Duplicate Effects" | | This section seems to describe the most worrying result in your | report, Kyle, with no workaround. Did I read that correctly?
| aphyr wrote: | Yeah, there's no workaround that I can find for 3.4 | (duplicate effects), 3.5 (read skew), 3.6 (cyclic information | flow), or 3.7 (read own future writes). I've arranged those | in "increasingly worrying order"--duplicating writes doesn't | feel as bad as allowing transactions to mutually observe each | other's effects, for example. The fact that you can't even | rely on a single transaction's operations taking place (or, | more precisely, appearing to take place) in the order they're | written is especially worrying. All of these behaviors | occurred with read and write concerns set to | snapshot/majority. | | That's not to say that workarounds don't exist, just that I | didn't find any in the documentation or by twiddling config | flags in the ~2 weeks I was working on this report. :) | devit wrote: | Have you considered presenting the data in a concise manner in | addition to the in-depth analyses? | | That is, a table on the jepsen.io frontpage, or at least on | each product's review page, with database products and | configuration on rows and consistency properties on columns, | and a nice "Yay!" or "Nope!" mark in the cell, plus links on | how to achieve the database configurations in the table (esp. | how to configure each database to have the most guarantees). | | Also, ideally the analyses should be rerun automatically (or | possibly after being paid, but making it easy for the company | to do so) every time a new major release happens rather than | being done once and then being stale. | | Finally, there should be tests for the non-broken databases | (PostgreSQL for instance, both in single-server mode, deployed | with Stolon on Kubernetes and using the multimaster projects) | as well to confirm they actually work. | eloff wrote: | Oh man this would be useful.
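For readers unfamiliar with read skew (anomaly 3.5 discussed above), a minimal illustration: suppose an invariant x + y = 100 holds in every committed state; a reader whose reads straddle another transaction's commit can still observe a total that never existed in any committed state.

```python
# Toy illustration of read skew: the invariant x + y == 100 holds in every
# committed state, but a reader that sees x from before a transfer and y
# from after it observes a "fractured" state.
def transfer(state, amount):
    """Atomically move `amount` from x to y, producing a new committed state."""
    return {"x": state["x"] - amount, "y": state["y"] + amount}

before = {"x": 60, "y": 40}
after = transfer(before, 10)

# Every committed state preserves the invariant:
assert before["x"] + before["y"] == 100
assert after["x"] + after["y"] == 100

# A skewed read mixes the two committed states:
skewed = {"x": before["x"], "y": after["y"]}
print(skewed["x"] + skewed["y"])  # 110: a total no committed state ever had
```

Snapshot isolation is supposed to rule this out by serving every read in a transaction from one consistent snapshot, which is why finding it under snapshot/majority settings is notable.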
| aphyr wrote: | _That is, a table on the jepsen.io frontpage, or at least on | each product's review page, with database products and | configuration on rows and consistency properties on columns, | and a nice "Yay!" or "Nope!" mark in the cell, plus links on | how to achieve the database configurations in the table (esp. | how to configure each database to have the most guarantees)._ | | This is a wonderful idea, and I've got no idea how to | actually do it in a standardized, rigorous way. Vendor claims | are often contradictory, it's hard to get a good idea of | anomaly frequency, availability is... a rabbit hole, and it's | hard to come up with a standard taxonomy of anomalies--most | of the analyses I do wind up finding something I've never | really seen before, haha. With that in mind, I've wound up | letting the reports speak for themselves. | | _Also, ideally the analyses should be rerun automatically | (or possibly after being paid, but making it easy for the | company to do so) every time a new major release happens | rather than being done once and then being stale._ | | I don't know a good way to do this either. Each report is | typically the product of months of experimental work; it's | not like Jepsen is a pass-fail test suite that gives | immediately accurate results. There is, unfortunately, a lot | of subtle interpretive work that goes into figuring out if a | test is doing something meaningful, and a lot of that work | needs to be repeated on each test run. Think, like... staring | at the logs and noticing that a certain class of exception is | being caught more often than you might have expected, and | realizing that a certain type of transaction now triggers a | new conflict detection mechanism which causes higher | probabilities of aborts; those aborts reduce the frequency | with which you can observe database state, allowing a race | condition to go unnoticed. That kinda thing.
| | If I'm lucky and the API/setup process haven't changed, I can | re-run an analysis in about a week or so. If I'm unlucky, | there's been drift in the OS, setup process, APIs, client | libraries, error handling, etc. It's not uncommon for a | repeat analysis to take months. :-( | X6S1x6Okd1st wrote: | It's probably more snarky than helpful, but it'd be great | to have a section where it's just marketing materials or | docs that you've corrected with a red pen | bcrosby95 wrote: | It's probably better to keep it professional. Your | average employee can afford some snark. But when | companies hire you for this sort of consulting, you could | turn off a lot of potential clients by including it in | materials you produce, even when they didn't pay for it. | Because it is a representation of the product they would | be paying for. | | It would be kinda like you including this sort of thing | on your resume. Which would also be a bad idea. | ashtonkem wrote: | For those who don't know, Kyle makes a living offering | these types of analysis to database companies directly. | While a lot of us love to dunk on Mongo (myself | included), it would be silly to expect Kyle to risk his | livelihood. | jka wrote: | If done accurately and professionally, something like what | you're suggesting could be really useful to aid people | and organizations during vendor selection. | | https://web.hypothes.is/about/ or similar could be used | to develop commentary overlays on top of marketing | materials. | HappyDreamer wrote: | > _consistency properties on columns, and a nice "Yay!" or | "Nope!" mark in the cell_ | | Plus maybe a column indicating what [the company behind the | database] claims? | teskk123 wrote: | hello :) Where did you find out all the information about how to | do such testing? | aphyr wrote: | I was lucky to have a good education: my B.A. involved | courses in contemporary experimental physics and independent | research in nonlinear quantum dynamics (esp.
proofs, | experimental design, writing), cognitive and social | psychology (more experiment design and stats), math | structures (proof techniques), philosophy (metaphysics, | philosophy of science), and English (rhetoric). All of those | helped give me a foundation for doing this kind of | experimental work and communicating it to others. | | Jepsen draws inspiration from a long line of work on | property-based testing, especially Quickcheck & co. It also | draws on roughly 10 years of experience building & running | distributed systems in production. A lot of Jepsen I invented | from whole cloth, but some of the checkers in Jepsen are | derived from specific research papers, like work by Wing, | Gong, and Lowe on linearizability checking. | | Then it's just... a lot of thinking, experimenting, and | writing. Jepsen's the product of ~6 years of full-time work. | Elle, the system which detected the anomalies in this report, | was a research project I've been puzzling over for roughly | two years. | | I write the Jepsen series, and open-source all of the code | for these tests, partly as a resource so that other people | can learn to do this same kind of work. :-) | teskk123 wrote: | Wow, thanks a lot for the quick and full answer! | throwaway_pdp09 wrote: | I guess you've answered my question, but to be clear, you | do not instrument/analyse the code, you treat it as a black | box which you hammer on externally, is that right? | aphyr wrote: | Pretty much, yeah. There are some cases where Jepsen | reaches into the guts of a database or lies to it via | LD_PRELOAD shims, but generally these are Just Plain Old | Binaries provided by vendors; no instrumentation | required. | monstrado wrote: | Huge fan of your work! I was curious if you've ever attempted | to run your Mongo test suite (or part of it) against FoundationDB | using their DocumentLayer since it's supposed to be Mongo API | compatible.
| robterrell wrote: | IIRC one of the FoundationDB engineers tested with Jepsen and | found that it passed in its default configuration, but the | blog post seems to have disappeared. | | https://web.archive.org/web/20150312112556/http://blog.found. | .. | monstrado wrote: | Thanks for firing up the time machine! I've been using FDB | for a little over a year now and can't recommend it enough. | Such a solid piece of meticulous engineering. | aphyr wrote: | No, I haven't! You can see a full list of analyses here: | http://jepsen.io/analyses | rclayton wrote: | Hi Kyle! I've really enjoyed your work over the years. I was | wondering, with all of your testing and experimentation, is | there any system that has really impressed you? | zbentley wrote: | I don't presume to speak for him, but his writeup on | ZooKeeper was among the most positive in the Jepsen series: | https://aphyr.com/posts/291-jepsen-zookeeper | | My bias: I like and heavily use ZooKeeper in production. HN | seems not to like it as much. | aphyr wrote: | I'm kind of impressed _any_ distributed system gets off the | ground. These things are hard to write! | dilandau wrote: | You're doing very, very valuable work. Thanks fam, keep those | vendors honest, and help us make informed decisions. | politician wrote: | Thank you for all of your work over the years. Your reports | have helped me and others stand up to bizdev hype and make | better decisions for our companies and customers. | | Postgres is widely understood to be a robust database with safe | defaults. I, and perhaps others, would love to see you aim your | array of weapons at Postgres. Do you have any plans to look at | stock Postgres? | aphyr wrote: | It's been on my list for a long time, but I've also struggled | to find out like... what, exactly, is the right way to do | postgres replication?
Every time I go into the docs I wind up | with a laundry list of different mechanisms for replication | and failover, and no idea which one would be most appropriate | for a test. I gotta get on this! | takeda wrote: | Well, the built-in way is the right way to do it. But | given that PostgreSQL is quite conservative about it, it | will be hard to find issues there (the replicas are read | only, so at worst it will be just a replication delay, | unless you use synchronous replication, which will remove | the replication delay at the cost of slower performance). | | All the tooling that provides extra distributed | functionality not present in postgres (auto failover, multi- | master replication, sharding etc) will surely have issues, | but then you aren't testing PostgreSQL itself, but the | tooling, so to be fair, the article should evaluate | these tools, and any shortcomings shouldn't be attributed to | PostgreSQL (unless it really is a PostgreSQL issue). | didip wrote: | It is true that PG has only recently had a standard way of | replicating. But even then, PG is not a distributed | database by default. | | However, if I may suggest, Stolon, Patroni, Postgres XL or | Citus Data might be interesting to you. | bsaul wrote: | i feel like this is the reaction of everyone who has ever | tried to set up postgres replication. With your audience, | your deciding on a particular setup will probably help a | LOT of people, and ultimately the postgres project as well. | takeda wrote: | If you worry about data, you should not use automatic | failover. It's nearly impossible for a standby to know why | the master stopped responding. Maybe there was a hardware | failure, or maybe the master is just busy. This is why manual | failover is better, because you can know the real reason | and decide whether you should perform failover or just | wait. | | With tools like repmgr it is just a single command | invoked on the standby.
| | If you absolutely don't want to lose any data, you should | have two masters in close proximity (so the latency isn't | high) set up with synchronous replication, then have one | or two standbys with asynchronous replication. This will | reduce throughput, but then you can be sure that the | other machine has all the same transactions. If something | happens to both, you can then fall back to the asynchronous | one, which might be a bit behind. | feike wrote: | One of the authors of Patroni here. | | Automatic failover for PostgreSQL works great and can be | done safely if combined with synchronous replication. | | Multiple tools will implement this correctly: | | https://patroni.readthedocs.io/en/latest/replication_mode | s.h... https://github.com/sorintlab/stolon/blob/master/do | c/syncrepl... | | Quoting a former colleague here, but "if it hurts, do it | more often". That is what you should do with your | PostgreSQL failovers. | | I have clusters running on timelines in the hundreds | without a byte of data loss due to using synchronous | replication, tools that help out with leader election, | and just doing it often. | takeda wrote: | Can Patroni tell if the master node is unresponsive because | it is busy vs. dead? GitHub (I believe) had a few outages | that caused data loss because their auto failover | mechanism kicked in when it shouldn't. | | I would actually be interested in aphyr's analysis of | Patroni and other distributed add-ons to PostgreSQL. | pcl wrote: | I think that it'd be super-valuable to do an analysis of an | RDS Postgres deployment. Amazon is doing some dark magic | with RDS that sits at this really interesting "distributed, | but not _that_ distributed" inflection point, which | impacts the basic assumptions of lots of distributed | database design. | | I believe RDS Postgres is probably the right answer for | lots of applications, especially for those that already | depend on AWS for baseline availability.
I'd love to see if | that holds up against a rigorous analysis. | aeyes wrote: | Are you talking about Aurora? Because in RDS the | replication is just what you get out of the box. | aphyr wrote: | I'd like this too, but I'm not sure how to do fault | injection against an Amazon-controlled service. | ashtonkem wrote: | You'd probably have to work directly with AWS on that | one, either to get a custom harness in AWS, or to find | out how they configure RDS replication. | elesbao wrote: | the setup would probably be a pgsql primary, aurora | secondaries in diff zones, and something changing cross- | zone or cross-region vpc settings to try to break | replication? never tried that but was hurt by rds pure | pgsql cross-region replication in a network outage | situation. | zbjornson wrote: | It'd be especially interesting given that MongoDB claims | this: | | > Postgres has both asynchronous (the default) and | synchronous replication options, neither of which offers | automatic failure detection and failover [12]. The | synchronous replication only waits for durability on one | additional node, regardless of how many nodes exist [13]. | Additionally, Postgres allows one to tune these durability | behaviors at the user level. When reading from a node, | there is no way to specify the durability or recency of the | data read. A query may return data that is subsequently | lost. Additionally, Postgres does not guarantee clients can | read their own writes across nodes. | | From http://www.vldb.org/pvldb/vol12/p2071-schultz.pdf | takeda wrote: | > It'd be especially interesting given that MongoDB | claims this: | | > > Postgres has both asynchronous (the default) and | synchronous replication options, neither of which offers | automatic failure detection and failover [12]. The | synchronous replication only waits for durability on one | additional node, regardless of how many nodes exist [13].
| Additionally, Postgres allows one to tune these | durability behaviors at the user level. When reading from | a node, there is no way to specify the durability or | recency of the data read. A query may return data that is | subsequently lost. Additionally, Postgres does not | guarantee clients can read their own writes across nodes. | | > From http://www.vldb.org/pvldb/vol12/p2071-schultz.pdf | | This is like those commonly seen tables comparing your | product with others where your product has checkmarks in | all categories, and of course competitors are missing a | bunch of them. The problem is that the categories were | picked by you, and are often irrelevant to the other | product. This is the case here. | | PostgreSQL is not a distributed database; the master is | the one doing all writes. The replicas are read only. By | default replicas are asynchronous, which means they won't | affect master performance, at the cost of the data there | being a few seconds behind. Since you can't write to | replicas, this won't cause data corruption, only delay, | which is often acceptable. If you design your | applications in such a way that they have two database | endpoints, one for writes and one just for reads, you can | then decide based on context which endpoint you want to | use. The read-only endpoint is easy to scale, but as | mentioned earlier it is read only and might be slightly | delayed. | | Now, for failover, you might also opt to use | synchronous replicas; this will add extra latency, but | then you always have at least one machine that has the | same data. They mentioned that if you have multiple | synchronous standbys then only one needs to confirm the | write. Actually that's configurable: you can specify a | group of synchronous machines, and how many and which of | them need to be synchronized; the remaining ones are a | backup in case the ones you specified aren't available.
| | Besides, the writes don't work the same way as in mongo: | when a standby node is in sync it isn't just in sync for | that particular write, it is completely in sync, so their | following argument about not being able to specify | durability/recency of data on read is moot. If you | contact the master or a synchronous replica, you will | always get the most recent state. If you don't mind a | slight delay you should query asynchronous replicas (in | fact you should prefer them whenever you can, since they | are cheap to add) | zbjornson wrote: | I'm not sure I understand your point. | | > the master is the one doing all writes. The replicas | are read only. By default replicas are asynchronous | | The same is true with MongoDB's defaults in an unsharded | cluster. | zzzcpan wrote: | Postgres is not a distributed database and doesn't have a | single safe default for running it in a distributed | configuration, including talking to it over the network. It | can't claim any consistency guarantee, so there is nothing for | aphyr to test it for. | | Even common highly available configurations take the route of | no consistency guarantees by doing primitive async | replication and primitive failover. | bsaul wrote: | i'm not sure what you mean by pg not being a "distributed" | database. it has replication and sharding functionalities | that let it run in various clustering configurations. This | looks enough to me to qualify it for aphyr tests. | takeda wrote: | Replication is read only, so at worst there's only delay | when it is set up asynchronously, but ultimately it will | be the same as the master. The sharding part, do you mean | FDW? I don't think PostgreSQL gives any consistency | guarantees if you use them. | bsaul wrote: | ha, my bad. I had the feeling pg provided some solution | for sharding, but it seems they're all third-party | extensions (like citus/pg-shard) | politician wrote: | Postgres supports multi-master replication, among other | replication models.
This could provide an interesting | target. | | In a classic single-node configuration, a confirmation that | its transaction isolation levels exhibit only their expected | anomalies would be valuable. | | So I think there's value in this ask. | samdk wrote: | Postgres doesn't natively support multi-master. (Although | there are a variety of open source/proprietary offerings | that add support for it to various degrees.) | [deleted] | takeda wrote: | PostgreSQL doesn't offer multi-master replication. There | are extensions that do, but if aphyr evaluates them | he should emphasize that he is testing the extension, not | PostgreSQL (unless he finds a bug in PostgreSQL itself). | | I think he did something similar for MySQL when | evaluating the Galera cluster. | politician wrote: | Jepsen reports often include two distinct types of | analyses: correctness in a distributed storage system | under a variety of failure scenarios, and in-depth | analysis of consistency claims. Both examinations are | extremely helpful. | | In a single-write-master configuration, Postgres runs | transactions concurrently, so the consistency analysis is | still quite relevant. | | I don't think it's a stretch to say that everyone expects | Postgres to get top marks in this configuration and it | would be worth confirming that this is the case. | takeda wrote: | Actually he already did analyze PostgreSQL: | https://aphyr.com/posts/282-call-me-maybe-postgres | | But it was long ago, and maybe needs to be redone? | | Edit: after re-reading it he treats it as a distributed | system because client and server communicate over a network. And | that is true; it can also be thought of as a distributed | system because, as you said, transactions are concurrent | and are running as separate processes. Although in these | cases you can't have a partition (which aphyr uses to | find weaknesses), or maybe there is something equivalent | that happens?
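takeda's closing question, whether a single-node setup has anything equivalent to a partition, has a concrete answer: the client-server link itself can fail at the worst moment. A minimal Python sketch of the resulting indeterminate-commit ambiguity (the `Server`/`client_commit` names are invented for illustration, not any real driver API):

```python
class Server:
    """Toy single-node database: applies a write, then sends an ack."""
    def __init__(self):
        self.log = []

    def commit(self, value, ack_lost):
        self.log.append(value)             # the write is durable on the server...
        return None if ack_lost else "ok"  # ...but the ack may never arrive

def client_commit(server, value, ack_lost=False):
    """Returns 'ok', or 'indeterminate' when the network drops the ack.
    An indeterminate result does NOT mean the write failed."""
    resp = server.commit(value, ack_lost)
    return "indeterminate" if resp is None else resp

s = Server()
assert client_commit(s, "a") == "ok"
# Partition-like failure: the commit applied, but the client can't know.
assert client_commit(s, "b", ack_lost=True) == "indeterminate"
assert s.log == ["a", "b"]  # the "failed" write is actually there
```

This is exactly the class of ambiguity Jepsen probes: a client that treats "indeterminate" as "failed" (or that blindly retries) will under- or over-count committed writes.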
| zozbot234 wrote: | > PostgreSQL doesn't offer multi-master replication. | | Not in itself, but it does offer a PREPARE TRANSACTION - | COMMIT PREPARED / ROLLBACK PREPARED extension that could | be used to add such support in the future. This would not | be unprecedented, as the simpler case of db sharding is | already supported via the PARTITION BY feature, | combined with "FOREIGN" database access. | danpalmer wrote: | Not a question necessarily about the technical side, but I'm | interested in your opinion as to the root cause - is it a desire | to achieve certain results for marketing purposes, lack of | understanding/training in the team about distributed systems, | just bugs and a lack of testing...? Alternatively, does most of | this come down to one specific technical choice, and why might | they have made that choice? | | Very happy for (informed) speculation here; I recognise we'll | probably never know for certain, but I'm interested to avoid | making similar mistakes myself. | aphyr wrote: | There's a few things at play here. One is talking only about | the positive results from the previous Jepsen analysis, while | not discussing the negative ones. Vendors often try to | represent findings in the most positive light, but this was a | particularly extreme case. Not discussing default behavior is | a significant oversight, and it's especially important given | ~80% of people run with default write concern, and 99% run | with default read concern. | | The middle part of the report talks about unexpected but | (almost all) documented behavior around read and write | concern for transactions. I don't want to conjecture too much | about motivations here, but based on my professional | experience with a few dozen databases, and surveys of | colleagues, I termed it "surprising".
The fact that there's | explicit documentation for what I'd consider Counterintuitive | API Design suggests that this is something MongoDB engineers | considered, and possibly debated, internally. | | The final part of the report talks about what I'm pretty sure | are bugs. I'm strongly suspicious of the retry mechanism: | it's possible that an idempotency token doesn't exist, isn't | properly used, or that MongoDB's client or server layers are | improperly interpreting an indeterminate failure as a | determinate one. It seems possible that all 4 phenomena we | observed stem from the retry mechanism, but as discussed in | the report, it's not entirely clear that's the case. | danpalmer wrote: | Thanks for the thoughts. | | I get the impression that MongoDB may have hyped themselves | into a corner in the early days with poorly made (or | misleading) benchmarks. Perhaps they have customers with a | lot of influence determining how they think about | performance vs consistency. | | Maybe this combined with patching, re-patching, re-patching | again their replication logic/consistency algorithm means | that they'll be stuck in this sort of position for a long | time. | aphyr wrote: | Possibly! You're right that path dependence played a role | in safety issues: the problems we found in 3.4.0-rc3 were | related to grafting the new v1 replication protocol onto | a system which made assumptions about how v0 behaved. | That said, I don't want to discount that MongoDB _has_ | made significant improvements over the years. Single-document | linearizability was a long time in the works, | and that's nothing to sneeze at! | | http://jepsen.io/analyses/mongodb-3-4-0-rc3 | staticassertion wrote: | I've wanted to try building a toy database to learn more about | how they work - any suggestions for good resources? | [deleted] | lmilcin wrote: | I am tech lead for a project that revolves around multiple | terabytes of trading data for one of the top ten largest banks in the | world.
My team has three 3-node, 3TB-per-node MongoDB clusters | where we keep a huge number of documents (mostly immutable, 1kB to | 10kB in size). | | Majority write/read concern is exactly so that you don't lose | data and don't observe stuff that is going to be rolled back. It | is important to understand this fact when you evaluate MongoDB | for your solution. That it comes with additional downsides is | hardly a surprise, otherwise there would be no reason to specify | anything other than majority. | | You just can't test lower levels of guarantees and then complain | you did not get what higher levels of guarantees were designed to | provide. | | It is also obvious, when you use majority concern, that some of | the nodes may accept the write but then have to roll back when | the majority cannot acknowledge the write. It is obvious this may | cause some writes to fail that would succeed should the | write concern be configured to not require majority | acknowledgment. | | The article simply misses the mark by trying to create sensation | where there is none to be found. | | The MongoDB documentation explains the architecture and | guarantees provided by MongoDB well enough that you should be able | to understand the various read/write concerns and that anything below | majority does not guarantee much. This is a tradeoff which you | are allowed to make provided you understand the consequences. | lllr_finger wrote: | > The article simply misses the mark by trying to create | sensation where there is none to be found. | | As someone who is a tech lead for a large database install, I'd | urge you to read the rest of the Jepsen reports. They aren't | intended to be hit pieces on technology - they're deep dives | into the claims and guarantees of each database. IIRC MDB has | explicitly reached out to OP in the past (I doubt they'll | continue to do so after this).
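The majority-acknowledgment behavior described above, minority-acked writes being rolled back after a failover, can be sketched as a toy model (the `ReplicaSet` class and its methods are invented for illustration, not MongoDB code):

```python
class ReplicaSet:
    """Toy model of w:majority acknowledgment in an n-node replica set."""
    def __init__(self, n):
        self.n = n
        self.committed = []   # writes acknowledged by a majority
        self.pending = []     # writes seen by a minority only

    def write(self, value, acks):
        # A write is durable only once more than half the nodes ack it.
        if acks > self.n // 2:
            self.committed.append(value)
            return "acknowledged"
        self.pending.append(value)
        return "unacknowledged"

    def failover(self):
        """On election of a new primary, minority-only writes roll back."""
        rolled_back, self.pending = self.pending, []
        return rolled_back

rs = ReplicaSet(3)
assert rs.write("a", acks=2) == "acknowledged"    # primary + one secondary
assert rs.write("b", acks=1) == "unacknowledged"  # primary only
assert rs.failover() == ["b"]                     # 'b' is lost on failover
assert rs.committed == ["a"]
```

The point of contention in the report is not this rule itself but that the default write concern did not require the majority path at all, so the `"b"` case above was the out-of-the-box behavior.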
| | Why that matters to the rest of us: once I learn all those | dials and knobs I'm left wondering why I would choose Mongo | over another technology, and how much the design of the default | behavior and complexity of said dials/knobs are influenced by | their core business. | lmilcin wrote: | I agree. MongoDB has a large number of peculiarities that you | had better know before you buy in. It is definitely not as rosy | as advertised. In particular it seems the product is not | mature (especially if you come from the Oracle world) and the | features seem slapped on as they go and not thought through. | nosequel wrote: | Since you just leaned all the way in, while repeatedly proving | you either will not, or cannot, read the posted article at all: | will you let us know what bank you support, so at least I can | make sure I never use that bank? | | Thanks, those of us who care about our banking and investing | data. | aphyr wrote: | _You just can't test lower levels of guarantees and then | complain you did not get what higher levels of guarantees were | designed to provide._ | | Gently, may I suggest that you read the report, or at least the | abstract? This is addressed in the second sentence. :-) | lmilcin wrote: | To quote from the report: "Moreover, the snapshot read | concern did not guarantee snapshot unless paired with write | concern majority--even for read-only transactions." | | Of course, it doesn't work when you don't pair it with | majority read/write concern. You can't expect to get a | snapshot of data that wasn't yet acknowledged by a majority of | the cluster. | | As to the quote you probably are referring to: | | "Jepsen evaluated MongoDB version 4.2.6, and found that even | at the strongest levels of read and write concern, it failed | to preserve snapshot isolation." | | I did not find any proof of this in the rest of the report. | It seems this is mostly a complaint about what happens when you | mix different read and write concerns.
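The hazard of mixing weaker read concerns with majority writes, debated above, can be illustrated with a toy model (all names here, `Node`, `write_local`, and so on, are invented for illustration):

```python
class Node:
    """Toy node: 'local' reads see unreplicated writes; 'majority' reads don't."""
    def __init__(self):
        self.majority_committed = {}
        self.local = {}

    def write_local(self, k, v):
        self.local[k] = v             # applied locally, not yet majority-acked

    def replicate(self, k):
        self.majority_committed[k] = self.local[k]

    def rollback(self, k):
        del self.local[k]             # minority-only write undone after failover

    def read(self, k, concern="local"):
        store = self.local if concern == "local" else self.majority_committed
        return store.get(k)

n = Node()
n.write_local("x", 1)
assert n.read("x", "local") == 1        # visible at read concern local...
assert n.read("x", "majority") is None  # ...but not durable
n.rollback("x")
assert n.read("x", "local") is None     # the observed value never "happened"
```

This is why a read at a weak concern can observe state that is later rolled back: the report's stronger claim, which the toy above does not model, is that anomalies appeared even when both concerns were set to their strongest levels.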
| | I would also suggest thinking a little about the concept of a | snapshot in the context of a distributed system. It is not | possible to have the same kind of snapshot that you would get | with a single-node application with the architecture of | MongoDB. MongoDB is a distributed system where you will get | different results depending on which node you are asking. | | The only way you could get close to having a global snapshot | is if all nodes agreed on a single truth (for example a single | log file, blockchain, etc.), which would preclude reads/writes | with a concern level below majority. | aphyr wrote: | > I did not find any proof of this in the rest of the | report. | | May I suggest sections 3.4, 3.5, 3.6, 3.7, 4.0, and 4.1? | lmilcin wrote: | Quoting half the report is bad for the discussion as it | makes it impossible for the reader to follow. | inglor wrote: | > May I suggest an alternative perspective on the matter? | | Can't reply to that since it's too nested so I'll reply | here. I warmly recommend getting off the tree you climbed on | and actually reading the article, because if you do you | will see you are not disagreeing on that part. | | The article is a mostly technical analysis of the | transaction isolation levels and where they hold. The | main criticism is how MongoDB _advertises_ itself. If | they didn't claim the database is "fully ACID" then the | article would have just been a technical analysis :] | aphyr wrote: | Chief, it does _not_ have to be this hard. 3.4 clearly | states: | | _This anomaly occurred even with read concern snapshot | and write concern majority_ | | 3.5: _In this case, a test running with read concern | snapshot and write concern majority executed a trio of | transactions with the following dependency graph_ | | 3.6: _Worse yet, transactions running with the strongest | isolation levels can exhibit G1c: cyclic information | flow._ | | 3.7: _It's even possible for a single transaction to | observe its own future effects.
In this test run, four | transactions, all executed at read concern snapshot and | write concern majority, append 1, 2, 3, and 4 to key 586 | --but the transaction which wrote 1 observed [1 2 3 4] | before it appended 1._ | | Like... if you had read any of these sections--or even | their very first sentences--you wouldn't be in this | position. They're also summarized both in the abstract | and discussion sections, in case you skipped the results. | | 4.0: _Finally, even with the strongest levels of read and | write concern for both single-document and transactional | operations, we observed cases of G-single (read skew), | G1c (cyclic information flow), duplicated writes, and a | sort of retrocausal internal consistency anomaly: within | a single transaction, reads could observe that | transaction's own writes from the future. MongoDB appears | to allow transactions to both observe and not observe | prior transactions, and to observe one another's writes. | A single write could be applied multiple times, | suggesting an error in MongoDB's automatic retry | mechanism. All of these behaviors are incompatible with | MongoDB's claims of snapshot isolation._ | | It's OK to stop digging now! | lmilcin wrote: | May I suggest an alternative perspective on the matter? | | Compared to a product like Oracle, transactions on | MongoDB are very new, very niche functionality. Even | MongoDB consultants openly suggest not using it. | | MongoDB is really meant to store and retrieve documents. | That's where the majority read/write concern guarantees | come from. | | As long as you are storing and retrieving documents you | are pretty safe, functionally. | | Your article presents the situation as if MongoDB did not | work correctly at all. That is simply not true; the most | you can say is that a single (niche) feature doesn't | work. | | Have you ever tried distributed transactions with | relational databases?
Everybody knows these exist but | nobody of sound mind would ever architect their | application to rely on them. | | Any person with a bit of experience will understand that | things don't come free and some things are just too good | to be true. MongoDB marketing may be a bit trigger-happy | with their advertisements but it does not mean the | product is unusable; they just probably promised a bit too | much. | JohnBooty wrote: | This comment will rightfully be downvoted, but I'm going | to break HN decorum for once in my long posting history | here and simply say: | | Holy _shit_, buddy. Stop. | threeseed wrote: | At least he is contributing something to the | discussion. | | He may be right. He may be wrong. But it helps everyone | learn. | | Your comments contribute nothing. So how about you stop? | lmilcin wrote: | The world does not revolve around HN votes. If your first | urge is to wonder whether the post gets downvoted or not you might | want to rethink your life a little bit. | | So don't worry about me. | dang wrote: | Please stop. We don't want flamewars here. | JohnBooty wrote: | I'm not "worried" nor experiencing an "urge." Please skip | the concern trolling. | | What I do have an interest in is HN's accepted decorum, | which I admittedly stepped outside of when I implored you | to stop digging yourself such a hole. | | HN is far from perfect but there is a culture of | respectful discourse here, which is part of the reason | for its value IMO. | speedgoose wrote: | You may want to delete this comment too. | [deleted] | jiofih wrote: | May I suggest the tiniest bit of consideration (such as | reading the report) before jumping to conclusions and | low-key offending the author? You should be embarrassed. | aphyr wrote: | _Have you ever tried distributed transactions with | relational databases?_ | | I am delighted to say that yes: checking safety | properties of distributed systems, including those of | relational databases, is literally my job.
See | https://jepsen.io/analyses for a comprehensive list of | prior work, or http://jepsen.io/analyses/tidb-2.1.7, | http://jepsen.io/analyses/yugabyte-db-1.1.9, | http://jepsen.io/analyses/yugabyte-db-1.3.1, or | http://jepsen.io/analyses/voltdb-6-3 for recent examples | of Jepsen analyses on relational databases. | [deleted] | logicchains wrote: | Did you see the part about "Operations in a transaction use | the transaction-level read concern. That is, any read | concern set at the collection and database level is ignored | inside the transaction."? | | "Transactions without an explicit read concern downgrade any | requested read concern at the database or collection level | to a default level of local, which offers "no guarantee | that the data has been written to a majority of replicas | (i.e. may be rolled back)."" | | The big problem is that, even if somebody correctly sets | the read and write concerns to something sensible, the | moment they use a transaction these guarantees fly out the | window, unless they read the docs carefully enough to | realise they have to set the read and write concern for the | transaction too. The defaults are very unintuitive; I | can't imagine that the case of somebody needing snapshot | isolation in general but being fine with arbitrary data | loss in transactions is a common case, compared to wanting | to avoid data loss both generally and in transactions. | lmilcin wrote: | It is one thing to complain about unclear documentation | and unintuitive guarantees, and another to say that it just doesn't | work. | | Yes it works. Yes, you have to read the documentation | very carefully.
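The downgrade rule quoted above can be captured in a one-line toy function, purely to make the documented behavior explicit (the function name and signature are invented, not a MongoDB API):

```python
def effective_read_concern(txn_rc=None, coll_rc=None):
    """Toy model of the documented rule: inside a transaction, the
    collection/database-level read concern (coll_rc) is ignored, and an
    unset transaction-level read concern defaults to 'local'."""
    return txn_rc if txn_rc is not None else "local"

# A collection-level 'snapshot' setting does NOT carry into the transaction:
assert effective_read_concern(txn_rc=None, coll_rc="snapshot") == "local"
# Only an explicit transaction-level setting takes effect:
assert effective_read_concern(txn_rc="snapshot", coll_rc="local") == "snapshot"
```

The first assertion is the surprise logicchains is pointing at: carefully configured collection-level guarantees silently weaken to "local" the moment a transaction starts.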
As an anecdotal data point - | we've read the docs (carefully) and spoke to MongoDB | quite a bit when implementing transactions including | their highest paid levels of support and still ran into | this issue: | | > transactions running with the strongest isolation | levels can exhibit G1c: cyclic information flow. | | As well as the Node.js API issue (I just checked randomly | and their Python API has the same bug lol) listed above. | bronson wrote: | If a database advertises attributes that aren't a part of its | default setup, you can expect its docs to make it very simple | and clear how to get them. | | If not, that's misrepresentation. | lmilcin wrote: | The documentation states that very clearly and the attributes | are part of every call to the database (as long as you are | using native driver). | | In any case any person that has some experience with | distributed systems will understand what it roughly means to | get an acknowledgment from just a single node vs. waiting for | the majority. | | Oracle also does not use serializable as its default | isolation level, yet it advertises it. | | This is all part of the product functionality. Whenever you | evaluate product for your project you have to understand | various options, functionalities and their tradeoffs. | | Defaults don't mean shit. In a complex clustered product you | need to understand all important knobs to decide the correct | settings and configurable guarantees are most important knobs | there are. | bronson wrote: | Good point. If there's a database that rivals Mongo for | shady sales tactics, it's Oracle. 
| dang wrote: | All: there was a big thread about this yesterday | (https://news.ycombinator.com/item?id=23285249) but because it | didn't focus on the technical content, and because there were | glitches with a previous submission of this report (described at | https://news.ycombinator.com/item?id=23288120 and | https://news.ycombinator.com/item?id=23287763 if anyone cares), | we invited aphyr to repost this. Normally we downweight follow-up | posts that have such close overlap with a recent discussion | (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...), so | the exception is probably worth explaining. | sorokod wrote: | I suppose there are reasons why the defaults are the way they | are. Can anyone comment on the implications, performance or | otherwise, of bumping up the read/write concerns? | aphyr wrote: | Latency is a big one--you've got to wait an extra round-trip | for secondaries to acknowledge primary writes, and primaries | (assuming you don't have reliable clocks) need to check in with | secondaries to confirm they have the most recent picture of | things if you want to do a linearizable read. Snapshot isolated | reads _shouldn't_ require that, at least in theory--it's legal | to read state from the past under SI, so there's no need to | establish present leadership. That's why I'm surprised that | MongoDB requires snapshot reads to go through write concern | majority--it doesn't _seem_ like it'd be necessary. Might have | something to do with sharding--maybe establishing a consistent | cut across shards requires a round of coordination. Even then I | feel like that's a cost you should be able to pay only at write | time, making reads fast, but... apparently not! I'm sure the | MongoDB engineers who designed this system have good reasons; | they're smart folks and understand the replication protocol | much better than I do. | | MongoDB's also published a writeup (which is cited a few times | in the Jepsen report!)
talking about the impact of stronger | safety settings and why they chose weak defaults: | http://www.vldb.org/pvldb/vol12/p2071-schultz.pdf | goatinaboat wrote: | In general, MongoDB's defaults fall into two categories. The | first could possibly be justified as making it easy for | inexperienced devs to get started, but it means that people | rely on those defaults and then try to promote to production, | and unless there is an experienced traditional DBA with the | power to veto it, it will go ahead. This is how they "backdoor" | their way into companies. The second category is whatever will | look good on a benchmark, regardless of any corners cut. | | Compare and contrast with the highly ethical Postgres team, who | encourage good practices from the start and who get a feature | right first before worrying about performance. That may harm | their adoption in the short term but over the long term, that's | why they're the gold standard. And with their JSONB datatype | they have a better MongoDB than MongoDB anyway! And have a | million other features besides! | logicchains wrote: | >And have a million other features besides! | | Yeah, but in spite of that their performance still sucks | compared to writing directly to /dev/null, and that's where | Mongo steals their thunder. | threeseed wrote: | > Compare and contrast with the highly ethical Postgres team | | You do know that PostgreSQL had issues with not fsyncing data | as well? It's technology. Bugs will be made. Design | decisions will be wrong. | | I think it's really disappointing and inappropriate to be | labelling MongoDB engineers as unethical for simply having | incorrect defaults, which in their history they have often changed | after being made aware of them. | junon wrote: | I wanted to incorporate MongoDB into a C++ server at one point. | | Their C/C++ client is literally unusable.
I went to look into | writing my own that actually worked and their network protocols | are almost impossible to understand. BSON is a wreck and | basically the whole thing discouraged me from ever trying to | interact with that project again. | bbulkow wrote: | mongodb's business model, forever, has been to get developers to | write code, be damned the fact that you can't support it reliably | on a cloudy day. | jtdev wrote: | Now do DynamoDB. | aphyr wrote: | I'd like to, but I don't have any way to do fault injection on | a system someone else owns. :( | jtdev wrote: | Would love to see AWS agree to facilitating this. Appreciate | your work very much! | petrikapu wrote: | They have a downloadable version of it: | https://docs.aws.amazon.com/amazondynamodb/latest/developerg... | [deleted] | chousuke wrote: | This article reinforces my stance that bad defaults are a bug. | Defaults should be set up with the least number of pitfalls and | safety tradeoffs possible so that the system is as robust as it | can be for the majority of its users, since the vast majority of | them aren't going to change the defaults. | | Sometimes you end up with bad defaults simply by accident but I | feel like for MongoDB the morally correct choice would be to own | up to past mistakes and change the defaults rather than maintain | a dangerous status quo for "backwards compatibility", even if you | end up looking worse in benchmarks as a result. | aphyr wrote: | I think this is a good way to look at things, and there are | vendors who do this! VoltDB, for instance, changed their | defaults to be strict serializable even though it imposed a | performance hit, following their Jepsen analysis. | https://www.voltdb.com/blog/2016/07/voltdb-6-4-passes-offici... | mtrycz2 wrote: | aphyr, you are a great inspiration as an engineer and as a | human. | | Your attitude of "a tool I need doesn't exist, so I'll just go | ahead and create it" blew my mind and changed me for the better.
| | I'm dedicating my next test framework to you. Thank you for | everything. | aphyr wrote: | Aw shucks, thank you! <3 | inglor wrote: | Without going into details due to NDAs, the experience in the OP | matches those of several Fortune 500 companies I had gigs | with. | sam1r wrote: | Extremely well written! I learned a lot. | | I wonder if someone can type up a well-manicured post-mortem of | the recent Triplebyte incident? | fastball wrote: | At this point I think we might be going a bit overboard with | title changes. | | Now that it's just "MongoDB 4.2.6", the title makes me think that | this is a release announcement, not an analysis of the software. | | The first title (that specifically referenced a finding of the | analysis) was best, imo. Mildly opinionated or whatever, but at | least it quickly communicated the gist of the post. On the other | hand: | | "Jepsen: MongoDB 4.2.6" - not super helpful if you're not already | familiar with the Jepsen body of work. | | "MongoDB 4.2.6" - as stated above, sounds like a release | announcement. | | If you want a suggestion, maybe something like "Jepsen evaluation | of MongoDB 4.2.6"? Not overly specific (/ negative) like the | first title, but at least provides some slight amount of context. | | @dang | dang wrote: | Please read the site guidelines: | https://news.ycombinator.com/newsguidelines.html.
They say: " | _If the title includes the name of the site, please take it | out, because the site name will be displayed after the link._ " | That's why a moderator changed it: the submitted title was | "Jepsen: MongoDB 4.2.6". | | I don't mind making an exception, since exceptions are things | sometimes. Jepsen is famous on HN, so the current title is not | an issue. Indeed, referencing a specific finding would arguably | be misleading, since this article _is_ the Jepsen report about | MongoDB 4.2.6. Btw, I don't know what you mean by "The first | title (that specifically referenced a finding of the analysis) | was best". The submitted title was "Jepsen: MongoDB 4.2.6" and | it has only ever rotated between two states, one with "Jepsen: | " and one without. Are you confusing this thread with | https://news.ycombinator.com/item?id=23285249? | | It's very silly to have this be the top comment on the page | (I've since downweighted it, but that's where it was when I | looked in). Yesterday I briefly swapped the URL of this article | into the other thread, but then reversed that because it seemed | that thread couldn't support a more technical discussion | (https://news.ycombinator.com/item?id=23288120). I invited | aphyr to repost it instead, which was quite a break from our | standard practice of downweighting follow-up posts, but seemed | like the best solution at the time. What technical discussion | was our reward? Bickering about title policy! | aphyr wrote: | This... usually happens on Jepsen HN threads. The full title, | as in the page metadata, and as originally submitted, is | "Jepsen: MongoDB 4.2.6". At some point a mod drops the "Jepsen:" | part, then we have this discussion, and it comes back. :) | | "Why don't you put 'Jepsen:' on the same line as the database | name and version?" | | Space concerns, and also, it's immediately above the DB name in | giant letters. | | "Why don't you give them more creative names?"
| | Clients _love_ to argue about the titles of these analyses; | having a concise, predictable policy for titling is how I get | past those discussions. | fastball wrote: | As another commenter pointed out, it might be worth making | the titles "An evaluation of X" going forward - better for HN | and probably better everywhere else this is shared too. | aphyr wrote: | Not sure how many ways I can say this: the titles are | _already_ "Jepsen: X". HN's got a policy in place that | means sometimes mods change the title to just "X". That's | not something I have control over, sorry. | fastball wrote: | Right, but "Jepsen: X" doesn't really mean anything to | anyone that isn't familiar with your work. "An Evaluation | of X" is much more informative. | aphyr wrote: | I've invested seven years of my life into this brand, and | my choices are carefully considered. | Ecco wrote: | It's the article's title... | fiddlerwoaroof wrote: | A generic "Mongo 4.2.6" title doesn't help me decide whether | to click on the link (especially with how light the domain | is). I thought it was a release announcement and only clicked | through to the comments because of yesterday's discussion. | dang wrote: | An HN title needs to be read along with the site name to | the right of it. | fiddlerwoaroof wrote: | The styling of the site name makes it hard to scan. If | it's so essential, the font should be darker and bigger. | dang wrote: | That's a fair point, but people have a lot of | contradictory preferences about things like that. I think | I'd rather address this by allowing more customization of | the site. Still thinking about | https://news.ycombinator.com/item?id=23199264. | fiddlerwoaroof wrote: | As I said there, I'd like to see that added | Fiveplus wrote: | No context doesn't help. | cromulent wrote: | Well... it's the article's second H1 header. The title is | "Jepsen: MongoDB 4.2.6". | petepete wrote: | And taken out of context it makes little sense.
| simias wrote: | "An evaluation of MongoDB 4.2.6" might be neutral and | informative enough, I suppose. | | But then again ultimately the blame is on the author of the | article; it's a terrible title for this type of article. I can | understand if the moderators here don't want to go through the | trouble of dealing with editorialized titles (with all the | controversies it could generate) when clearly the original | author didn't care enough to come up with a decent title. | takeda wrote: | Why? His site is about evaluating distributed data stores. In | the context of his site, that title makes perfect sense; HN | should just add the missing context to its title. | fastball wrote: | Because as can be seen from the fact that most people only | found this article because it was posted on HN (and not | because they were browsing the site), the context of the | overall site isn't super relevant. | | Site context isn't a given when most of us are finding | content via 3rd party sources. | inglor wrote: | I also want to point out that their Node.js transactions API is | wrong and it looks like they have no idea how promises or async | code work in JS. | | In mongo, you have a `withTransaction(fn)` helper that passes a | session parameter. Mongo can call this function multiple times | with the same session object. | | This means that if you have an async function with a reference to a | session and a transaction gets retried - you very often get "part | of one attempt + some parts of another" committed. | | We had to write a ton of logic around their poor implementation | and I was shocked to see the code underneath. | | It was just such a stark contrast to products that I worked with | before that generally "just worked", like postgres, elasticsearch | or redis. Even tools people joke about a lot like mysql never | gave me this sort of data corruption.
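A toy sketch of the retry hazard inglor describes: a callback that mutates state defined outside itself leaks that state across retry attempts. This is not the driver's actual code; `with_transaction` and the failure injection here are invented for illustration (in Python rather than Node.js):

```python
def with_transaction(fn, fail_first_attempt=True):
    """Toy retry helper in the spirit of the driver's withTransaction:
    on a transient error, the SAME callback is simply invoked again."""
    committed = []
    for attempt in range(2):
        txn = []                  # each attempt starts a fresh transaction
        fn(txn)
        if fail_first_attempt and attempt == 0:
            continue              # simulate a transient error: retry fn
        committed.extend(txn)
        return committed

docs_to_insert = ["a", "b"]       # state shared across attempts -- the bug

def callback(txn):
    while docs_to_insert:         # mutates the shared state!
        txn.append(docs_to_insert.pop())

# The first (aborted) attempt drains docs_to_insert, so the retried
# attempt commits an EMPTY transaction, silently dropping both writes.
assert with_transaction(callback) == []
```

The safe pattern is a callback that is idempotent with respect to everything outside the transaction: re-running it from scratch must produce the same writes, because the helper is free to do exactly that.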
| | Edit: I was kind of angry when writing this so I didn't provide a | source and I'm a bit surprised this got so many upvotes without a | source (I guess this community is more trusting than I assumed :] | ). Anyway, for good measure and to behave the way I'd like others | to when making such accusations, here is where they pass the same | session object to the transaction https://github.com/mongodb/node-mongodb-native/blob/e5b762c6... (follow from withTransaction in | that file) - I can add examples of code easily introducing the | above-mentioned bug if people are interested. | jfkebwjsbx wrote: | > Even tools people joke about a lot like mysql never gave me | this sort of data corruption. | | People rightfully joked about MySQL when they had the non-ACID | engine. | | Same for MongoDB. A database that loses data when properly used | is a joke. | | Yes, there are use cases out there for fast non-guaranteed | writes. No, 99% of companies don't have them. | Something1234 wrote: | Can you name a use case for a fast non-guaranteed write? | throwaway744678 wrote: | Analytics: you don't want to slow down your app, and you | don't care if you lose a few records in the process. | zbentley wrote: | Importantly, in analytics workloads, it is very important | to know roughly _how many_ writes aren't making it. | Otherwise your analytics system sucks. | rocho wrote: | Interesting. How would one know that? | zbentley wrote: | Good question. You'd need some accurate-enough data | source telling you about failed writes. Which eventually | comes back around to needing a consistent database and | indications of client disconnects. | why-el wrote: | Even more elementary than the sibling comments, this also | happens in gaming all the time. You are recording live | results, say in FIFA, but if you unplug your device, your | results are gone, since they were memory only.
The game | simply cannot afford to write to disk; the write is "non-guaranteed" in the true sense of the word, but it is | _fast_. | | You then "checkpoint" when the game is over. | | You might object that this is not a "non-guaranteed" write, | because in fact the write did occur, but I simply want to | allude to the concept of a "non-secured" write, in that it | vanished without an fsync. | threeseed wrote: | Telemetry. | | I work for a telco where we log large amounts of network | requests using MongoDB. | jfkebwjsbx wrote: | The number of likes in a given post in your favorite | $social-network-of-the-year. | twic wrote: | Caches. If you lose a write, you just get a cache miss. | | Periodic snapshots of state held elsewhere. If you lose a | write, you just get stale data until the next update. | | Firm realtime work. If you lose a write, that sucks, but a | slow write sucks just as much. | Jweb_Guru wrote: | Sure. Data that people don't care about enough to be | worried about losing--for example, time series data from an | unimportant remote sensor. Should this data be recorded at | all? Maybe not, but if it should, then a best-effort recording | may be fine. It may even be all that's possible. | mbreese wrote: | I wouldn't go as far as to say an "unimportant" remote | sensor... but I think you're correct in spirit. | | I could think of an instance where you'd like to log | data, but the occasional datapoint being missing wouldn't | be terrible. Maybe something like a temperature monitor | -- you'd like to have a record of the temperature by the | minute, but if a few records dropped out, you'd be able | to guess the missing values from context. Something like | the data monitoring equivalent of UDP vs TCP. | xeromal wrote: | I just want to remind people that this video exists. | | https://www.youtube.com/watch?v=b2F-DItXtZs | vorticalbox wrote: | > Mongo can call this function multiple times with the same | session object. | | isn't that the point?
you can use a session to do multiple actions | within that session. | inglor wrote: | If you have code that looks like this:
|
|     withTransaction(async session => {
|       await Promise.all([someOp(session), someOtherOp(session)]);
|     });
|
| Mongo may retry running it (calling the function again) if a | "TransientTransactionError" is raised (the transaction is | retried from the client side rather than at the cluster). | | However, when the driver calls your function again it doesn't | invalidate the `session` object - so previous calls to the | same function can make updates to the database. | | Let's say `someOp` does something that causes the transaction | to retry and `someOtherOp` is doing something non-mongo-related in the meantime (like pulling a value from redis). | Now `someOtherOp` reaches the mongo part of its code and | executes it happily with the same session object (so | operations succeed although they really shouldn't). | | The point of transactions, like you said, is to perform | multiple operations atomically and for them to happen | "exactly once or not at all". With Mongo in practice it is | very easy to get "Once and some leftovers from a previous | attempt". | IgorPartola wrote: | Sorry, I haven't had my coffee yet. If I am reading this | correctly, either someOp() or someOtherOp() may execute | first, no? And if you introduce an external database, why | do you expect Mongo to handle that rollback? Say | someOtherOp() increments a Redis value by 1. If that part | executed first since both are asynchronous here, what would | a Mongo session have to do with it? | | What exactly would invalidating that session object do | here? And what would the session object do after it was | invalidated? | [deleted] | waheoo wrote: | It sounds like the old session object is reused and | becomes live again or something.
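The retry/interleaving hazard described above can be reproduced with a toy mock. To be clear, this is NOT the real MongoDB driver: `withTransaction`, `someOp`, `someOtherOp`, and the session shape are all invented here to illustrate the failure mode of retrying a callback while work from the previous attempt is still in flight.

```javascript
// Toy mock of a retrying transaction helper. The key (mis)behavior:
// on a transient error the callback is called again with the SAME
// session object, without waiting for or invalidating the first attempt.

const committed = []; // stands in for writes that actually reached the DB

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withTransaction(fn) {
  const session = { attempt: 0 }; // one session, reused across retries
  for (let attempt = 1; attempt <= 2; attempt++) {
    session.attempt = attempt;
    try {
      await fn(session); // on retry, fn is simply called again...
      return;
    } catch (err) {
      if (err.message !== 'TransientTransactionError') throw err;
      // ...while work from the previous attempt may still be pending
    }
  }
}

async function someOp(session) {
  // Fails transiently on the first attempt only.
  if (session.attempt === 1) throw new Error('TransientTransactionError');
  committed.push(`someOp (attempt ${session.attempt})`);
}

async function someOtherOp(session) {
  const startedOn = session.attempt;
  await sleep(20); // off doing non-mongo work, e.g. talking to redis
  committed.push(`someOtherOp (started on attempt ${startedOn})`);
}

withTransaction(async (session) => {
  await Promise.all([someOp(session), someOtherOp(session)]);
}).then(() => setTimeout(() => console.log(committed), 50));
```

Running this logs three committed writes instead of two: attempt 2's `someOp` and `someOtherOp`, plus the leftover `someOtherOp` that was started on attempt 1 and landed anyway — exactly the "once and some leftovers from a previous attempt" outcome described above.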
| Namari wrote: | I think this is the expected behaviour of the transaction, | but the problem comes from the fact that you wrap all DB | operations inside a Promise.all. | | Because you wrap the DB operations inside a Promise.all, it | will run them all BUT it will not revert them if one fails | (it's not atomic; it just says that one has failed and you | need to catch it). It will reject them but not revert them | (the CUD operations will already have changed the data). The | problem, I believe, is the transaction is considering the | Promise.all and not what's inside of it, so it will run it | again despite the fact that some operations have already | succeeded earlier. | | I think you just have to resolve each of them outside a | Promise.all. In your case, because the Promise.all has been | rejected it will redo the transaction, therefore it will | redo the ones that have already worked in the first call. | | I'm no expert but this is how I understand it. | gabrieledarrigo wrote: | This is right. Are you sure inglor that you know how to | write code? | bambataa wrote: | Thanks for this explanation. So if I understand correctly, | `someOp` has thrown an error but this doesn't affect | `someOtherOp`? So `someOtherOp` will end up being called | twice? | inglor wrote: | Correct, the easy workaround is not to use that | transaction API and write your own disposer instead of | using withTransaction. | [deleted] | takeda wrote: | MySQL is less of a joke than MongoDB is. It was similarly started | by someone who didn't know anything about databases and learned | about them on the go. Actually, both of them started as much | faster alternatives to other databases, and both ended up having a | complete rewrite of their engine done by someone from outside | who knew their stuff. MySQL: ISAM, then MyISAM, and then InnoDB | (written by an outsider). Similarly, MongoDB got WiredTiger.
| | The thing is that MySQL is older so it went through all of it | earlier, but it still suffers from poor decisions from the | past. This contrasts with PostgreSQL, where correctness | and reliability were #1 from the beginning. It started as an | awfully slow database, but performance improved and we now | have a correct, reliable and fast database. | berns wrote: | MySQL is no joke; nothing is perfect and PostgreSQL is not | 100% reliable. Remember: | | Transaction ID wraparound: | https://twitter.com/bcantrill/status/1110647418008133632 | | Incorrect use of fsync: | https://news.ycombinator.com/item?id=19119991 | goatinaboat wrote: | _MySQL is no joke_ | | If you were around back in the day you will remember the | MySQL team claiming that no one needed transactions or | referential integrity, that you should just do it yourself | in the application... | beatrobot wrote: | And still no transactional DDL in MySQL | mathnode wrote: | No, but it does support online DDL for some operations in | InnoDB. | | Very few database systems support online DDL, which, | unlike a transaction, does not require undo or rollback | resources. Of course one must have a rollback procedure | if something fails, but you need one for transactions | too, just in case. | | An online rollback is far less costly than a | transactional rollback, because an online rollback is | just undoing what you did. Added a column you didn't want | in one query? Remove it again in another, very quickly. | | TokuDB (a MySQL/MariaDB storage engine) supported all DDL | as an online operation. But Percona killed it in favour | of TokuMX, the MongoDB equivalent. | | TokuMX has no upgrade path to WiredTiger, only one major | customer at Percona (I can't say who it is) and no | engineers. | | Any kind of DDL is tricky and requires users to RTFM for | the intricacies of their chosen database. One size rarely | fits all.
| edw wrote: | MySQL's rise IMO cannot be considered without also | looking at the rise of Ruby on Rails and other CRUD-optimized platforms and frameworks. Also ORMs. These | things denigrated the idea of using an RDBMS as anything | but a dumb table store. Features like stored procedures | and views were seen as pointless. MySQL was the perfect | database for people who had no respect for databases. | twic wrote: | Does MySQL support check constraints yet? | dnissley wrote: | It does finally! In 8.0.16+ | [deleted] | hodgesrm wrote: | > Even tools people joke about a lot like mysql never gave me | this sort of data corruption. | | That's about a decade out of date at this point. MySQL/InnoDB | is the standard table engine and corruption is exceedingly | rare. As of 2014, when I last directly worked on MySQL prod | systems, there was no practical difference from PostgreSQL in | terms of transactional guarantees. That includes APIs like JDBC, | which we used for billions of transactions. | morelisp wrote: | MySQL still has no transactional DDL (and I think still even | autocommits if you try). This is a major difference from | Postgres, which I believe supports everything short of | dropping tables. | yobert wrote: | Every month, we do an external database import into our | production PostgreSQL database. We drop dozens of tables, | create new ones with the same names, insert hundreds of | thousands of rows, and recreate indexes, all in a single | transaction. It works flawlessly. | takeda wrote: | I wouldn't use that particular thing against MySQL. DDL is | normally supposed to be outside of a transaction; it's | just a PostgreSQL feature that you can use it inside one | and be able to roll back. BTW I'm convinced you can also | drop a table within a transaction in PostgreSQL. | morelisp wrote: | No, MySQL stands out here. Postgres, SQL Server, DB2, and | Firebird all give at least some way to do some major DDL | transactionally.
Usability varies (e.g. Oracle supports a | very specific kind of change that is not its normal DDL | statements), but it's at least possible. | | https://wiki.postgresql.org/wiki/Transactional_DDL_in_Postgr... | | That MySQL autocommits is also even worse than just | "doesn't support it." | dragonwriter wrote: | > DDL is normally supposed to be outside of a | transaction | | A basic element of the relational model is that metadata | is stored as relational data and that the same guarantees | that apply to manipulating main data in the database | apply to manipulating the schema metadata. | | It's true that many real relational databases compromise | on this element in various ways at times, but it is | absolutely not the case that DDL "is supposed to be" non-transactional. | Carpetsmoker wrote: | The biggest issue with MySQL/MariaDB isn't so much data | corruption at the InnoDB level but stuff like:
|
|     MariaDB [test]> create table test ( i int );
|     Query OK, 0 rows affected (0.06 sec)
|
|     MariaDB [test]> insert into test values (''), ('xxx');
|     Query OK, 2 rows affected, 2 warnings (0.01 sec)
|
|     MariaDB [test]> select * from test;
|     +------+
|     | i    |
|     +------+
|     |    0 |
|     |    0 |
|     +------+
|     2 rows in set (0.01 sec)
|
| There's a bunch of other similar caveats as well, and this | can really take you by surprise. I've seen it introduce data | integrity issues more than once. | | That's a new MariaDB 15.1 with the default settings I just | installed the other day to test some WordPress stuff. I know | there are warnings, and that you can configure this by adding | STRICT_ALL_TABLES to SQL_MODE, but IMO it's a dangerous | default. | | This is also an issue with using MongoDB as a generic | database: every time I've seen it used there were these kinds | of data integrity issues: sometimes minor, sometimes bringing | everything down.
Jepsen reports aside, this alone should make | people double-check if they really want or need MongoDB, | because it turns out that most of the time you don't really want | this. | mathnode wrote: | 15.1 is not a version. Since MariaDB 10.2, this is not | possible as strict_trans_tables is enabled by default in | sql_mode. | hintymad wrote: | Just curious, what was the reason that your team decided to | work around the problem instead of migrating away from MongoDB? | inglor wrote: | We have a complicated system and migration is ~3 months during | which we won't be shipping features. | | We have a roadmap we need to meet, and so far we have been | trying to throw money at it rather than developer time (paying | for Mongo Atlas) and adding features incrementally as Mongo gets | them (like transactions). | | If this wasn't a startup we would probably rewrite. | capableweb wrote: | Not the author but I've done similar things (patching something | rather than migrating away from it). Usually it's way more | work to migrate away than just patching it again to fit your | use-case. Once you find yourself having to patch it too | often, you start thinking about migrating away. Then the | research slowly begins ad-hoc until it hits "seems we need to | migrate away now, otherwise we're spending too much time | working around something / fixing their broken shit"; that's | when you sit down and decide to migrate away from it. | | It also depends on how long you think the application | will be around. You're building an MVP to evaluate something? | Just hack together whatever will work (then throw away). | You're maintaining software for a library/archive that will most | likely stick around for a long time, even if they say it's | just temporary? Make decisions that will help in the future, | always.
| | I just don't want to be called to the office on a weekend | anymore for this sort of BS. | | Production incidents with MongoDB last year: 15. Production | incidents with Redis, Elasticsearch and MySQL combined last | year: 2 (and with much less severity). | | Edit: just to add: I didn't pick Mongo, I was just the engineer | called to clean that mess. I created enough of my own messes to | not resent the person who made that call. We are | constantly on the verge of rewriting the MongoDB stuff since a | database that small (~250GB) should really not have this many | issues (in previous workplaces I ran ~10TB PostgreSQL | deployments with much more complicated schemas and queries with | far fewer issues). It's also expensive, and support at Mongo | Atlas hasn't been great (we should probably self-host but I am | not used to small databases being so problematic). | brianwawok wrote: | This is why most of us don't use mongo in production. It's | just not worth it. Postgres is a tank and supports JSON when | you really need it. | Quekid5 wrote: | I was actually amazed that a big CMS/E-commerce vendor | _proudly_ proclaimed in a sales meeting that they were on | MongoDB. | | I suppose salespeople probably aren't into the nitty-gritty, but their tech people should have warned them about | this. Maybe they were just trying to pull our collective | leg, but I suppose that's why I was at that meeting. | | It was obviously an instant 'No'. | leviathant wrote: | There aren't a lot of CMS/Ecommerce vendors that sit on | MongoDB, so maybe we were in a meeting together! | | Even if we weren't - as a sales engineer on a large | CMS/ECommerce platform with merchants running $150M+ in | annual revenue, with an average client retention of seven | years, and two decades of agency experience behind the | decisions around building that platform, if you instantly | said no just because of MongoDB, maybe you don't know as | much about MongoDB as you think you do.
| | I came from a SQL background myself, and had reservations | based on all the things I'd read about MongoDB as we | decided to build a platform after doing things bespoke | for two decades, but time has proven our architecture | choices out. It's easy to be proud of something that | works well. | hetspookjee wrote: | The Guardian posted quite a nice blog in 2018 about their | switch to Postgres from MongoDB. Especially interesting | because they intended to use Postgres as a replacement | document store. Here's the link: | https://www.theguardian.com/info/2018/nov/30/bye-bye-mongo-h... | guanzo wrote: | > Automatically generating database indexes on | application startup is probably a bad idea. | | aw crap. oh well it probably doesn't matter for my small-ish application. | [deleted] | Carpetsmoker wrote: | _I didn't pick Mongo, I was just the engineer called to | clean the mess._ | | My only experience with MongoDB is being "the engineer called | to clean the mess". I'm sure you can effectively use MongoDB | in production if you're knowledgeable and careful, but most | people aren't and they shouldn't have to know the detailed | inner workings to not create a mess. | goatinaboat wrote: | _My only experience with MongoDB is being "the engineer | called to clean the mess"._ | | It's always the same: | | 1. Newbie webdev (aren't they all) uses MongoDB because | it's easy to use according to blogs and twitter | | 2. Somehow it makes it into production | | 3. A dozen experienced engineers spend years trying to keep | it running | lossolo wrote: | When I was evaluating MongoDB a couple of years ago (around the | time they were switching to the WiredTiger engine), I found a | memory leak in their Node.js client on day one. I submitted | a ticket on their Jira and at the same time had a look at other | issues they had there. I saw memory leak after memory | leak, memory corruption everywhere, data disappearing without | any reason, segfaults etc.
After that, MongoDB was dropped as a | candidate for a DB in the project I was working on; we went with | Postgres and never regretted it. | loeg wrote: | Aphyr is such a competent professional. What a relatively | thorough and polite response to Mongo's inaccurate claims. "We | also wish to thank MongoDB's Maxime Beugnet for inspiration." is | a nice touch. | bithavoc wrote: | > Clients observed a monotonically growing list of elements until | [1 2 3 5 4 6 7], at which point the list reset to [], and started | afresh with [8]. This could be an example of MongoDB rollbacks, | which is a fancy way of saying "data loss". | | I hope they learned the lesson: don't fuck with aphyr. | amenod wrote: | That's... not the lesson they need to learn. Databases are app | foundations. Make sure you do them right and don't overpromise. | baq wrote: | I agree but maybe it's the only lesson they are able to | understand at this time. Their attitude was asking for | somebody to call them out, which aphyr is maybe the best | positioned to do. | | I'd love to read a roasting like that authored by Leslie | Lamport for a different perspective, but aphyr's works | absolutely stand on their own. | | Any ideas how to get Jepsen and TLA to work together? :) | azernik wrote: | Ouch. This is what you get when you order up a third-party review | and then misrepresent it in advertising. | taywrobel wrote: | I'm still waiting for Jepsen to put Confluent's "Kafka provides | exactly once delivery semantics" claim to the test. | | Since they're claiming something provably false, it'd be nice | to have some empirical evidence as such. | aphyr wrote: | I'm not convinced it _is_ false--IIRC their claim is | specifically w.r.t. other Kafka side effects, and those they | _can_ control.
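The distinction aphyr draws — "exactly-once" as a property of side effects the system itself controls — usually reduces to idempotent application of messages: redeliveries after a retry are harmless if applying the same message twice is a no-op. A toy sketch of that idea (no Kafka client involved; `makeApplier` and the offset-keyed dedup are invented for illustration, not Kafka's actual implementation):

```javascript
// Idempotent apply: deduplicate messages by offset so that a
// redelivered message changes nothing. "At-least-once delivery +
// idempotent processing" behaves like exactly-once for this state.
function makeApplier() {
  const seen = new Set();
  let total = 0;
  return {
    apply({ offset, value }) {
      if (seen.has(offset)) return total; // duplicate delivery: no-op
      seen.add(offset);
      total += value;
      return total;
    },
  };
}

const applier = makeApplier();
applier.apply({ offset: 0, value: 5 });
applier.apply({ offset: 1, value: 7 });
applier.apply({ offset: 1, value: 7 }); // redelivered after a retry
console.log(applier.apply({ offset: 2, value: 1 })); // 13 -- not 20
```

The guarantee only covers state the applier controls; a non-idempotent side effect inside `apply` (say, sending an email) would still fire twice — which is exactly why the scope of the claim matters.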
| egeozcan wrote: | The general mood I observed about MongoDB was that it used to be | inconsistent and unreliable, but they fixed most, if not all, of | those problems and they now have a stable product but bad word of | mouth among developers. Personally, I've treated it as "legacy" | and migrated everything that I had to touch since 2013 [0], and | luckily (just read the article so hindsight 20/20 -- a transaction | running twice and seeing its own updates? holy...) never gave it | another try. | | [0]: https://news.ycombinator.com/item?id=6801970 (BTW: no, my | dream of simple migration never materialized, but exporting and | dumping data to Postgres JSONB columns and rewriting queries | turned out to be neither buggy nor hard). | cyphar wrote: | > MongoDB was that it used to be inconsistent and unreliable | but they fixed most, if not all of those problems and they now | have a stable product but bad word of mouth among developers. | | This report is 9 days old, and tests the latest stable release | of MongoDB. The problems it discusses are present on modern | MongoDB. | egeozcan wrote: | If it wasn't clear, I said "mood" (which you conveniently | ignored), referring to chit-chat I heard recently, and was | underlining how wrong it has been. I totally | understand what the report says and know what version it | tests. | cyphar wrote: | In my defense, it wasn't clear that's what you were saying | in your original comment. "Mood" has become a filler word | at this point -- hence why I omitted it from the quote -- | and can mean anything from the traditional meaning of "mood | in the room" to "incredibly relatable/factual statement". | How I originally understood your comment was that you were | saying that you felt that most of the issues are in the | past, but you still decided to migrate away from it. | egeozcan wrote: | English is not my mother tongue and given the downvotes, probably it's my wording at fault here - sorry.
| | I'm glad now that it's been clarified :) | koishikomeiji wrote: | Fuck me gently with a chainsaw | depr wrote: | >Sometimes, Programs That Use Transactions... Are Worse | | I understood that reference ___________________________________________________________________ (page generated 2020-05-24 23:00 UTC)