hngopher.com

       [HN Gopher] Jepsen Disputes MongoDB's Data Consistency Claims
       ___________________________________________________________________
        
       Jepsen Disputes MongoDB's Data Consistency Claims
        
       Author : anarchyrucks
       Score  : 326 points
       Date   : 2020-05-23 18:33 UTC (4 hours ago)
        
 (HTM) web link (www.infoq.com)
        
       | madhadron wrote:
       | In the circles I run in, MongoDB is regarded as a joke and the
       | company behind it as basically duplicitous. For example, they
       | still list Facebook as their first user of MongoDB on their
       | website, but there is no MongoDB use in Facebook and there
       | hasn't been for years (it came in only via a startup
       | acquisition).
       | 
       | I had the misfortune to use MongoDB at a previous job. The
       | replication protocol wasn't atomic. You would find partial
       | records that were never fixed in replicas. They claimed they
       | fixed that in several releases, but never did. The right answer
       | turned out to be to abandon MongoDB.
        
         | macintux wrote:
         | The joke I learned early on: "Migrating away from Mongo is
         | trivial: wait long enough, and all your data will be gone
         | anyway."
         | 
         | I imagine things are better now.
        
           | MrBuddyCasino wrote:
           | MongoDB: the Snapchat of databases.
        
             | DevKoala wrote:
             | That's mean to Snapchat.
        
               | craftinator wrote:
               | Snapchat at least warns you that your data will
               | disappear.
        
         | chx wrote:
         | There was a time when I advocated for MongoDB with the usual
         | caveats. The ability to easily store and index complex data was
         | of great value. And then in October 2015, within a week of each
         | other, SQLite and MySQL both learned how to index on
         | expressions and store JSON (SQLite 3.9 2015-10-14, MySQL 5.7
         | 2015-10-21). PostgreSQL added jsonb the year prior in 9.4. At
         | that moment the value of MongoDB for me diminished greatly.
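
The combination chx describes (JSON stored in an ordinary table, plus an index on an expression over it) can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, assuming the SQLite library linked into Python ships the JSON functions (most builds do, and they are built in by default since SQLite 3.38); the table, index, and document shape are hypothetical.

```python
import sqlite3

# Assumes the linked SQLite includes json_extract; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

# An index on an expression (SQLite 3.9+): a field inside the JSON document
# is indexed directly, with no extra column to keep in sync.
conn.execute(
    "CREATE INDEX docs_email ON docs (json_extract(body, '$.user.email'))"
)

conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [
        ('{"user": {"email": "a@example.com", "plan": "free"}}',),
        ('{"user": {"email": "b@example.com", "plan": "pro"}}',),
    ],
)

# The WHERE clause repeats the indexed expression exactly, which is what
# lets the query planner use docs_email instead of scanning every row.
LOOKUP = (
    "SELECT json_extract(body, '$.user.plan') FROM docs "
    "WHERE json_extract(body, '$.user.email') = ?"
)


def find_plan(email):
    """Return the 'plan' field of the document with this email, or None."""
    row = conn.execute(LOOKUP, (email,)).fetchone()
    return row[0] if row else None


def query_plan(email):
    """Return SQLite's EXPLAIN QUERY PLAN rows for the lookup."""
    return conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, (email,)).fetchall()


print(find_plan("b@example.com"))
print(query_plan("b@example.com"))
```

The query plan output is the quickest way to confirm the expression index is actually being used rather than a full table scan.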
        
         | tehlike wrote:
         | I had my PM friend prototype his idea on the MEAN stack, but
         | when we got more serious, we immediately transitioned to
         | Postgres and started using Sequelize as the ORM. Pretty good
         | decision so far. I don't think they will run into cases where
         | the ORM won't scale for the foreseeable future.
        
         | sneak wrote:
         | I run in two circles: the one you mention, but also the other:
         | I have gotten pushback from people (usually devs at clients of
         | mine) for saying it's lunacy to run a real, actual business on
         | Mongo. (This has always happened at orgs with <10TB of data
         | in the database.)
         | 
         | You'd be astounded how common it is at so-called "enterprise"
         | startups. It blew my mind.
         | 
         | A lot of people simply never went through the LAMP stack days
         | and have little/no experience with real databases like Postgres
         | (or even MySQL). It's disheartening.
        
         | Diederich wrote:
         | > but there is no MongoDB use in Facebook and there hasn't been
         | > for years
         | 
         | Are you sure?
        
           | hypewatch wrote:
           | It's certainly not used for any mission critical apps.
           | Facebook's stack is pretty well known. They've been using
           | sharded MySQL for a while now. Instagram started on
           | PostgreSQL but I believe has switched to Cassandra.
        
             | Diederich wrote:
             | > mission critical apps
             | 
             | Thanks for the clarification.
             | 
             | As an example: would you consider the backend software
             | stack that manages physical access to the campus 'mission
             | critical'?
        
               | hypewatch wrote:
               | I would not. Facebook.com still operates with or without
               | that system. Especially now with COVID-19, I'm sure it's
               | not being used at all.
               | 
               | Mission critical - essential for operating Facebook.com
        
         | polote wrote:
         | Nobody seems to like it. Does anyone have any idea why the
         | company's revenue is still increasing?
        
         | thomascgalvin wrote:
         | You can tell a lot about a developer by their preferred
         | database.
         | 
         | * Mongo: I like things easy, even if easy is dangerous. I
         | probably write JavaScript exclusively
         | 
         | * MySQL: I don't like to rock the boat, and MySQL is available
         | everywhere
         | 
         | * PostgreSQL: I'm not afraid of the command line
         | 
         | * H2: My company can't afford a database admin, so I embedded
         | the database in our application (I have actually done this)
         | 
         | * SQLite: I'm either using SQLite as my app's file format,
         | writing a smartphone app, or about to realize the difference
         | between load-in-test and load-in-production
         | 
         | * RabbitMQ: I don't know what a database is
         | 
         | * Redis: I got tired of optimizing SQL queries
         | 
         | * Oracle: I'm being paid to sell you Oracle
        
           | maevyn11 wrote:
           | ha, nailed it dude.
        
           | ceocoder wrote:
           | This might be a stupid question, but surely no one thinks of
           | RabbitMQ as a _database_, right? I've used it from 2012 to
           | 2018 extensively, including using things like shovels to
           | build hub spoke topologies, however not once did I think of
           | it as anything but a message broker.
           | 
           | Did I miss something huge?
        
             | detaro wrote:
             | I interpret it as they'd _probably_ not call it a database,
             | but they might use it in places where a database would be
             | better suited, and effectively store data in it.
        
             | henryfjordan wrote:
             | RabbitMQ stores your data, right? Then it's a database!
             | That's pretty much all it takes. A flat file, memory-store,
             | SQL DB, Document store, any of them can be databases if
             | that's where you stick your data!
             | 
             | But also no, RabbitMQ and Kafka and the like are clearly
             | message buses and though they might also technically
             | qualify as a DB it would be a poor descriptor.
        
               | gopalv wrote:
               | > Kafka and the like are clearly message buses and though
               | they might also technically qualify as a DB
               | 
               | ksqlDB is actually a database on top of this.
               | 
               | The thing is that they have an incrementally updated
               | materialized view that is the table, while the event
               | stream is similar to a WAL ("ahead of write logs?" in
               | this case).
               | 
               | Because eventually you can't just go over your entire
               | history for every query.
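
gopalv's description of a table as an incrementally updated materialized view over an event stream can be sketched in plain Python. This is a toy model, not how ksqlDB or Kafka are implemented; the `Event` and `MaterializedTable` names are hypothetical. The log is the source of truth, and the table applies only events it has not yet seen, so a query never rescans the whole history.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    key: str    # e.g. an account id
    delta: int  # change to that account's balance


@dataclass
class MaterializedTable:
    """Toy table derived from an append-only event log (hypothetical names)."""

    log: list = field(default_factory=list)    # the event stream, append-only
    table: dict = field(default_factory=dict)  # current state per key
    offset: int = 0                            # log position already applied

    def append(self, event):
        self.log.append(event)

    def catch_up(self):
        # Fold in only the events written since the last catch-up,
        # instead of replaying the entire history for every query.
        for event in self.log[self.offset:]:
            self.table[event.key] = self.table.get(event.key, 0) + event.delta
        self.offset = len(self.log)

    def query(self, key):
        self.catch_up()
        return self.table.get(key, 0)

    def rebuild(self):
        # The table is derived state: a full replay of the log recreates it.
        fresh = {}
        for event in self.log:
            fresh[event.key] = fresh.get(event.key, 0) + event.delta
        return fresh


t = MaterializedTable()
t.append(Event("alice", 100))
t.append(Event("alice", -30))
t.append(Event("bob", 5))
print(t.query("alice"), t.query("bob"))  # answered from the incremental view
print(t.rebuild() == t.table)            # a full replay agrees with it
```

The `rebuild` method is there only to show that the view can always be regenerated from the log, which is the WAL-like property the comment points at.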
        
               | makach wrote:
               | Oh ho ho ho. What weird things we use as databases. I
               | remember when I first started out as a consultant
               | developer we were using a CMS as our data repository
               | because someone thought that was a good idea. (It
               | wasn't.) The vendor was flown in from the States to help
               | troubleshoot. I will never forget how he looked at me
               | when I had to explain to him why we had made so many
               | nodes in the content tree: it was because we were using
               | the CMS as a repository.
        
               | ceocoder wrote:
               | Ah I see, we are going with "well technically it stores
               | something therefore it is database joke". Now I'm fully
               | on board :)
               | 
               | Back when I worked in LA my CTO used to joke that most
               | places use Microsoft Outlook as a database and Excel as
               | a BI tool.
        
               | goatinaboat wrote:
               | _well technically it stores something therefore it is
               | database joke_
               | 
               | Confluent, the company behind Kafka, are 100% serious
               | about Kafka being a database. It is however a far better
               | database than MongoDB.
        
               | m0zg wrote:
               | You laugh, but I bet Excel produces orders of magnitude
               | more real "business intelligence" than all other "BI"
               | tools combined.
        
             | Hamuko wrote:
             | > _This might be a stupid question, but surely no one
             | thinks of RabbitMQ as a database, right?_
             | 
             | Arguably the world's most popular database is Microsoft
             | Excel.
        
           | cgijoe wrote:
           | HAHAHAH The RabbitMQ one got me. Have your upvote, sir.
        
           | 1123581321 wrote:
           | As someone who chose MySQL and provides direction to
           | developers who really like Postgres, and who also uses
           | Postgres for fun, I do find myself having to both defend
           | MySQL as a prudent option and convince them that I know
           | anything at all about Postgres or computer science. :)
        
             | ForHackernews wrote:
             | I've heard MySQL (well, MariaDB, really) has improved a lot
             | in recent years, but I still can't imagine why I'd ever
             | choose it over Postgres for a professional project. Is
             | there any reason?
             | 
             | It used to be that bargain basement shared-hosting
             | providers would only give you a LAMP stack, so it was MySQL
             | or nothing. But if you're on RDS, Postgres every time for
             | my money.
        
               | VWWHFSfQ wrote:
               | MySQL's admin tools are still far superior to what's
               | available for Postgres.
        
               | cyral wrote:
               | What tools are these? Curious as a Postgres user
        
             | gav wrote:
             | I tend to find people who argue with me against MySQL bring
             | up things that haven't been true in a long time such as
             | Unicode or NULL handling.
             | 
             | I'd probably choose Postgres over MySQL for a new project
             | just to have the improved JSON support, but there are upsides
             | to MySQL too:
             | 
             | - Per-thread vs per-process connection handling
             | 
             | - Ease of getting replication running
             | 
             | - Ability to use alternate engines such as MyRocks
        
               | johannes1234321 wrote:
               | MySQL also has great JSON features (JSON data type,
               | virtual indexes on it, multi-value (array) indexes,
               | json_table, ...).
        
           | yawaramin wrote:
           | I have other boats to rock than MySQL! ;-)
        
           | tester756 wrote:
           | What about MSSQL?
        
           | benibela wrote:
           | What if they prefer an XML database (like BaseX, eXist,
           | MarkLogic)?
        
             | kabes wrote:
             | Psychopath
        
           | threeseed wrote:
           | And you can tell a lot about a developer when they post
           | comments like this.
           | 
           | Almost none of it is remotely accurate, e.g. RabbitMQ isn't even
           | a database.
        
             | beardbandit wrote:
             | Man, people really hate Mongo.
             | 
             | We use it for a very specific use case and it's been perfect
             | for us when we need raw speed over everything. Data loss is
             | tolerable.
        
               | craftinator wrote:
               | It seems like you have the only good use case for it
               | pegged down. I've worked at multiple companies that
               | really, really didn't understand that putting something
               | into the DB comes with some probability that it'll never
               | come out. The arguments were "but it's a dataBASE, it
               | stores data. They'd never sell this as a product if it
               | LOST data; then it wouldn't be a database..."
        
             | rickbad68 wrote:
             | LOL
        
             | wzy wrote:
             | I can't believe the one item that was so obviously added as
             | a joke went right over your head.
             | 
             | It may be a good idea to take a break from the computer and
             | find something less stressful to do.
        
               | ceocoder wrote:
               | Perhaps that's because some other message brokers are now
               | being touted as databases [0][1]; I remember seeing a
               | thread about it on HN a couple of days ago.
               | 
               | [0] https://www.confluent.io/blog/okay-store-data-apache-kafka/
               | 
               | [1] https://dzone.com/articles/is-apache-kafka-a-database-the-20...
        
             | craftinator wrote:
             | And now I can tell a lot about you as a developer; maybe
             | stay away from any kind of complex abstraction, wouldn't
             | want it to go right over your head, ya know?
        
             | macmac wrote:
             | Re RabbitMQ, isn't that OP's point?
        
           | tjalfi wrote:
           | SQL Server: I use C# and write line-of-business applications.
        
       | ncmncm wrote:
       | MongoDB's big problem is that their present user base _does not
       | want_ the problems fixed, particularly at default settings,
       | because it would mean going slower. Their users are self-selected
       | as not caring much about integrity and durability. There are lots
       | of applications where those qualities are just not very
       | important, but speed is. People with such applications do need
       | help with data management, and have money to spend on it.
       | 
       | The stock market wants to see the product as a competitor with
       | Oracle, so demands all the certifications that say so. MongoDB
       | marketing wants to be able to collect money as if the product
       | were competitive. Many of the customers have management that
       | would be embarrassed to spend that kind of money on a database
       | that is not. And, ultimately, many of the applications do have
       | durability requirements for _some_ of the data.
       | 
       | So, MongoDB's engineers are pulled in one direction by actual
       | (paying) users, and the opposite direction by the money people.
       | It's not a good place to be. They have very competent engineers,
       | but they have set themselves a problem that might not be solvable
       | under their constraints, and that they might not be able to prove
       | they have solved, if they did. Time spent on it does not address
       | what most customers want to see progress on.
        
         | threeseed wrote:
         | If they only cared about performance then they would've left
         | the write concern default set to not acknowledge writes,
         | either locally or within a replica set. Or they would have
         | defaulted to reading from the nearest replica without worrying
         | about potential consistency issues.
         | 
         | Also, this isn't 2011. MongoDB is not a competitor to Oracle,
         | and never really has been for people who knew that a document
         | database was not usable as a SQL one. Its real competitors are
         | other SQL databases, e.g. Snowflake and Redshift.
        
           | ncmncm wrote:
           | You know it, I know it, MDB knows it, and most of their
           | customers know it, but that doesn't matter: the stock market
           | doesn't. MDB wants to be valued like a durable-database
           | company, and to be able to charge durable-database prices.
           | They need a plausible durable-database story to get those,
           | regardless of what actual current users want.
           | 
           | It is possible there are still potential users not buying
           | until they get that story. MDB wants those users.
        
       | seemslegit wrote:
       | "We found that due to these weak defaults, MongoDB's causal
       | sessions did not preserve causal consistency by default: users
       | needed to specify both write and read concern majority (or
       | higher) to actually get causal consistency. MongoDB closed the
       | issue, saying it was working as designed, and updated their
       | isolation documentation to note that even though MongoDB offers
       | "causal consistency in client sessions", that guarantee does not
       | hold unless users take care to use both read and write concern
       | majority. A detailed table now shows the properties offered by
       | weaker read and write concerns."
       | 
       | That sounds like a valid redress, or am I missing something?
        
         | Smaug123 wrote:
         | Kyle's point is that it's arguably valid but certainly
         | unhelpful: the _default settings_ are liable to lead to data
         | loss. Moreover, he draws attention specifically to transactions
         | as something which you would expect to make things safer, but
         | in fact there's a rather arcane part of the documentation that
         | notes that you need to manually specify both read and write
         | concerns on every transaction individually if you want
         | transactions to behave consistently, regardless of the concerns
         | specified at the database level.
         | 
         | Basically, there are a large number of pitfalls that it's very
         | easy to fall into unless you have an encyclopaedic knowledge of
         | the documentation, and you need to ignore some of the words
         | that are used (like "transaction" or "ACID") because they carry
         | connotations that either do not apply or only apply if you do
         | extra work to make it so.
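
The Jepsen finding quoted above turns on what the write and read concerns actually promise. The toy three-node model below is a deliberate simplification, not MongoDB's replication protocol, and the class and method names are hypothetical. It shows the basic failure mode: a write acknowledged by the primary alone (`w: 1`) can be rolled back after a failover, and a `local` read can return data that later disappears, while a majority-acknowledged write and a majority read survive.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.oplog = []  # writes this node has applied, in order


class ToyReplicaSet:
    """Toy 3-node replica set; NOT MongoDB's real replication protocol."""

    def __init__(self):
        self.nodes = [Node("n1"), Node("n2"), Node("n3")]
        self.primary = self.nodes[0]
        self.majority = len(self.nodes) // 2 + 1

    def write(self, value, w, reachable):
        """Apply on the primary, copy to reachable secondaries, report ack.

        w is 1 or "majority"; reachable lists the secondaries the primary
        can currently talk to (the rest are partitioned away).
        """
        self.primary.oplog.append(value)
        for node in reachable:
            node.oplog.append(value)
        acks = 1 + len(reachable)
        needed = self.majority if w == "majority" else w
        return acks >= needed  # True means the client was told "success"

    def read_latest(self, concern):
        """'local' reads the primary as-is; 'majority' only returns writes
        present on a majority of nodes."""
        log = self.primary.oplog
        if concern == "majority":
            log = [v for v in log
                   if sum(v in n.oplog for n in self.nodes) >= self.majority]
        return log[-1] if log else None

    def fail_over(self, new_primary):
        """Elect new_primary; every node adopts its log. Returns lost writes."""
        old = self.primary
        rolled_back = [v for v in old.oplog if v not in new_primary.oplog]
        for node in self.nodes:
            node.oplog = list(new_primary.oplog)
        self.primary = new_primary
        return rolled_back


rs = ToyReplicaSet()
n1, n2, n3 = rs.nodes
acked_a = rs.write("a", w="majority", reachable=[n2, n3])  # fully replicated
acked_b = rs.write("b", w=1, reachable=[])  # primary is partitioned off
local_read = rs.read_latest("local")        # sees "b"
majority_read = rs.read_latest("majority")  # only sees "a"
lost = rs.fail_over(n2)                     # n2 never received "b"
```

Both writes were reported as successful, yet only the majority-acknowledged one survives the failover, and the `local` read handed out a value that no longer exists.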
        
           | scarface74 wrote:
           | How is this any different than DynamoDB where you specify
           | that you want either eventual consistency vs strong
           | consistency? DDB also does eventually consistent reads by
           | default.
           | 
           | Is the argument that Mongo's documentation isn't clear?
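
The DynamoDB comparison rests on the same distinction: DynamoDB reads are eventually consistent unless the request sets `ConsistentRead` to true. The toy leader/follower store below is a hypothetical model, not DynamoDB's implementation; for determinism it always serves eventually consistent reads from a single lagging follower, which makes the stale read visible.

```python
class ToyStore:
    """Toy leader/follower store; NOT how DynamoDB works internally."""

    def __init__(self):
        self.leader = {}
        self.follower = {}
        self.pending = []  # writes the follower has not applied yet

    def put(self, key, value):
        self.leader[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Asynchronous replication, modelled as an explicit catch-up step.
        for key, value in self.pending:
            self.follower[key] = value
        self.pending.clear()

    def get(self, key, consistent_read=False):
        # Strongly consistent reads go to the leader; eventually consistent
        # reads go to the follower, which may be behind.
        source = self.leader if consistent_read else self.follower
        return source.get(key)


store = ToyStore()
store.put("k", "v1")
store.replicate()
store.put("k", "v2")                          # not yet replicated
stale = store.get("k")                        # follower still has "v1"
fresh = store.get("k", consistent_read=True)  # leader has "v2"
store.replicate()
caught_up = store.get("k")                    # follower now has "v2"
```

In this model the eventually consistent read converges once replication catches up, while the strongly consistent read never returns the older value.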
        
             | doublesCs wrote:
             | Oops. Turns out I was right.
             | 
             | https://news.ycombinator.com/item?id=23271211
             | 
             | The timing of this is absolutely beautiful.
        
               | scarface74 wrote:
               | So now we shouldn't ever trust a project because they
               | don't have good technical writers?
               | 
               | I don't have a dog in the Mongo fight. I haven't done an
               | implementation on top of it in years, and the
               | next time I do something with "Mongo" it will probably be
               | AWS's Document DB with Mongo support. That's based on
               | AWS's own code and storage tier and doesn't have the same
               | characteristics as Mongo proper.
        
               | doublesCs wrote:
               | > So now we shouldn't ever trust a project because they
               | don't have good technical writers?
               | 
               | > the newer MongoDB 4.2.6 has more problems including
               | "retrocausal transactions" where a transaction reverses
               | order so that a read can see the result of a future
               | write.
        
               | tpxl wrote:
               | For what it's worth, Document DB doesn't support a lot of
               | the Mongo API, such as $$ROOT in aggregations, and it
               | can't use indices on (paraphrased) "SELECT * FROM x WHERE
               | id IN [list]" if the list length is > 10.
               | 
               | If you ask me, if there's something worse than Mongo it's
               | Document DB.
        
             | Smaug123 wrote:
             | I trust Kyle when he tells me that the behaviour he
             | observes is surprising. From the analysis
             | (https://jepsen.io/analyses/mongodb-4.2.6):
             | 
             | "In order to obtain snapshot isolation, users must be
             | careful not only to set the read concern to snapshot for
             | each transaction, but also to set write concern for each
             | transaction to majority. Astonishingly, this applies even
             | to read-only transactions."
             | 
             | "This behavior might be surprising, but to MongoDB's
             | credit, most of this behavior is clearly laid out in the
             | transactions documentation... MongoDB offers database and
             | collection-level safety settings precisely so users can
             | assume all operations interacting with those databases or
             | collections use those settings; ignoring read and write
             | concern settings when users perform (presumably) safety-
             | critical operations is surprising!"
        
               | scarface74 wrote:
               | There is a difference between "Mongo's documentation sucks"
               | and "Mongo is technically deficient". The former can be
               | corrected by updating the documentation.
               | 
               | Yes, I agree as far as the end user is concerned, they
               | are losing data either way.
        
       | therealdrag0 wrote:
       | How is Cassandra as an alternative to MongoDB?
        
       | gigatexal wrote:
       | Typical HN posts of late hating on JavaScript and MongoDB from
       | database elitists -- the thing is there's a tool for a job and as
       | engineers we need to figure out what tool best suits our use
       | cases. It could very well be a NoSQL database such as Mongo or a
       | relational one like Postgres or MySQL.
        
         | calcifer wrote:
         | > the thing is there's a tool for a job
         | 
         | Really? Which job do you believe needs a _"maybe store some of
         | this data, sometimes"_ kind of database?
        
       | jedberg wrote:
       | MongoDB started life as a database designed for speed and ease of
       | use over durability. That's not a good look for a database.
       | 
       | People have told me that they have since changed, but the
       | evidence is overwhelmingly and repeatedly against them.
       | 
       | They seem to have been successful on marketing alone. Or people
       | care more about speed and ease of use than durability, and my
       | assumptions about what people want in a database are just wrong.
        
         | collyw wrote:
         | Maybe it's just because I know SQL reasonably well but I don't
         | even find Mongo particularly easy to use. Not for complex
         | queries anyway.
        
         | cpuguy83 wrote:
         | I used it effectively to denormalize and combine some data from
         | other services... sort of like a 2nd level, queryable cache.
         | Worked very well for my needs. This was 7-8 yrs ago.
        
         | otterley wrote:
         | > MongoDB started life as a database designed for speed and
         | ease of use over durability. That's not a good look for a
         | database.
         | 
         | I think it depends. One could say the same about Redis, but
         | it's wildly successful and people love it.
         | 
         | The difference is how they are advertised. Redis makes no
         | claims to be anything other than what it is - a fast in-memory
         | database that has some persistence capability but isn't meant
         | to be a long-term data store. MongoDB, on the other hand, made
         | (and continues to make) claims about being comparable in
         | atomicity and durability to traditional SQL databases (but
         | magically much faster!) that haven't withstood scrutiny.
         | 
         | Keep in mind, too, that most data ain't worth much. It's one
         | thing to entrust data of low value in MongoDB; another to store
         | mission-critical data in it. I would look askance at leadership
         | who didn't ask hard questions about storing data worth millions
         | or billions of dollars in MongoDB without frequent snapshots --
         | and even then, the value mustn't be contingent on the 100%
         | accuracy of said data.
        
           | gav wrote:
           | When I'm thinking about data stores in large systems I like
           | to break them down by how they are used, along two main
           | axes: whether the data is fast- or slow-moving, and
           | durability, from "we don't care" to "we must never lose data".
           | 
           | It's easier to reason about systems if there are fewer things
           | that require durability guarantees, ideally you want to be
           | able to draw data flows that look like a tree instead of a
           | graph.
           | 
           | I find that Redis fits great because it's perfect for a whole
           | bunch of different temporal shared state needs, everything
           | from sessions to partial results. I've also deployed things
           | like Ehcache, MongoDB, and Memcached to fit these needs and
           | found other tools such as Kafka or RabbitMQ to be great
           | "glue".
           | 
           | Having the root of your important data be something "boring"
           | like Postgres or MySQL (or even Oracle!) is just good risk
           | management to me. I wouldn't want to trust Redis or MongoDB
           | for important data because it adds to the things I have to
           | worry about. It's "keeping your eggs in one basket" while
           | making sure that basket is really well looked after.
        
           | bjt wrote:
           | Yes. What I love most about Redis is that the fundamental
           | tradeoffs of the algorithms it's built on are surfaced up
           | through the interface, and made very plain in the
           | documentation.
        
         | Jare wrote:
         | Reading past marketing blurbs and using products for the things
         | they are designed is part of any engineer's job. I was
         | irritated by MongoDB's claims and defaults, but that didn't
         | stop us from putting it in production. We used it from 2012 to
         | 2016 (their most infamous years?), and for our use cases,
         | scale, size+expertise, and feature set, it was a perfect match.
         | In our case, durability was a smaller concern by design (lots
         | of write-only data, lots of ephemeral data), but we still
         | configured it carefully and never ran into any data loss
         | whatsoever; snapshots worked, migrations worked, etc.
         | 
         | If the service had lasted longer, scaled bigger, and the
         | business it supported had been more successful, we might have
         | ended up with a now-classic MongoDB to pg migration. That was
         | always an acceptable outcome, and it would have not invalidated
         | going with Mongo at the start.
        
           | collyw wrote:
           | >In our case, durability was a smaller concern by design
           | (lots of write-only data, lots of ephemeral data),
           | 
           | I assume that you mean write once data. If you mean write
           | only you might as well use /dev/null.
        
         | thomascgalvin wrote:
         | > Or people care more about speed and ease of use than
         | durability
         | 
         | I think 90% of the Mongo installs I've been exposed to were set
         | up by people that were tired of fighting with Hibernate
         | configurations and schema migrations.
         | 
         | It's also popular among people whose definition of "legacy
         | software" is "that app I stopped working on after three months
         | because I have something shiny and new."
        
           | collyw wrote:
           | We have it at our work. I bet it's because it was the hip new
           | thing to try out in 2013. Our tech lead is more into tech
           | challenges than building a maintainable app.
        
       | Hydraulix989 wrote:
       | This has been a known issue for a while:
       | 
       | https://hackingdistributed.com/2013/01/29/mongo-ft/
       | 
       | MongoDB: Broken By Design
        
         | threeseed wrote:
         | Might want to read up as this involves a completely different
         | set of issues.
         | 
         | And most of those listed in the blog were fixed many years
         | before 2013.
        
         | Beefin wrote:
         | This is a good example of the HN MDB hate: everything referenced has
         | been addressed long before 2013. It was a new DB then and early
         | adopters should know what they're getting into
        
       | arpa wrote:
       | Oh, Jepsen and MongoDB again? Somebody get the popcorn!
        
         | balfirevic wrote:
         | Unfortunately, not an entertaining showdown - too one-sided.
        
           | saagarjha wrote:
           | Because MongoDB is web scale?
        
             | senko wrote:
             | Some readers might not be familiar with that particular
             | meme: https://m.youtube.com/watch?v=b2F-DItXtZs
             | 
             | IMHO it perfectly describes the hype-reality disconnect in
             | the early days of MongoDB. Yeah, it was that bad.
             | 
             | Mongo has improved since, the hype has toned down and the
             | NoSQL space is more crowded these days.
        
               | collyw wrote:
               | The damage is done. I have to use this crap at work when
               | we should be using an SQL database. We have been planning
               | a migration since before I started a year and a half ago.
               | I won't be surprised if we are still on Mongo in another
               | year and a half.
        
               | karatestomp wrote:
               | Fantasy: "Don't use special database features (by which I
               | mean, like, any features) and make sure our ORM supports
               | a ton of different datastores because we might want to
               | change to a different database at some point and don't
               | want to be tied to this one."
               | 
               | Reality: Three app rewrites later plus another
               | application written talking to the same DB, and the
               | database is still the same.
        
               | znpy wrote:
               | I remember Diaspora chanting about using MongoDB.
               | 
               | Then a year or two later they admitted that their data
               | model mostly fit the relational model, and that they
               | spent a lot of time basically reimplementing relational
               | integrity in application code, in Ruby.
               | 
               | Yeah, Diaspora has never been fast. I'm not sure they can
               | blame it on MongoDB though.
        
               | collyw wrote:
               | I remember the Mongo hype when it came out and I really
               | couldn't understand it. You are just throwing away a lot
               | of useful features of a relational database because
               | "schemaless" and "big data". The majority of people using
               | it were on single server setups.
        
             | dathinab wrote:
             | Oh that brings back memories ;)
        
           | arpa wrote:
           | I remember having immensely enjoyed the original "Call me
           | maybe" analysis [https://aphyr.com/posts/284-jepsen-mongodb].
           | Sometimes it's just fun to see someone beaten.
        
       | speedgoose wrote:
       | The Jepsen analysis: https://jepsen.io/analyses/mongodb-4.2.6
        
       | hartator wrote:
       | [repost - asking for help] I am disappointed with the direction
       | that MongoDB has taken these past few years. Going ACID shows in
       | benchmarks [1], and it's not advisable if you are using MongoDB
       | for stats and queues. (No one uses MongoDB for financial
       | transactions despite the changes.)
       | 
       | And the recent change to a restrictive license is worrisome as
       | well. I have been thinking of forking 3.4 and bringing it back to
       | "true" open source and awesome performance. (If any C++ devs want
       | to help out, reach out to me! username @gmail.com)
       | 
       | [1] https://link.medium.com/PXIeZfhhH6
        
         | manigandham wrote:
         | Postgres already handles JSON well. MySQL does a good job now
         | too. And there are tons of other JSONB/document stores like
         | Couchbase, CouchDB, RavenDB, MarkLogic, Elasticsearch,
         | ArangoDB, CosmosDB, AWS DocumentDB, and even RethinkDB, which
         | still exists.
         | 
         | It's a nice goal but there's likely not much of a commercial
         | market for it, if that's your roadmap.
        
         | JoshTriplett wrote:
         | > And the recent change to a restrictive license is worrisome
         | as well. I have been thinking of forking 3.4 and bringing it back
         | to "true" open source and awesome performance.
         | 
         | Please do; someone needs to take that first step, and then many
         | more could potentially contribute.
        
           | DabbyDabberson wrote:
           | The license change was needed to keep MDB alive. Amazon's
           | DocumentDB is just a fork of MDB from before the new license.
        
         | toomuchtodo wrote:
         | Why not use PostgreSQL instead? It supports a JSON document
         | data type natively. It also has exceptional stewardship as an
         | open source project.
         | 
         | Mongo should never be a first choice, but a last choice for
         | edge cases.
        
           | wdb wrote:
           | I really enjoy using PostgreSQL; I just don't know how to
           | make it scale easily. Running it on a large VM in the cloud
           | works fine until you have lots of data or need it easily
           | accessible. How can you have the data in three different
           | regions (e.g. Europe, US, Asia) when you're using something
           | like Google Cloud? It seems to be a hard problem to crack.
        
           | aeonsky wrote:
           | Postgres has terrible indexing with JSON. It doesn't keep
           | statistics, so simple queries sometimes take much longer than
           | expected because the query planner doesn't know much about
           | the data.
        
             | [deleted]
        
             | pletnes wrote:
             | DB noob question: if you know that you should be indexing
             | on a JSON attribute, can't you put it into a "proper
             | column" and index it there?
        
               | why-el wrote:
               | You could, of course. But that would mean that you are
               | effectively not using json anymore. You need to pull the
               | data out of your json on each write, update in two
               | places, and so on. And if you need to delete a json
               | column, what do you do with the other one? You need to
               | delete it also. You are then managing two things.
               | 
               | There is always a trade off. If the column is important
               | enough, then you are right, it should stand on its own,
               | but then you lose the json flexibility. I personally
               | almost always only use jsonb if I know I only care about
               | that overall object as a whole, and rarely need to poke
               | around to find an exact value. As the grandparent
               | comment mentions, if you do need a particular value, then
               | it might be slower if your JSON records are too different
               | (if you think about it, how can you calculate selectivity
               | stats on a value if you have no idea how wide or
               | different JSON records are?).
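A minimal sketch of the trade-off described above. It assumes a hypothetical `events` table whose `user_id` attribute has been promoted out of the JSON document into its own indexed column. Python's built-in `sqlite3` module serves only as a runnable stand-in for PostgreSQL. The point is that every write has to update the JSON and the extracted column together.

```python
import json
import sqlite3

# In-memory SQLite stands in for PostgreSQL so the sketch runs anywhere.
# The "events" table and its "user_id" attribute are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, doc TEXT NOT NULL, user_id INTEGER)"
)
# The promoted column gets an ordinary index (and ordinary statistics).
conn.execute("CREATE INDEX events_user_id ON events (user_id)")


def save_event(conn, doc):
    """Insert the whole document plus a copy of one attribute."""
    conn.execute(
        "INSERT INTO events (doc, user_id) VALUES (?, ?)",
        (json.dumps(doc), doc.get("user_id")),
    )


def set_user(conn, event_id, user_id):
    """Any change to the attribute must be written in both places."""
    (text,) = conn.execute(
        "SELECT doc FROM events WHERE id = ?", (event_id,)
    ).fetchone()
    doc = json.loads(text)
    doc["user_id"] = user_id
    conn.execute(
        "UPDATE events SET doc = ?, user_id = ? WHERE id = ?",
        (json.dumps(doc), user_id, event_id),
    )


save_event(conn, {"user_id": 7, "action": "login"})
save_event(conn, {"user_id": 9, "action": "logout"})
set_user(conn, 1, 8)
rows = conn.execute("SELECT id, user_id FROM events WHERE user_id = 8").fetchall()
```

Forgetting either half of the update in `set_user` is exactly the "managing two things" problem: the column and the document silently drift apart.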
        
               | Mister_Snuggles wrote:
               | There are a number of ways to do this:
               | 
               | * Extract the attributes you're interested in into their
               | own columns, index these. With the extraction happening
               | outside the database, this is the most flexible option.
               | 
               | * Similar to above, use a trigger to automatically
               | extract these attributes.
               | 
               | * Also similar to above, use a generated column[0] to
               | automatically extract these attributes.
               | 
               | * Create an index on the expression[1] you use to extract
               | the attributes.
               | 
               | My use of JSON in PostgreSQL tends towards the first
               | option. This works well enough for cases where documents
               | are ingested and queried, but not updated. The last three
               | options are automatic - add/change the JSON document and
               | the extracted/indexed values are automatically updated.
               | 
               | [0] https://www.postgresql.org/docs/12/ddl-generated-columns.htm...
               | 
               | [1] https://www.postgresql.org/docs/12/indexes-expressional.html
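A sketch of the last option, an index on the extraction expression. In PostgreSQL this would look something like `CREATE INDEX docs_user_idx ON docs ((doc->>'user'));`, where the table, column, and index names are hypothetical. To keep the example runnable without a server, the sketch swaps in SQLite through Python's `sqlite3` module and indexes `json_extract(doc, '$.user')`. It assumes an SQLite build with the JSON functions, which are built in since 3.38. The query must repeat the indexed expression exactly for the planner to use the index.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, doc TEXT NOT NULL)")
# Index the extraction expression itself: no extra column to keep in sync.
conn.execute("CREATE INDEX docs_user ON docs (json_extract(doc, '$.user'))")
conn.executemany(
    "INSERT INTO docs (doc) VALUES (?)",
    [(json.dumps({"user": "ann", "n": 1}),), (json.dumps({"user": "bob", "n": 2}),)],
)

# The WHERE clause repeats the indexed expression exactly.
query = "SELECT id FROM docs WHERE json_extract(doc, '$.user') = ?"
matches = conn.execute(query, ("bob",)).fetchall()
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("bob",)).fetchall()
uses_index = any("docs_user" in row[-1] for row in plan)
```

Unlike the extracted-column approach, inserts and updates only touch `doc`. The database maintains the index entry automatically.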
        
             | [deleted]
        
             | [deleted]
        
             | fabian2k wrote:
             | I've seen that as well, the default estimate for jsonb can
             | seriously confuse the query planner. There is a patch in
             | PG13 that addresses this as far as I understand, but I'm
             | not familiar enough with PG internals to be sure I'm
             | reading that right. I'll be playing with this when PG13 is
             | out. The jsonb feature is really useful, though I wouldn't
             | recommend shoving relational data into it. Many things are
             | much, much harder to query inside jsonb than regular
             | columns.
             | 
             | There are ways around the statistics issue in some cases,
             | e.g. defining a functional index on a jsonb property will
             | collect proper statistics.
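A rough analogue of the point about functional indexes and statistics. SQLite again stands in for PostgreSQL via Python's `sqlite3`. In SQLite, running `ANALYZE` records statistics for an expression index in the `sqlite_stat1` table, just as it does for a plain column index. The table, field, and index names are hypothetical. The sketch only shows that statistics get recorded, not how a particular planner uses them.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, doc TEXT NOT NULL)")
# 1000 hypothetical documents: 900 "done", 100 "failed".
conn.executemany(
    "INSERT INTO docs (doc) VALUES (?)",
    [(json.dumps({"status": "failed" if i % 10 == 0 else "done"}),) for i in range(1000)],
)
conn.execute("CREATE INDEX docs_status ON docs (json_extract(doc, '$.status'))")
conn.execute("ANALYZE")

# One row per analyzed index: "<rows in index> <avg rows per distinct key>".
stats = dict(conn.execute("SELECT idx, stat FROM sqlite_stat1 WHERE tbl = 'docs'"))
```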
        
             | orf wrote:
             | "It doesn't keep statistics" is a weird way to say "I
             | expect full table scans to always be fast".
             | 
             | Create a functional index.
        
           | hartator wrote:
           | > Why not use PostgreSQL instead? It supports a JSON document
           | data type natively.
           | 
           | Yes, that's the thing, it's just a field type. It's not
           | really that different than dumping your JSON in a TEXT
           | column. MongoDB is fun because it's truly JSON - BSON - so
           | you don't have to run migrations, you can store complex
           | documents, and have a more object oriented way of storing
           | your data than SQL.
        
             | [deleted]
        
             | sergiotapia wrote:
             | It's completely different than dumping json into a text
             | field...
             | 
             | Read the docs, you can do a lot of fancy JSON stuff in
             | plain ol' Postgres. It's really powerful and guarantees
             | your data.
        
             | throwanem wrote:
             | You should probably read the Postgres documentation [1]
             | before you make erroneous claims like this. Postgres JSON
             | fields can be destructured, queried, and aggregated
             | sufficiently to cover _at least_ the 90% cases in MongoDB
             | usage.
             | 
             | I'll grant that Postgres probably isn't as much fun as
             | Mongo, what with all its tiresome insistence on consistency
             | and reliability. I would, however, argue that quantity of
             | available fun isn't really a figure of merit here.
             | 
             | [1] https://www.postgresql.org/docs/10/functions-json.html
        
             | bjt wrote:
             | > It's not really that different than dumping your JSON in
             | a TEXT column
             | 
             | That was true of the initial "JSON" type support.
             | 
             | It is very much not true of the "JSONB" type, which was
             | added in 2014 as part of Postgres 9.4. JSONB uses a binary
             | serialization that supports efficiently selecting into JSON
             | documents, putting regular BTREE indexes on specific fields
             | inside the documents, or even putting Elasticsearch-like
             | inverted indexes on complete JSON documents.
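A toy Python sketch of the idea behind an inverted index over whole documents. It maps each (path, leaf value) pair to the set of document ids that contain it, and it answers containment-style queries by intersecting those sets. This is a simplified illustration of the general technique, not how PostgreSQL's GIN indexes are actually implemented. The class and function names are hypothetical.

```python
from collections import defaultdict


def leaves(value, path=()):
    """Yield (path, scalar) pairs for every leaf of a JSON-like value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaves(child, path + (key,))
    elif isinstance(value, list):
        # Array elements share the array's path, so ["db"] matches ["db", "json"].
        for child in value:
            yield from leaves(child, path)
    else:
        yield path, value


class ToyInvertedIndex:
    """Hypothetical inverted index: (path, leaf value) -> ids of documents."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.ids = set()

    def add(self, doc_id, doc):
        self.ids.add(doc_id)
        for entry in leaves(doc):
            self.postings[entry].add(doc_id)

    def containing(self, pattern):
        """Candidate ids whose documents have every leaf of `pattern`.

        Like an index-only lookup, this can over-match nested arrays of
        objects; a real engine rechecks candidates against the documents.
        """
        result = set(self.ids)
        for entry in leaves(pattern):
            result &= self.postings.get(entry, set())
        return result


idx = ToyInvertedIndex()
idx.add(1, {"user": "ann", "tags": ["db", "json"]})
idx.add(2, {"user": "bob", "tags": ["db"]})
idx.add(3, {"user": "ann", "tags": ["go"]})

by_user = idx.containing({"user": "ann"})                # {1, 3}
by_tag = idx.containing({"tags": ["db"]})                # {1, 2}
both = idx.containing({"user": "ann", "tags": ["db"]})   # {1}
```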
        
             | IggleSniggle wrote:
             | This is so not true that it hurts. Postgres jsonb is highly
             | queryable.
        
       | naked-ferret wrote:
       | From the jepsen report:
       | 
       | """
       | 
       | Curiously, MongoDB omitted any mention of these findings in their
       | MongoDB and Jepsen page. Instead, that page discusses only
       | passing results, makes no mention of read or write concern,
       | buries the actual report in a footnote, and goes on to claim:
       | 
       | > MongoDB offers among the strongest data consistency,
       | correctness, and safety guarantees of any database available
       | today.
       | 
       | We encourage MongoDB to report Jepsen findings in context: while
       | MongoDB did appear to offer per-document linearizability and
       | causal consistency with the strongest settings, it also failed to
       | offer those properties in most configurations.
       | 
       | """
       | 
       | This is a really professional way to tell someone to stop their
       | nonsense.
        
         | Thaxll wrote:
         | MySQL and PG are not truly consistent by default; they don't
         | fsync every write.
         | 
         | MongoDB explains that pretty well: https://www.mongodb.com/faq
         | and https://docs.mongodb.com/manual/core/causal-consistency-read...
        
           | wolf550e wrote:
           | The default in MySQL and in postgresql is to fsync before
           | commit and afaik that has always been the default.
        
           | castorp wrote:
           | > MySQL and PG are not truly consistent by default; they
           | don't fsync every write.
           | 
           | Postgres most certainly does fsync by default.
           | 
           | It's true, you can disable it, but there is a big warning
           | about "may corrupt your database" in the config file.
        
             | Thaxll wrote:
             | No, PG does not fsync every write; more details here:
             | https://dba.stackexchange.com/questions/254069/how-often-doe...
             | 
             | My point is that the people complaining about MongoDB are
             | most likely the ones not using it; MongoDB is very different
             | from 10 years ago.
             | 
             | I like to remind people that PG did not have an official
             | replication system 10 years ago and, as of today, is still
             | behind MySQL. No DB is perfect; it's about tradeoffs.
        
               | wolf550e wrote:
               | > It writes out and syncs the accumulated WAL records at
               | each transaction commit, unless the committed transaction
               | touched only UNLOGGED or TEMP tables, or
               | synchronous_commit is turned off.
               | 
               | So the WAL is synced before commit returns, and if you power
               | cycle immediately after, the wal is played back and your
               | transaction is not lost? So it's fine?
               | 
               | It does not need to sync all writes, only the records
               | needed to play back the transaction after restart. This
               | is what all real databases do.
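A toy Python sketch of the write-ahead-log behaviour described above. A commit is acknowledged only after its log record has been flushed and passed to `os.fsync`. After a restart, the log is replayed to rebuild state, and a torn final record that was never acknowledged is ignored. The class, file name, and record format are hypothetical. Real databases add checksums, segment files, checkpoints, and group commit.

```python
import json
import os
import tempfile


class ToyWal:
    """Append-only log that is fsynced before a commit is acknowledged."""

    def __init__(self, path):
        self.path = path

    def commit(self, txid, changes):
        record = json.dumps({"txid": txid, "changes": changes}) + "\n"
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(record)
            log.flush()             # push Python's buffer to the OS
            os.fsync(log.fileno())  # ask the OS to put it on stable storage
        return True                 # only now is the commit reported as durable


def replay(path):
    """Rebuild state after a restart by re-applying every complete record."""
    state = {}
    if not os.path.exists(path):
        return state
    with open(path, encoding="utf-8") as log:
        for line in log:
            if not line.endswith("\n"):
                break               # torn final write: it was never acknowledged
            state.update(json.loads(line)["changes"])
    return state


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "wal.log")
    wal = ToyWal(path)
    wal.commit(1, {"a": 1})
    wal.commit(2, {"b": 2, "a": 3})
    with open(path, "a", encoding="utf-8") as log:
        log.write('{"txid": 3, "changes": {"c"')  # simulated crash mid-write
    recovered = replay(path)  # {"a": 3, "b": 2}
```

Only the log record is synced per commit. The data pages can be written back lazily, because replaying the log recovers them.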
        
               | robocat wrote:
               | "PG writes out and syncs the accumulated WAL (=
               | Transaction log) records at each transaction commit
               | [snip] It also syncs at the end of each WAL file (16MB by
               | default). The wal_writer process also wakes up
               | occasionally and writes out and syncs the WAL."
               | 
               | So PG keeps data consistent by default - unlike MongoDB.
               | 
               | > MySQL and PG are not truly consistent by default; they
               | don't fsync every write. MongoDB explains that pretty
               | well [links]
               | 
               | Where in those MongoDB doc links is there anything about
               | MySQL or PG?
        
         | foobarian wrote:
         | From top of linked article:
         | 
         | >>> I have to admit raising an eyebrow when I saw that web
         | page. In that report, MongoDB lost data and violated causal by
         | default. Somehow that became "among the strongest data
         | consistency, correctness, and safety guarantees of any database
         | available today"! <<<
         | 
         | It's not wrong, just misleading. Seems overblown given that
         | most practitioners know how to read this kind of marketing
         | speak.
        
           | takeda wrote:
           | > It's not wrong, just misleading. Seems overblown given that
           | most practitioners know how to read this kind of marketing
           | speak.
           | 
           | So basically whatever MongoDB was doing 10 years ago, they
           | are continuing to do. They did not change at all. A day or
           | two ago, a few people were defending Mongo, saying that
           | indeed in its early years Mongo wasn't the greatest, but it
           | is now and people should just stop being hung up on the past.
           | 
           | The reason why people lost their trust in Mongo wasn't
           | technical; it was this.
        
           | lostcolony wrote:
           | I appreciate your optimism in thinking that most (all?)
           | people reaching for distributed systems actually know enough
           | in the space to evaluate such claims.
        
             | takeda wrote:
             | Agree, and the "Mongo and Jepsen" page isn't targeting
             | distributed systems experts, most of them know to stay
             | away, because even if there are things that mongo does
             | right, other systems do it better.
        
       | NelsonMinar wrote:
       | I think it's remarkable this report has been out for a week now
       | and no one at MongoDB has commented on it. At least, not that I
       | have seen.
        
         | pengaru wrote:
         | Maybe they're too busy spending their MDB money.
         | 
         | https://www.google.com/search?q=NASDAQ:+MDB
        
           | threeseed wrote:
           | I genuinely am confused by comments like this.
           | 
           | Are companies not supposed to invest money into their
           | product, sales, people etc ?
           | 
           | And why does being listed on the NASDAQ imply being flush
           | with money ?
        
             | pengaru wrote:
             | > Are companies not supposed to invest money into their
             | product, sales, people etc ?
             | 
             | > And why does being listed on the NASDAQ imply being flush
             | with money ?
             | 
             | It was intended to be a playful reference to MDB's stock
             | price being on a tear right now, not simply being listed on
             | NASDAQ.
             | 
             | Expand the timeline on the graph to "Max"; it's at an
             | all-time high.
        
       | twoodfin wrote:
       | Discussed previously:
       | 
       | https://news.ycombinator.com/item?id=23191439
        
         | dang wrote:
         | Surprisingly, it seems not to have made the front page:
         | http://hnrankings.info/23191439/. There's clearly community
         | appetite to discuss this, so we won't treat the current
         | submission as a dupe.
        
           | kevinburke wrote:
           | " Did HN's antispam measures get a lot more aggressive
           | recently? The last handful of Jepsen reports have really
           | struggled to make it to frontpage, despite significantly
           | higher vote-to-age ratios than comparable posts. Once they're
           | on FP, they reliably hit top 10, but Dgraph's (1/2) "
           | https://twitter.com/jepsen_io/status/1261640852666855426
        
       | jpxw wrote:
       | Obligatory https://www.youtube.com/watch?v=b2F-DItXtZs
        
       | SmallPeePeeMan wrote:
       | MongoDB is a joke. Tried to use it on a project 3 years ago: it
       | consistently and repeatedly lost new data on non-replicated,
       | single-instance servers. I don't understand how anyone can use
       | it.
        
       ___________________________________________________________________
       (page generated 2020-05-23 23:00 UTC)