[HN Gopher] Things to know about databases ___________________________________________________________________ Things to know about databases Author : grech Score : 449 points Date : 2022-06-27 15:08 UTC (7 hours ago) (HTM) web link (architecturenotes.co) (TXT) w3m dump (architecturenotes.co) | thedougd wrote: | I have to plug the "Designing Data-Intensive Applications" book. | It dives deep into the inner workings of various database | architectures. | | https://dataintensive.net/ | SulphurSmell wrote: | This article is informative. I have found that databases in | general tend to be less sexy than the front-end apps...especially | with the recent cohort of devs. As an old bastard, I would pass | on one thing: Realize that any reasonably used database will | likely outlast the applications leveraging it. This is especially | true the bigger it gets, and the longer it stays in production. | That said, if you are influencing the design of a database, | imagine years later what someone looking at it might want to know | if having to rip all the data out into some other store. Having | migrated many legacy systems, I tend to sleep better when I know | the data is well-structured and easy to normalize. In those | cases, I really don't care so much about the apps. If I can sort | out (haha) the data, I worry less about the new apps I need to | design. I have been known to bury documentation into for-purpose | tables...that way I know that info won't be lost. Export the | schema regularly, version it, check it in somewhere. And, if you | can, _please_ , limit the use of anything that can hold a NULL. | Not every RDBMS handles NULL the same way. Big old databases live | a looooong time. | emerongi wrote: | > Realize that any reasonably used database will likely outlast | the applications leveraging it. | | I love this statement. It's true too, having seen a decades-old | database that needed to be converted to Postgres. The old | application was going to be thrown away, but the data was still | relevant :). | Yhippa wrote: | I think this is and will continue to be a common use case. | I'm very thankful for these applications that the data was | still stuck in a crusty old relational database for me to | work on top of as I built a new application. | | It's going to be interesting when this same problem occurs | years from now when people are trying to reverse schemas from | NoSQL databases or if they become difficult to extract. | | The only sticking point is when business logic is put into | stored procedures. On one hand if you're building an app on | top of it, there's a temptation to extract and optimize that | logic in your new back-end. On the other hand, it is kind of | nice to even have it at all should the legacy app go _poof_. | SulphurSmell wrote: | >business logic is put into stored procedures | | This is a double-edged sword. I have seen _massive_ | business logic baked into stored procedures...so much so, | that the applications themselves are rather slim. If this | stored procedure code is properly versioned and otherwise | managed, this is not entirely bad. If the data model is | sound, I don 't worry that much...stored procs vs 100KLOC | of Java code? I can tell you what is easier to migrate. The | other side of it is that stored procedures can also serve | as an API into the actual database. I built a system once | (90's) where the developers never accessed the actual | database tables directly. They always called stored | procedures. 
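 | Picture something like this (a minimal sketch in modern PostgreSQL
 | syntax with invented names; the original system was on a 90's RDBMS):
 |     CREATE FUNCTION get_open_orders(p_customer_id bigint)
 |     RETURNS TABLE (order_id bigint, placed_at timestamptz, total numeric)
 |     LANGUAGE sql STABLE AS $$
 |         SELECT o.id, o.placed_at, o.total
 |         FROM orders o
 |         WHERE o.customer_id = p_customer_id
 |           AND o.status = 'open';
 |     $$;
 |     -- Applications only ever run SELECT * FROM get_open_orders(42);
 |     -- they never touch the underlying tables.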
The advantage here was that we could tune the | snot out of the database without a code | change/build/deploy. It also allowed the DBA some control | over poorly written, runaway queries. YMMV. I think today I | probably would try to keep the database as vanilla as | possible. | richardw wrote: | Also an old timer. I've gone from mostly complex app and | no DB logic to mostly heavily optimised DB and lots of | procs. They protect the DB, make carefully crafted | interfaces available and allow changes to the | implementation. | | Except for eg infinitely scalable cloud data stores like | Google's, or for ML where it's just massive data and you | need a dumb-ish store of many GB of parquet. | SulphurSmell wrote: | >protect the DB | | I share this sentiment. The apps will come and go, the | real value is in the data. If the database can cover its | ass, I am less concerned about ham-fisted developers | randomly messing it up with ill-conceived DML. It's not | that they are malicious...it just happens. I have seen | devs that code in Erlang, Haskell and even Assembly...run | in terror at doing some SQL. It's weird. Trust but | verify. And hire persnickity, passionate DBAs. | evilduck wrote: | About a decade ago I worked for an insurance company. It was | an offshoot that was spun out of of another insurance company | from another state, which itself was decades old. As best as | I could infer from my vantage point, my expertise at the | time, and the spare time I was willing to investigate the | matter, the database schema and a good chunk of the core data | tables were first created in the late-80s on a mainframe and | had outlived 4 or 5 application rewrites and (at least) two | SQL variant migrations. I'm hand-waving exact details because | _nobody_ from the original company or that time period was | still around even prior to the corporate split and so there | was nobody who could answer history questions in detail, but | that 's also a testament to how persistent data can be. There | was one developer from the parent company they slapped with | golden handcuffs who knew where most of the bodies were hid | in that software stack that enabled decent productivity but | even she was lacking a solid 15 years of first-hand | experience of its inception. To the best of my knowledge that | database is still in use today. | | Databases in heavy use will not just outlast your | application, they have a strong chance of outlasting your | career and they very well may outlast you as a person. | SulphurSmell wrote: | Are you me? LOL | YetAnotherNick wrote: | > I have found that databases in general tend to be less sexy | than the front-end apps | | I don't know if there is a single soul who believes this. If | you are designing a database, it is much more cooler than front | end apps. | SulphurSmell wrote: | I think they are wonderful (from the Codd and Date days...) | but mostly everyone else disagrees. | mmcnl wrote: | "Show me your flowchart and conceal your tables, and I shall | continue to be mystified. Show me your tables, and I won't | usually need your flowchart; it'll be obvious." -- Fred Brooks, | The Mythical Man Month (1975) | SulphurSmell wrote: | This is going on my wall. Thanks so much. | irrational wrote: | The NULL issue is so true. We migrated a large database from | Oracle to Postgres. It took 2 years. By far and away the | biggest issue was rewriting queries to account for the | (correct) way Postgres handles NULLs versus how Oracle does it. 
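 | The classic case (a small sketch with an invented table): Oracle
 | stores an empty string as NULL, so after
 |     INSERT INTO people (name, nickname) VALUES ('Ann', '');
 | the query
 |     SELECT * FROM people WHERE nickname IS NULL;
 | returns the row on Oracle but not on Postgres, while
 |     SELECT * FROM people WHERE nickname = '';
 | matches on Postgres and matches nothing on Oracle, since NULL never
 | compares equal to anything. Every query written around the Oracle
 | behaviour had to be reworked.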
| | Also, in my experience, the database is almost always the main | cause of any performance issues. I would much rather hire | someone who is very good at making the database perform well | than making the front end perform well. If you are seeking to | be a full stack developer, devote much more time to the | database layer than anything else.
| SulphurSmell wrote: | >the database is almost always the main cause of any | performance issues | | I would be careful with the term "cause". There is a | symbiotic relationship between the application and the | database. Or, if talking to a DBA...a database and its | applications. Most databases can store any sets of arbitrary | information...but _how_ they are stored (read: structure) | must take into account how the data is to be used. When the | database designer can be told up-front (by the app dev team), | considerations can be made to optimize performance along | whatever vector is most desired (e.g. read speed, write | speed, consistency, concurrency, etc.). Most database | performance issues result when these considerations are left | out. Related: Just because a query works (i.e. returns the | right data) does not mean it's the _best_ query.
| hodgesrm wrote: | The article left out one of the most fundamental topics of | databases--clustering of data in storage is everything. | Examples: | | 1. If you store data in rows it's quite fast to | insert/update/delete individual rows. Moreover, it's easy to do | it concurrently. However reads can be very slow because you | read the entire table if you scan a single column. That's why | OLAP databases use column storage. | | 2. If you keep data in the table sorted, reading ranges based | on the sort key(s) is very fast. On the other hand inserts may | spray data over the entire table, (eventually) forcing | writes to all blocks, which is very slow. That's why many OLTP | databases use heap (unsorted) row organization. | | In small databases you don't notice the differences, but they | become dominant as volume increases. I believe this fact alone | explains a lot of the proliferation of DBMS types as enterprise | datasets have grown larger. | | Edit: minor clarification
| beckingz wrote: | I heard about Flywaydb today, which appears to be an open | source database versioning tool. Pretty interesting! | https://flywaydb.org/ | vladsanchez wrote: | Pretty open-source, until you need "premium" features like | "rollback" :/ (headwall)
| mjb wrote: | Introductory material is always welcome, but I suspect this isn't | going to hit the target for most people. For example: | | > Therefore, if the price isn't an issue, SSDs are a better | option -- especially since modern SSDs are just about as reliable | as HDDs | | This needs a tiny extra bit of detail: if you're buying random IO | (IOPS) or throughput (MB/s), SSDs are significantly (orders of | magnitude!) cheaper than HDDs. HDDs are only cheaper on space, | and only if your need for throughput or IO doesn't cause you to | "strand" space. | | > Consistency can be understood after a successful write, update, | or delete of a row. Any read request immediately receives the | latest value of the row. | | This isn't the ACID definition of C, and is closer to the | distributed systems (CAP) one. I can't fault the article for | getting this wrong, though - it's super confusing!
| googletron wrote: | You are absolutely right about the C being more in line with the CAP | one.
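 | The ACID sense of consistency is about integrity constraints that
 | every committed transaction must preserve. A minimal sketch, with an
 | invented table:
 |     CREATE TABLE accounts (
 |         id      bigint PRIMARY KEY,
 |         balance numeric NOT NULL CHECK (balance >= 0)
 |     );
 |     -- A withdrawal that would drive a balance negative violates the
 |     -- CHECK and is rolled back, so no committed state breaks the rule.
 | The CAP sense, which the quoted sentence is closer to, is about every
 | read observing the latest committed write.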
| | I have a post in draft to discuss disk trade-offs which digs | into this aspect; it's impossible to dig into everything in this | level of a post.
| yla92 wrote: | Great post. Also highly recommend Designing Data-Intensive | Applications by Martin Kleppmann | (https://www.amazon.com/Designing-Data-Intensive-Applications...). | The sections on "Storage and Retrieval", | "Replication", "Partitioning" and "Transactions" really opened up | my eyes!
| lysecret wrote: | Absolutely loved the book. Can someone recommend similar books? | wombatpm wrote: | Database Design for Mere Mortals by Ray Hernandez | avinassh wrote: | Database Internals is also pretty good. | skrtskrt wrote: | Seconding Database Internals - it's not just about | "Internals of a database", as part 2 gets nitty gritty with | the general problems of distributed systems, consensus, | consistency, availability, etc. etc.
| dangets wrote: | I have not read it personally, but I've seen 'How Query | Engines Work' highly recommended several times before. I have | a procrastinatory tab open to check it out some day. | | https://leanpub.com/how-query-engines-work
| itsmemattchung wrote: | Second this. | | I really like how he (Martin Kleppmann) in the book starts with | a primitive data structure for constructing a database design, | and then evolves the system slowly and describes the various | trade offs with building a database from the ground up.
| bironran wrote: | Nice post, though for the indexing "introduction-deep-dive" I | would still recommend newbies to look at | https://use-the-index-luke.com/ .
| konfusinomicon wrote: | also check out rick james's mysql documents | http://mysql.rjweb.org/ | | I send those 2 links to coworkers all the time | googletron wrote: | Great resource! I have it linked as a reference! | sonofacorner wrote: | This is great. Thanks for sharing!
| jwr wrote: | Some of the explanations are questionable: I think they were | overly simplified, and while I applaud the goal, some things just | aren't that simple. | | I highly recommend reading https://jepsen.io/consistency and | clicking on each model on the map. This is the best resource I | found so far for understanding databases, especially distributed | ones.
| googletron wrote: | I would love the feedback, what was questionable? Striking the | balance is tough. Jepsen's content is great.
| gumby wrote: | Everyone can disagree on what is the precise place to slice | "this is beginner content" from "this is almost-beginner | content". I could stick my own oar in in this regard but I | won't. | | I think your level of abstraction is quite good for the | absolute "what on earth are people talking about when they | use that 'database' word?". With an extremely high level | understanding, when they encounter more detail they'll have a | "place to put it".
| Diggsey wrote: | One thing that can be surprising is that for "REPEATABLE | READ", not all "reads" are actually repeatable. | | There are at least two ways (that I'm aware of) that this can | be violated. For example, if you run an update statement like | this:
 |     UPDATE foo SET bar = bar + 1
 | | Then the read of "bar" will always use the latest value, | which may be different from the value other statements in the | same transaction saw.
| kerblang wrote: | Not sure what you're claiming here... | | Repeatable read isolation creates read locks so that other | transactions cannot write to those records.
Of course our | own transaction has to first wait for outstanding writes to | those records to commit before starting. | | Best as I know the goal is not to prevent one's own | transaction from updating the records we read; the read | locks will just get upgraded to write locks. | petergeoghegan wrote: | > Some of the explanations are questionable: I think they were | overly simplified, and while I applaud the goal, some things | just aren't that simple. | | I am an expert on the subject matter, and I don't think that | the overall approach is questionable. The approach that the | author took seems fine to me. | | The definition of certain basic concepts like 'consistency' is | even confusing to experts at times. This is made all the more | confusing by introducing concepts from the distributed systems | world, where consistency is often understood to mean something | else. | | Here's an example of that that I'm familiar with, where an | expert admits to confusion about the basic definition of | consistency in the sense that it appears in ACID: | | https://queue.acm.org/detail.cfm?id=3469647 | | This is a person that is a longtime peer of the people that | invented the concepts! | | Not trying to rigorously define these things makes a great deal | of sense in the context of a high level overview. Getting the | general idea across is far more important. | jandrewrogers wrote: | > "Scale of data often works against you, and balanced trees are | the first tool in your arsenal against it." | | An ironic caveat to this is that balanced trees don't scale well, | only offering good performance across a relatively narrow range | of data size. This is a side-effect of being "balanced", which | necessarily limits both compactness and concurrency. | | That said, concurrent B+trees are an absolute classic and provide | important historical context for the tradeoffs inherent in | indexing. Modern hardware has evolved to the point where B+trees | will often offer disappointing results, so their use in indexing | has dwindled with time. | hashmash wrote: | What kinds of indexing structures are used instead, and how do | they differ from B+trees? Do you have examples of which | relational databases have replaced B+tree indexes? | jandrewrogers wrote: | The property being optimized for, relative to B+trees, is | extreme compactness of representation. In the pantheon of | possible indexing algorithms, B+trees are pretty far on the | bloated end of the spectrum in terms of the ratio between | data space and index space. All indexes have a scaling | performance cliff due to the index structure filling up and | eventually overflowing available cache, crowding out the data | and forcing page faults for almost every index write. In | B+tree indexes this happens relatively early and often. | | Radically improving index compactness is achieved by | loosening design constraints on B+trees: the indexes | represent a partial order which only converges on a total | order at the limit and the search structure is unbalanced. In | the abstract these appear slightly less efficient but it | enables the use of selectivity-maximizing succinct | representations of the key space that can get pretty close to | the information theoretic limits. Scalability gains result | from the radical reduction in cache footprint when | represented this way. 
| | Optimal compressive indexes are not computable (being | equivalent to AI), so the efficient approximation strategies | people come up with tend to be diverse, colorful, and | sometimes impractical. Tangentially, some flavors have | excellent write performance. It is not a trivial algorithm | problem but there are a few design families that generalize | well to real databases engines. I wouldn't describe this as a | fully solved problem but many ordinary cases are covered. | | There isn't much incentive to design a relational database | engine that can use these types of indexes, since the types | of workloads and data models that recommend them usually | aren't relational. Someone could, there just isn't much | incentive. It is more _de rigueur_ for graph, spatiotemporal, | and some types of analytical databases, where there is no | other practical option if scalability matters at all. | xmprt wrote: | I know Clickhouse uses MergeTrees which are different from | B+trees. However it can't really be used as an RDBMS. It's | especially bad at point reads. | | https://en.wikipedia.org/wiki/Log-structured_merge-tree | mikeklaas wrote: | There are projects that use LSMTs as the storage engine for | RDBMS' (like RocksDB); I'm not sure it's accurate to say | "they can't be use as an RDMBS". | petergeoghegan wrote: | It's definitely possible, and can make a lot of sense -- | MyRocks/RocksDB for MySQL seems like an interesting and | well designed system to me. It is fairly natural to | compare MyRocks to InnoDB, since they're both MySQL | storage engines. That kind of comparison is usually far | more useful than an abstract comparison that ignores the | practicalities of concurrency control and recovery. | | The fact that MyRocks doesn't use B+Trees seems like half | the story. Less than half, even. The really important | difference between MyRocks and InnoDB is that MyRocks | uses log-structured storage (one LSM tree for | everything), while InnoDB uses a traditional write-ahead | log with checkpoints, and with logical UNDO. There are | multiple dimensions to optimize here, not just a single | dimension. Focusing only on time/speed is much too | reductive. In fact, Facebook themselves have said that | they didn't set out to improve performance as such by | adopting MyRocks. The actual goal was price/performance, | particularly better write amplification and space | amplification. | petergeoghegan wrote: | > Modern hardware has evolved to the point where B+trees will | often offer disappointing results, so their use in indexing has | dwindled with time. | | This is pure nonsense. B+Trees are used extensively and by | default by 5 out of 5 of the top database systems, according to | db-engines.com. | jandrewrogers wrote: | You don't actually address the point. | | If your database engine is an old design or your data is | small by modern standards, then a B+tree will be one of the | few indexing algorithms available and if the data is small it | will probably work. Modern database kernels targeting modern | hardware and storage densities typically aren't using B+trees | and the reasons why are well-understood. No one with any | sense is using a B+tree to index e.g. a trillion records, | which is a pretty ordinary thing to do on a single server in | 2022. | | You can't just swap out indexing architectures due to their | dependency on storage engine and scheduling behavior, so | older databases like PostgreSQL will be using B+trees for the | indefinite future even if suboptimal. 
| | The transition away from B+tree based architectures in new | databases engines started about 10-15 years ago. Back then I | used them ubiquitously but I honestly don't remember the last | time I've seen one in a new design. | funcDropShadow wrote: | > so older databases like PostgreSQL will be using B+trees | for the indefinite future even if suboptimal. | | PostgreSQL 14 comes with 6 builtin index types[1]: B-tree, | Gist, SP-Gist, Gin, Brin, and Hash. More can be plugged in | as extensions. | | [1]: Chapters 63-69 of | https://www.postgresql.org/docs/14/internals.html | | Edited: Fixed the link to version 14. | petergeoghegan wrote: | > You don't actually address the point. | | You said that B-Trees "use in indexing has dwindled with | time". This is demonstrably false. | | > Back then I used them ubiquitously but I honestly don't | remember the last time I've seen one in a new design. | | Even if that was true (which it definitely isn't), why | would anybody judge the commercial or scientific relevance | of B-Trees by looking at what new systems do? There are | very few new systems that are intended to be competitive as | general purpose systems, which is where most of the market | is. | | You still haven't actually named a single example of a | "modern database kernel" that exemplifies what you're | talking about. | jandrewrogers wrote: | You are understating the limitations of B+trees for real | workloads. A common and growing problem is the lack of | online indexing that scales, the particularly data model | doesn't matter that much. Index construction throughput | and scaling has been a serious problem at some pretty | boring companies I've done work for. | | Use of B+trees in _new_ database kernels has definitely | diminished. I 'm not counting the installed base of | SQLite etc. Ubiquity doesn't make something the pinnacle | of technology -- just as often it means "legacy installed | base". I still use PostgreSQL a lot and mod it when I | need to but I am not under any illusions about its | limitations. | | A "modern" database kernel that can efficiently use | modern hardware is going to be a thread-per-core | architecture with all I/O and execution scheduling done | in user space, and the ability to operate on modern | storage densities found on database servers, which can | exceed a petabyte of direct-attached storage. The | implications of storage density and its interaction with | indexing drive most of the real changes in the way | database kernels are designed. You can find elements of | this in open source, but mostly in big data platforms | rather than proper database engines. | | That said, no one builds new high-end databases for | retail anymore, the economics don't make sense. All the | money moved to more specialized implementations that | cater to smaller audiences where you don't need to | advertise. The kernels are fully general, and widely | reused, but the interfaces and surrounding bits are | purpose-built for particular workloads. Hell, my old | storage engines are still used under license by that lot. | The days of database billboards on the 101 are an | anachronism. | petergeoghegan wrote: | > You are understating the limitations of B+trees for | real workloads. | | I never said anything about workloads. All I said was | that your statements about B+Trees having dwindling usage | are clearly false. 
| | If you make a claim that is self-evidently bogus, then | you shouldn't expect anything else that you may have said | at the same time to be taken seriously. | hashmash wrote: | Can you provide an explicitly named "modern" database | that doesn't use B+tree indexes, and what specifically | does it use instead? | jiggawatts wrote: | You keep talking about how B-Trees are rarely used but | I've seen relatively new systems deployed that use them | (or some minor variation). FoundationDB, FASTER, and a | few others. | | Other than in-memory hash indexing as used by SAP HANA, | I'm not aware of any other data structures anywhere near | as popular for database engines. | | Can you name the data structure(s) that have superseded | these? | heisjustsosmart wrote: | Read this guy's past posts, you will save yourself a lot | of time. He does a lot of this sort of thing. | jandrewrogers wrote: | ...says the person who is confidently oblivious to the | problems of mixing virtual methods and direct paged | memory. | petergeoghegan wrote: | Thanks for the tip | trhoad wrote: | An interesting subject! The article could do with an edit, | however. There are lots of grammatical errors. | AtNightWeCode wrote: | Not sure how to use these recommendations in practice though even | if the info is somewhat correct. SQL is a beast of tech and it is | used because of battle history and since there is simply no other | viable tech replacing it when it comes to transactions and | aggregated queries. | | Indexes are a nightmare to get right. Often performance | optimizations of SQL databases include removing indexes as much | as adding indexes. | vorpalhex wrote: | It's not that SQL is all that beastly, it's that most tutorials | fail to explain the internals and basics and so you just see | all these features and interfaces of the system and can't build | a mental model of how the system works. | AtNightWeCode wrote: | Well, SQL does come with liberties. I worked with expensive | commercial software that destroys the performance of | databases by doing everything from complicated ad hoc queries | to massive amounts of point reads. | larrik wrote: | Indexes aren't a "make my DB faster" magic wand. They have | benefits and costs. | | If you are seeing performance gains from _removing_ indexes, | then I 'm assuming your workload is very heavy on | writes/updates compared to reads. | AtNightWeCode wrote: | Mostly because of overlapping indexes. Then if there are | include columns it may get out of hand. Not too difficult to | achieve. Just blindly follow recommendations from a tool or a | cloud service. | roflyear wrote: | Or you're using MySQL ;) | dspillett wrote: | Too many indexes can cause significant performance problems | if RAM is short. If the indexes are actually used (rather | than sitting idle on disk because other indexes are better | choices for all your applications' typical queries) then they | will "compete" for memory potentially causing a cache | thrashing situation. | | But yes, the issue with too many indexes is more often that | they harm write performance. | | A related issue is indexes that are too wide, either covering | many columns or "including" them. As well as eating disk | space they also eat extra memory (and potentially cause extra | IO load) when used (less rows per page, so more pages loaded | into RAM for the same query). 
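 | For example (hypothetical names, Postgres/SQL Server-style INCLUDE
 | syntax), a tuning advisor might suggest
 |     CREATE INDEX ix_orders_wide ON orders (customer_id, status)
 |         INCLUDE (order_date, total, shipping_address, notes);
 | when the far narrower
 |     CREATE INDEX ix_orders_narrow ON orders (customer_id, status);
 | would satisfy the same seeks while fitting many more entries per page.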
| | Both problems together, too many indexes many of which are | too wide, usually comes from blindly accepting | recommendations from automated tools (particularly when they | are right that there is a problem, and it is a problem that a | given index may solve, but fixing the queries so existing | indexes are useful could have a much greater effect than | adding the indexes). | manish_gill wrote: | What tool was used to create the visuals? | donatj wrote: | I still think about my first job out of college. Shopping cart | application, we would add indexes exclusively _when there was a | problem_ rather than proactively based on expected usage | patterns. It 's genuinely a testament to MySQL that we got as far | as we did without knowing anything about what we were doing. | | One of my most popular StackOverflow questions to this day is | about how to handle _one million_ rows in a single MySQL table ( | _shudder_ ). | | The product I work on now collects more rows than that _a day_ in | a number of tables. | Merad wrote: | > a dirty read occurs when you perform a read, and another | transaction updates the same row but doesn't commit the work, you | perform another read, and you can access the uncommitted (dirty) | value | | It's even worse than this with MS SQL Server. When using the READ | UNCOMMITTED isolation level it's actually possible to read | corrupted data, e.g. you might read a string while it's being | updated, so the result row you get contains a mix of the old | value and new value of the column. SQL Server essentially does | the "we got a badass over here" Neil deGrasse Tyson meme and | throws data at you as fast as it can. Unfortunately I've worked | on several projects where someone apparently thought that READ | UNCOMMITTED was a magic "go fast" button for SQL and used it all | throughout the app. | jiggawatts wrote: | I really wish SERIALIZABLE was the default transaction | isolation level and anything lower was opt in... with warnings. | hodgesrm wrote: | SERIALIZABLE is ridiculously slow if you have any level of | concurrency in your app. READ COMMITTED is a reasonable | default in general. The behavior GP is describing sounds like | an out and out bug. | | Dirty reads incidentally weren't supported for quite some | time in the Sybase architecture (which forked to MS SQL | Server in 1992). There was a Sybase effort to add dirty read | support around 1995 or so. The project name was "Lolita." | throwaway787544 wrote: | Can anyone give me a brief understanding of stored procedures and | when I should use them? | tiffanyh wrote: | #1 thing you should know, RDBMS can solve pretty much every data | storage/retrieval problem you have. | | If you're choosing something other than an RDBMS - you should | rethink why. | | Because unless you're at massive scale (which still doesn't | justify it), choosing something else is rarely the right | decision. | VirusNewbie wrote: | It is very frustrating to work with engineers who don't | understand the nuances of RDBMS and assume they can solve all | the things. The small company I work for has 3B rows. We have a | high write volume. _Can_ you use an RDBMs database to solve | this? Sure, but it would be a terrible waste of engineering | effort. | qaq wrote: | ? as opposed to buggy reimplementation of subset of RDBMS | functionality on the Application side? | googletron wrote: | Good point. Its often the problem space and other constraints | that usually drive these decisions. 
It's important that you deal | with problems when you have them.
| throwamon wrote: | Isn't this like saying you can solve every programming problem | you have with <insert your favorite Turing-complete language | here>? Of course you can, but aren't there _any_ cases where | the tradeoffs outweigh the benefits, even if it's about | something selfish like ergonomics or, dare I say, fun?
| _the_inflator wrote: | I agree with you. However, conversely I don't see anything | that proves him wrong. Databases are not like programming | languages. There is a reason why we don't use punch cards | anymore.
| w0m wrote: | Is 'Not performance bound, and not knowing the future shape of | your data' a valid reason? Less overhead on initial rollout to | just toss it up there. | | > choosing something else is rarely the right decision | | I think this is a little bit of a 'We always did it this way' | statement.
| xmprt wrote: | I think the tradeoff is similar to using a weakly typed vs | strongly typed language. Strong typing is more up front | effort but it will save you down the line because it's more | predictable. Similarly, an RDBMS will require more up front | planning and design and regular maintenance but that extra | planning will save you more time down the line.
| adrianmonk wrote: | > _future shape of your data_ | | Contrary to what people seem to assume, you actually can | change the schema of a database and migrate the existing data | to the new schema. There's a learning curve, but it's doable. | | If you go schema-less, you run into another problem: not | knowing the _past_ shape of your data. When you try to load | old records (from previous years), you may find that they | don't look like the ones you wrote recently. And, if your code | was changed, it may fail to handle them. | | This makes it hard to safely change code that handles stored | data. You can avoid changing that code, you can accept | breakage, or you can do a deep-dive research project before | making a change. | | If you have a schema, you have a contract about what the data | looks like, and you can have guarantees that it follows that | contract.
| grogers wrote: | Maintaining backwards compatibility for reading old records | in code is not hard. You can always rewrite all rows to the | newer format if you want to remove the compat code, or if | the structure changes in an incompatible way. It's pretty | comparable to what you have to do to evolve the code/schema | safely together. | | Having a schema is much better for ad-hoc queries though, | doubly so if your schemaless types aren't JSON (e.g. | protobufs).
| alecfong wrote: | I find myself forced to model access patterns when choosing | non-relational dbs. This often results in a much less | flexible model if you didn't put a lot of thought into it. | Ymmv
| gryn wrote: | as long as you're willing to write a whole lot of data | validation and constraint-checking scripts by hand in the | future, ETL scripts for non-trivial analytical queries | (depending on what NoSQL you chose, but if you chose it for | perf this one is usually a price you have to pay). and keep a | very rigorous track of the conceptual model of your data | somewhere else, or simply don't care about its consistency | when different parts of your not-schema have contradicting | data (at that point why are you even storing it?)
| | and that you ruled out using a JSON string column(s) as a | dump for the uncertain parts, de-normalization and indexing, | and the EAV schema as potential solutions to your problems. | | the point is nothing is free, and you have to be sure it's a | price you're willing to pay. | | are you ready to give up joins? to have your data be modeled | after the exact queries you're going to make? for your data to | be duplicated across many places? etc ...
| emerongi wrote: | With Postgres, you can always just have a JSONB column for | data whose shape you're unsure of. Personally, I'd rather | start with Postgres and dump data in there and retain the | powers of an RDBMS for the future, rather than the other way | around and end up finding out that I really would like to | have features that come out of the box with relational | databases. | | I think a valid reason for not choosing a relational database | is if your business plan requires that you grow to be a | $100B+ company with hundreds of millions of users. Otherwise, | you will probably be fine with an RDBMS, even if it will require | some optimizing in the future.
| packetlost wrote: | Postgres' JSON implementation perfectly adheres to the JSON | spec, which actually sucks if you need to support things | like NaNs, Inf, etc. It's a good option, but it doesn't | work for all datasets. | slaymaker1907 wrote: | Wow, TIL that NaN isn't valid JSON.
| mslot wrote: | Most $100B+ companies (e.g. Google, Meta, Amazon) were | built primarily using relational databases for the first | 10-15 years. | qaq wrote: | No longer an issue with things like Spanner, CockroachDB | etc
| dspillett wrote: | There are circumstances where you really don't know the shape | of the data, especially when prototyping for proof of concept | purposes, but usually not understanding the shape of your | data is something that you should fix up-front as it | indicates you don't actually understand the problem you are | trying to solve. | | More often than not it is worth some time thinking and | planning to work out at least the core requirements in that | area, to save yourself a lot of refactoring (or throwing away | and restarting) later, and potentially hitting bugs in | production that a relational DB with well-defined constraints | could have saved you from while still in dev. | | Programming is brilliant. Many weeks of it sometimes save you | whole hours of up-front design work.
| spmurrayzzz wrote: | > usually not understanding the shape of your data is | something that you should fix up-front as it indicates you | don't actually understand the problem you are trying to | solve. | | This is a good point and probably correct often enough, but | I also think not understanding the entire problem you are | solving is not only common, but in fact necessary to most | early-stage velocity. There is a need to iterate and adapt | frequently, sometimes as part of your go-to-market | strategy, in order to fully understand the problem space. | | > a relational DB with well-defined constraints could have | saved you from while still in dev | | This presumes that systems built on top of non-RDBMS are | incapable of enforcing similar constraints. This has not | been my experience personally. But it's possible I don't | understand your meaning of constraints in this context. I | assumed it to mean, for instance, something like schemas | which are fairly common now in the nosql world. Curious | what other constraints were you referencing?
| marcosdumay wrote: | > This presumes that systems built on top of non-RDBMS | are incapable of enforcing similar constraints. | | Are you kidding? They never can. | | The entire point of ditching the relational model is | discarding data constraints and normalization. | spmurrayzzz wrote: | Never? Many NoSQL stores are offering parity in many of | the feature verticals that were historically the sole | domain of RDBMS. | | Mongo has always had semantics to support normalized data | modeling, has schema support, and has had distributed | multi-document ACID transactions since 2019 [1]. You | don't have to use those features as they're opt-in, but | they're there. | | I know that full parity between the two isn't feasible, | but to say they never can is a mischaracterization. | | [1] Small addendum on this: Jepsen highlighted some | issues with their implementation of snapshot isolation | and some rightful gripes about poor default config | settings and wonky API (you need to specify snapshot read | concern on all queries in conjunction with majority write | concern, which isnt highlighted in some docs). But with | the right config, their only throat clearing was whether | snapshot isolation was "full ACID", which would apply to | postgres as well given they use the same model. | funcDropShadow wrote: | What is the point of using MongoDB with multi-document | ACID transactions? Enabling durability in MongoDB is | usually costly enough that you can't find a performance | benefit compared to Postgres. With JSONB support in | PostgreSQL, I dare say, it can express anything that | MongoDB can express with its data model and query | language. That leaves scalability as the only possible | advantage of MongoDB compared to PostgreSQL. And the | scalability of MongoDB is rather restrictive, compared to | e.g. Cassandra. | | And I would never trust a database that has such a bad | track record, regarding durability as MongoDB, although I | admit that PostgreSQL had theoretical problems there as | well in the past. | spmurrayzzz wrote: | I actually agree with you on the point about multi- | document tx's, I wouldn't choose mongo solely _for_ that | feature. Its nice to have maybe for the niche use case in | your nosql workload for when its beneficial. But the | point I was originally making was that nosql stacks are | not fundamentally incompatible with most of the features | or safety constraints offered by other RDBMS. | | > And I would never trust a database that has such a bad | track record, regarding durability as MongoDB | | I can't comment on your own experience obviously, but | I've been using mongo since 2011 in high throughput | distributed systems and it's been mostly great (one of my | current systems averages ~38 million docs per minute, | operating currently at 5 9s of uptime). | | Definitely some pain points initially in the transition | to WiredTiger, but that largely was a positive move for | the stack as a whole. Durability fires have not plagued | my experience thus far, not to say they won't in the | future of course. | | As you noted, Postgres has had its own headaches as well. | Finding out that all their literature claiming their | transactions were serializable when they were in fact | _not_ serializable could be considered a mar on their | record. But much like mongo, they have been quick to | address implementation bugs as they are found. 
| funcDropShadow wrote: | > I can't comment on your own experience obviously, but | I've been using mongo since 2011 in high throughput | distributed systems and it's been mostly great (one of my | current systems averages ~38 million docs per minute, | operating currently at 5 9s of uptime). | | > Definitely some pain points initially in the transition | to WiredTiger, but that largely was a positive move for | the stack as a whole. Durability fires have not plagued | my experience thus far, not to say they won't in the | future of course. | | Good to read that some are actually using MongoDB to | their benefit. Indeed I have encountered problems with | durability in the wild. Nothing, that I would like to | repeat. But as always for a specific use case the answer | is: it depends. For a general advice with what to start a | new project I would select PostgreSQL in 10 out of 10 | cases, if a database server is actually required. | dspillett wrote: | _> schemas_ | | The confusion there is due to the fact that non-R-DBMS | (particular when referred to as noSQL) can mean several | different things. | | In this context I was replying to a comment about not | knowing the shape of your data which implies that person | was thinking about solutions that are specifically | described as schemaless, which is what a lot of people | assume (in my experience) if you say non-relational or | noSQL. | | That is the sort of constraints I was meaning: primary & | unique keys and foreign keys for enforcing referential | integrity and other validity rules enforced at the | storage level. There are times when you can't enforce | these things immediately with good performance | (significantly distributed data stores that need | concurrent distributed writes for instance - but the need | for those is less common for most developers than the big | data hype salespeople might have you believe) so then you | have to consider letting go of them (I would advise | considering it very carefully). | happimess wrote: | > There is need to iterate and adapt frequently, | sometimes as part of your go-to-market strategy, in order | to fully understand the problem space. | | If you're pivoting so hard that your SQL schema breaks, | how is a schemaless system going to help you? You'll | still have to either throw out your old data (easy in | both cases) or figure out a way to map old records onto | new semantics (hard in both cases). | | I agree with GP that this is a central problem to solve, | not something to figure out _after_ you write software. | Build your house on stone. | spmurrayzzz wrote: | >If you're pivoting so hard that your SQL schema breaks, | how is a schemaless system going to help you? You'll | still have to either throw out your old data (easy in | both cases) or figure out a way to map old records onto | new semantics (hard in both cases). | | I agree with your comment that it's a central problem to | solve and that both options, throwing out data or map old | records onto new semantics, is an endemic choice both | stacks need to make. I don't agree that it's always | possible to solve entirely up front though. | | In my experience, it has been less so about whether the | storage engine is schemaless or not, even many modern | nosql stacks now ship with schemas (e.g. MongoDB). I | think the differentiation I make between these platforms | is mostly around APIs. Expressive, flexible semantics | that (in theory) let you move quickly. 
| | As an aside, I also think the differentiation between all | these systems is largely unimpactful for most software | engineers. And the choice often made is one of | qualitative/subjective analysis of dev ergonomics etc. At | scale there are certainly implementation details that begin | to disproportionately impact the way you write software, | sometimes prohibitively so, but most folks aren't in that | territory.
| funcDropShadow wrote: | Admittedly, my experience with MongoDB and Cassandra has | gained some rust over the last decade, but what makes you | say such nosql databases have expressive APIs? Compared | to PostgreSQL they have minuscule query languages and it | is very hard, if at all possible, to express constraints. | And constraints, sometimes self-imposed, sometimes not, | are what make projects successful, even startups. Many | startups try to find this one little niche they can | dominate. That is a self-imposed constraint. People tend | to think freedom makes them creative, productive, and | inventive, while in fact the opposite is the truth. By the | opposite of freedom I mean carefully selected constraints, | not oppression.
| roflyear wrote: | No absolutely not. 1 hr spent making a schema and a few hours | of migrations is way less than the headaches you'll have by | going nosql first.
| marcosdumay wrote: | > Is 'Not performance bound, and not knowing the future shape | of your data' a valid reason? | | That's a very good reason for going with an RDBMS even if it | looks like it's not the clearest winner for your use case. | | If you invert any of those conditions, it may become | interesting to study alternatives.
| jpdb wrote: | Hard disagree. The operational overhead of RDBMS and | specifically their inherent reliance on a single primary node | makes them, in my opinion, a bad place to start your | architecture. | | I want to be able to treat the servers in my database tier as | cattle instead of pets and RDBMSs don't fit this paradigm well. | Either NoSQL or NewSQL databases are, in my opinion, a much | better place to start. | | I feel like RDBMSs being the "default" option is because most | people have worked with them in the past and already understand | them. It doesn't mean they are the best tool for the job or | even the tool most likely to solve the unknown problems you'll | encounter in the future. | roflyear wrote: | What a joke.
| imachine1980_ wrote: | You can have multi-TB Postgres databases that are fast and | usable with a limited number of cache layers for speed, but you | probably don't need it. Medium migrated from a single Postgres | in 2020.
| funcDropShadow wrote: | I don't understand why this comment is downvoted. I've | been part of a project that uses a dozen different | PostgreSQL databases from different services, one of those | databases is multi-TB with OLAP usage patterns. And it | beats performance-wise all the previous attempts to solve | that specific problem.
| anonymousDan wrote: | What is your go-to NewSQL database these days (and why) out | of interest?
| hhjinks wrote: | Only once have I worked on a project where a document | database did not completely gimp our ability to deliver the | data that was required of us, and that was only because that | data was regularly cloned to a relational database we could | use for asynchronous tasks. As a project grows, I have, | without fail, come to find that you need relations to | efficiently deliver the data that new requirements demand.
| corpMaverick wrote: | Similarly.
People don't use Object Modeling/Entity Relationship | diagrams anymore. | | Every day, I see people struggling with problems that would be | easy to understand if you had one. You don't even need to have | an RDBMS. They are good just to model how things are related to | each other.
| aoms wrote: | You're speaking my language. After more than 20 years of custom | software dev, this statement has so much merit.
| jimnotgym wrote: | The other day I said to a junior dev, when you start planning | a locking scheme to handle concurrency in your file-based system, | it is time to swap to a db | roflyear wrote: | Very frequently polled queues come to mind, but usually I'll | use a db first anyway as there are benefits to it. | dobin wrote: | I just use files
| spmurrayzzz wrote: | The premise here (I think, correct me if I'm mistaken) is that | there are net-negative tradeoffs to using nosql/non-rdbms. | | If that assumption is true, then it follows that the same | argument used in the last statement also applies-- that if | you're not at massive scale, then the aforementioned | tradeoff of not using an RDBMS is likely de minimis. | | (This assumes that the tradeoffs are of the magnitude that they | only manifest impact at scale, hard to address that without | concrete examples though) | [deleted] | [deleted]
| bcrosby95 wrote: | > (This assumes that the tradeoffs are of the magnitude that | they only manifest impact at scale, hard to address that | without concrete examples though) | | The tradeoff is usually flexibility. You run into flexibility | problems anytime requirements change. Scale doesn't factor | in.
| nlnn wrote: | I've found it's not just scale, but also down to query patterns | across the data being stored. | | I'm with you on using an RDBMS for almost everything, but | worked on quite a few projects where alternatives were needed. | | One involved a lot of analytics queries (aggregations, filters, | grouping etc.) on ~100-200GB of data. No matter what we tried, | we couldn't get enough performance from Postgres (column-based | DBs / Parquet alternatives gave us 100x speedups for many | queries). | | Another was for storing ~100M rows of data in a table with ~70 | columns or so of largely text based data. Workload was | predominantly random reads of subsets of 1M rows and ~20 | columns at a time. Performance was also very poor in | Postgres/MySQL. We ended up using a key/value store, heavily | compressing everything before storing, and got a 30x speedup | compared to using an RDBMS using a far smaller instance host | size. | | I wouldn't call either of them massive scale, more just data | with very specific query needs.
| snarfy wrote: | It's a pretty old problem as they are competing ideas. It's | OLTP vs OLAP. Postgres is designed for OLTP.
| giovannibonetti wrote: | > Another was for storing ~100M rows of data in a table with | ~70 columns or so of largely text based data. Workload was | predominantly random reads of subsets of 1M rows and ~20 | columns at a time. | | Kimball's dimensional modelling helps a lot in cases like | this, since probably there is a lot of repeated data in these | columns.
| randomdata wrote: | _> RDBMS can solve pretty much every data storage/retrieval | problem you have._ | | Except the most important problem: A pleasant API. Which is, no | doubt, why 95% of those considering something other than an | RDBMS are making such considerations. | | RDBMS _can_ have pleasant APIs. It is not a fundamental | limitation.
We have built layers upon layers upon layers of | abstraction over popular RDBMSes to provide nice APIs and they | work well enough. But those additional layers come with a lot | of added complexity and undesirable dependencies that most | would prefer to see live in the DBMS itself instead. | | At least among the RDBMSes we've heard of, there does not seem | to be much interest in improving the APIs at the service level | to make them more compelling to use natively like alternative | offerings outside of the relational space have done. | swagasaurus-rex wrote: | EdgeDB looks promising. Postgres under the hood so you know | it's stable. | dgb23 wrote: | With SQL you kind of have two options/extremes that are | unpleasant in their own way. | | You either model things in a very domain specific and classic | fashion. Here you get the benefit of being quite declarative | and ad-hoc queries are natural. Also your schema is stronger, | as in it can catch more misuse by default. But this kind of | schema tends to have _logical_ repetition and is so specific | that change/evolution is quite painful, because every new | addition or use-case needs a migration. | | Or you model things very generically, more data driven than | schema driven. You lose schema strength and you definitely | lose sensible ad-hoc queries. But you gain flexibility and | generality and can cover much more ground. | | You can kind of get around this dichotomy with views, perhaps | triggers and such. In an ideal world you'd want the former to | be your views and the latter to be your foundational schema. | | But now you get into another problem, which is that | homogeneous tables are just _super_ rigid as result sets. | There are plenty of very common types you cannot cover. For | example tagged unions, or any kind of even shallowly nested | result (extremely common use case), or multiple result groups | in one query. All of these things usually mean you want | multiple queries (read transaction) or you use non-SQL stuff | like building JSONs (super awkward). | | If you can afford to use something like SQLite, then some of | the concerns go away. The DB is right there so it's fine to | query it repeatedly in small chunks. | | I wonder if we're generally doing it wrong though, especially | in web development. Shouldn't the backend code quite | literally live on the database? I wish my backend language | would be a data base query language first and a general | purpose language second, so to speak. Clojure and its datalog | flavors come close. But I'm thinking of something even more | integrated and purpose built. | N_A_T_E wrote: | Isn't that what the ActiveRecord pattern is supposed to be? | (Something something n+1's and over fetching data.) | eurasiantiger wrote: | If only we had a generic GraphQL resolver for entity- | based SQL schemas. | | Oh wait, we do have Prisma. And it suffers from those | same issues. | drpotato wrote: | Have you looked at PostGraphile? It's doesn't have n + 1 | or over-fetching issues. | paulryanrogers wrote: | What is a pleasant API? For what kind of data? | Yhippa wrote: | I wonder if parent post meant something other than SQL? | brightball wrote: | I've honestly never understood why people have such a | distaste for SQL. SQL and Linux/Unix have been the biggest | constants of my entire programming career to this point | (20ish years). I always know I can count on them. | bakuninsbart wrote: | SQL is great at what it is designed to do, and absolutely | horrible for anything else. 
Sometimes I have had to use | tools that _only_ allow (a subset of) SQL for querying data. | (Especially BI.) Doing regex date validation in a json derived | from a string in an sql dialect without function-definitions is | horrendous. These kinds of "why the f do i have to use sql for | _this_"-moments happened surprisingly often to me working as a | data analyst. | | But then I'm trying to do relatively simple things in Pandas or | R, like changing column values based on multiple field value | conditions, and it is a struggle requiring at least 17 brackets, | while in SQL it would be a simple statement that anyone can | understand after reading it once.
| funcDropShadow wrote: | The problem you are describing is probably rooted in the | specific SQL dialect you had to use. Selecting from inside JSON | strings and matching regular expressions should be a no | brainer. And it is straightforward in PostgreSQL, e.g.
| nequo wrote: | > simple things in Pandas or R, like changing column values | based on multiple field value conditions, and it is a struggle | requiring at least 17 brackets | | With dplyr, wouldn't this be mutate(x = case_when(... ~ | y, TRUE ~ x)) or the same but with ifelse?
| oarabbus_ wrote: | I haven't used much R, but how would you do something | like:
 |     CASE WHEN SUM(daily_revenue) OVER (PARTITION BY department,
 |                  TRIM(SUBSTR(region, 5)) IN ('North','West','Misc'))
 |              > AVG(revenue) OVER (ORDER BY sale_time ASC
 |                  ROWS BETWEEN 28 PRECEDING AND CURRENT ROW)
 |              AND NOT COALESCE(had_prev_month_party, FALSE)
 |          THEN pizza_party_points + 5
 |          WHEN <above> AND had_prev_month_party
 |          THEN pizza_party_points + 3
 |          WHEN MIN(sale_time) OVER (PARTITION BY department)
 |              = DATE_TRUNC('month', current_date)
 |          THEN 5
 |          ELSE GREATEST(pizza_party_points - 1, 0)
 |     END AS pizza_party_performance_points_current
 | | this example may be a bit esoteric but it actually draws | from lots of real-world cases (comparing a partitioned sum | with an overall average, checking conditions on other | columns, messy data parsing and manipulation, implicit | waterfall nature of CASE WHEN vs. explicit elseif, etc)
| oarabbus_ wrote: | >Doing regex date validation in a json derived from a | string in an sql dialect without function-definitions is | horrendous. | | Doing regex is horrendous, but doing it in SQL in a | modern database is no more difficult than in a full-fledged | programming language. Most modern DBs have strong | JSON support and built-in regex functions: | | 1. JSON_CAST/JSON_PARSE your data | | 2. REGEXP_EXTRACT() on the result, here are several date | validator regexes from a SO post | (https://stackoverflow.com/questions/15491894/regex-to-valida...) | | And that's it. In fact in many cases it's probably faster | to do it natively in SQL than to export it to python or R | and parse there.
| robonerd wrote: | I blame ORMs. ORMs are promoted by scaremongering that steers | novice developers away from learning SQL in the first place. I'm | ashamed to say I fell for it for a few years. When I | eventually learned SQL it was like a fog being lifted from | my mind that I hadn't even noticed before.
| reidjs wrote: | I tried not to use an ORM for my last project, but I | ended up rolling my own ORM for the application anyways | because it was easier to keep the structures in the web | application tied to the database. How else do you keep | the two consistent?
| jstarfish wrote: | I wholly agree with you, but I'll say this: | | When it comes to prototyping, I'm not going to fuck with | something like Java-- I'm going to reach for Python. If I | don't _know_ what I'm doing to begin with, I don't need | something telling me all the ways in which I'm doing it | wrong. | | Same goes for SQL/NoSQL. I loosely know what I need to | model and may revise it arbitrarily. SQL does not lend | itself to this. NoSQL was _designed_ for it. | | NoSQL is the "fuck you dad you can't tell me what to do" of | the database world. SQL is your annoying dad who tries to | institute curfews and won't let you go to underage drinking | parties. In the end, it's the latter you're going to be | calling from a holding cell, but there's a lot more fun you | can have with the former. | funcDropShadow wrote: | In the case of SQL/NoSQL I think it is the other way | around. In order to use a NoSQL database to its strengths, | you have to know all access paths up front [1], because | you are designing your table structure with all the | queries in mind. In SQL, on the other hand, you can always | go for a normalized structure and you are pretty safe to | be prepared for most requirements that are coming in the | future. | | Additionally, I think drafting a DDL schema is a great | way to learn about a new application domain. It forces me | to ask hard questions. And that improves my understanding | of the domain. I guess that is similar to some people who | like to prototype using Haskell type signatures. | | [1] Unless you mean specifically a document store without | schemas. | jrop wrote: | > If I don't know what I'm doing to begin with, I don't | need something telling me all the ways in which I'm doing | it wrong. | | This is unrelated to this conversation, but this is my | main beef with Rust. I love Rust (like a lot), but it's | just not good for prototyping. End of non-pertinent rant. | oarabbus_ wrote: | Honestly, I think it's because a lot of folks go their | entire careers avoiding set-based thinking. Many of these | folks are talented programmers, so I would assume they'd | become quite decent at SQL if they devoted time to it. I'm | speaking more on the DML side than the DDL side here. | | In my experience <20% of developers are good enough to be | dangerous with SQL, and maybe 5% are what I'd consider adept. | | The rest range from "SELECT * is as far as I'll go; where's | the ORM" to "why the hell are you using cursors to | aggregate". | | SQL is powerful, elegant, and reliable. With modern DB | support of JSON, ARRAY, statistical functions, and much | more, SQL is probably the #1 most underutilized/improperly | leveraged tool today. SQL-killers have been coming out for | 40 years now and (for its domain!) SQL's lead is pulling | farther ahead if anything. | | *yes there are some questionable implementations, so please | replace "SQL" with "PostgreSQL" if nonstandard SQL | implementations have caused nightmares for you in the past. | omginternets wrote: | Where can I learn about the relationship between set | theory and SQL? | oarabbus_ wrote: | I'm not too well-versed in the academic/mathematical | background of SQL, other than to say it's closer to | relational algebra than set theory, and the resource to | read is probably the 1970 Codd paper: | https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf | | SQL does implement the set operations of UNION, | INTERSECT, and EXCEPT, but I meant "thinking in sets" | more colloquially. | | edit: this is more practical than theoretical, but the | author here actually does a nice job of discussing "set-based" | thinking vs. procedural: | https://www.itprotoday.com/open-source-sql/fangraphs-tags-cl...
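(A small illustration of the set-based style oarabbus_ is describing: instead of fetching rows and accumulating per-department totals in a loop or cursor, one grouped statement does the whole pass. The sales and departments tables are made up; the syntax is PostgreSQL-flavored.)

    -- Set-based aggregation: the engine does the looping.
    SELECT department,
           SUM(amount)                          AS total_revenue,
           COUNT(*) FILTER (WHERE amount > 100) AS big_sales
    FROM sales
    GROUP BY department;

    -- The same mindset for writes: one statement instead of an update per row.
    UPDATE departments d
    SET    revenue_to_date = s.total
    FROM  (SELECT department, SUM(amount) AS total
           FROM sales GROUP BY department) s
    WHERE  d.name = s.department;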
| randomdata wrote: | SQL has some annoying quirks, but on balance I am not sure | there is the distaste for SQL that you claim. It's fine, | even good in many circumstances, but also too low level for | some tasks, which pushes common problems onto application | developers to solve over and over again or build up layers | of dependencies to utilize someone else's solution. That's | not the fault of SQL but the fault of the RDBMS. | | Our layers of abstractions atop SQL, along with approaches | taken by DBMSes outside of the common relational names, have | shown that there is room for improvement within those | tasks, able to be done natively by the DBMS, and it does | not have to come at the expense of losing SQL when you need | to be able to describe lower level questions. Different | tools for different jobs, but there is little will within the | RDBMS space to explore those other tools. | zwkrt wrote: | I don't think that people have a distaste for SQL, I think | that large companies have a vested interest in trying to | sell NoSQL solutions because they more tightly integrate | into a "cloud" ecosystem. A database solves that problem | domain too well; it's not marketable. | eurasiantiger wrote: | Can you write an SQL query to return arbitrary JSON? | Returning queried data in a nested form is a must nowadays. | ElectricalUnion wrote: | > Can you write an SQL query to return arbitrary JSON? | Returning queried data in a nested form is a must | nowadays. | | Yes. https://www.sqlite.org/json1.html | TheaomBen wrote: | Also https://www.postgresql.org/docs/current/functions-json.html | slaymaker1907 wrote: | Yeah, a lot of RDBMSes are adding JSON support, but the support | is often a bit clunky to use and may be missing important | features. If you're dealing with a bunch of semistructured | APIs that return JSON natively, Mongo makes it really easy to | just dump all that into a collection and then add | indices as needed. | jjav wrote: | > Except the most important problem: A pleasant API | | A pleasant API is clearly not the most important business | problem a database is there to solve. | | The data in it is presumably the lifeblood of the | business, whereas the API is something only developers need | to deal with. | | But that aside, the interface will be SQL, which is quite | powerful, long-lived (most important) and, fortunately, very | pleasant. | wrs wrote: | From the SERIALIZABLE explanation: "The database runs the queries | one by one ... It is essential to have some retry mechanism since | queries can fail." | | I know they're trying to simplify, but this is confusing. If the | first part is true, the second part can't be. In reality the | database does execute the queries concurrently, but will try to | make it _seem_ like they were done one by one. If it can't manage | that, a query will fail and have to be retried by the | application. | googletron wrote: | I believe there was a caveat around this exact point later in | the post. It was really tough striking a balance between people | learning this for the first time and a more knowledgeable | audience without confusing them further. | | I do appreciate the feedback and will look to add some more | color here! Thank you! | blupbar123 wrote: | It's kind of saying something which isn't true. Ideally one | would find a wording that doesn't confuse beginners but also | is factual, IMHO.
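(To make the point wrs raises concrete: under SERIALIZABLE the engine runs transactions concurrently and aborts one whenever it cannot produce an outcome equivalent to some serial order, so the caller has to be ready to rerun it. A minimal sketch, assuming PostgreSQL; the accounts table is made up, and the retry loop itself lives in application code.)

    BEGIN ISOLATION LEVEL SERIALIZABLE;
    -- A classic read-modify-write that can conflict with concurrent transfers.
    SELECT balance FROM accounts WHERE id = 1;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    COMMIT;
    -- If PostgreSQL cannot serialize this against concurrent work, it aborts with
    -- SQLSTATE 40001 (serialization_failure); the application catches that error
    -- and retries the whole transaction from BEGIN.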
| r0b05 wrote: | Nicely written and informative! | googletron wrote: | Thank you! | jrm4 wrote: | To go big picture: I'm kind of glad databases are largely like | cars in this respect, in ways that other software tooling isn't. | | Which is to say they're frequently good enough such that the | human working with them on whatever level can safely _not know_ a | lot of these details and get a LOT done. Kudos to whoever | deserves them here. | charcircuit wrote: | Isn't that true for almost all software? You only need to know | the implementation of a small subset of parts. I would say | databases are worse, since you need to know how they are | implemented or else you will start making O(rows) queries or doing | other inefficient stuff. | jrm4 wrote: | Going broadly (which is all I can do because I teach this | stuff and don't build in depth) -- "the database" is the part | I can most easily "abstract" away as if it were walled off? | | As opposed to aspirationally discrete classifications that | end up being porous, e.g. MVC, "Object Oriented", etc. | dennalp wrote: | Really nice guide. | otherflavors wrote: | why is this tagged "MySQL" but not also "SQL"? | googletron wrote: | Thanks! Added! | googletron wrote: | This is a quick rundown of database indexes and transactions. | Excited to continue sharing these notes with the community! | mgrouchy wrote: | I have been really enjoying the content so far, any hints on | what's coming up? | googletron wrote: | We have another couple of notes from a few companies like | Temporal, Sentry, and Gadget. | itsmemattchung wrote: | Awesome. Looking forward to additional content. | | If possible, it would be great to get Marc Brooker (Principal | at AWS) to provide some notes on bridging the gap between | the CAP theorem and how AWS relaxes constraints for building | out Elastic Block Storage (EBS). | | https://brooker.co.za/blog/2014/07/16/pacelc.html | googletron wrote: | I will definitely reach out! Thanks for the suggestion! ___________________________________________________________________ (page generated 2022-06-27 23:00 UTC)