[HN Gopher] Foundations of Databases (1995)
       ___________________________________________________________________
        
       Foundations of Databases (1995)
        
       Author : harperlee
       Score  : 167 points
       Date   : 2021-04-14 16:22 UTC (6 hours ago)
        
 (HTM) web link (webdam.inria.fr)
 (TXT) w3m dump (webdam.inria.fr)
        
       | ExcavateGrandMa wrote:
       | Behavior DB, let's call it that way :D
        
       | macando wrote:
       | It's incredible how little attention is paid to data modeling and
       | querying in the education of people entering the software
       | engineering workforce.
       | 
       | Getting your database model right, on the logical and physical
       | level, will make developing and deploying any data-driven app
       | simpler and easier. Getting it wrong? No modern programming
       | language or architectural pattern will save you from the worst
       | kinds of bugs, workarounds and bad performance.
       | 
       | Do yourself a favor and read this or any other legendary DB book.
        
         | adamnemecek wrote:
         | > this or any other legendary DB book.
         | 
         | What are the other legendary DB books?
        
           | macando wrote:
           | https://www.amazon.com/Modeling-Essentials-Third-Graeme-
           | Sims...
           | 
           | My favorite. Not an easy read, it took me weeks to complete
           | it.
           | 
           | https://www.amazon.com/Date-Database-
           | Writings-2000-2006-Chri...
           | 
           | Relatively unknown but great book if you want to read the
           | thoughts of a DB heavyweight Christopher Date who
           | collaborated with Edgar Codd.
           | 
           | https://www.amazon.com/NoSQL-Distilled-Emerging-Polyglot-
           | Per...
           | 
           | The best book to start exploring the land outside the
           | relational world. An easy read, can be completed in two to
           | three days.
        
         | vishnugupta wrote:
         | Copy pasting my comment[1] from an earlier thread.
         | 
         | A couple of years ago I spent quite some time trying to
         | evaluate the tech stack (and general engineering culture) of
         | merger/acquisition targets of my employer. It was quite a fun
         | exercise, all said and done. I encountered all sorts; from a
         | small team start up who had their tech sorted out more or less
         | to a largish organisation that relied on IBM's ESB, which
         | exactly one person on their team understood!!
         | 
         | I discovered this exact method during the third tech evaluation
         | exercise. When the team began explaining various modules top-
         | down and user-flows etc., I politely interrupted them and asked
         | for DB schema. It was just on a whim because I was bored of
         | the typical one-way session interrupted by me asking minor
         | questions. Once I had the hang of their schema, the rest of the
         | session was literally me telling them what their control and
         | user flows were and them validating it.
         | 
         | Since then it's become my magic wand to understand a new
         | company or team. Just go directly to the schema and work
         | backwards.
         | 
         | Conversely, I've begun paying more attention to data modelling.
         | Because once a data model is fixed it's very hard to change and
         | once enough data accumulates the inertia just increases and
         | instead of changing the data model (for fear of data
         | migration etc.) the tendency is to beat the use cases to fit
         | the data model. It's not your usual fail-fast-and-iterate
         | thing.
         | 
         | [1] https://news.ycombinator.com/item?id=24137997
        
           | macando wrote:
           | > I politely interrupted them and asked for DB schema.
           | 
           | In order of importance:
           | 
           | 1. DB schema
           | 
           | 2. List of dependencies
           | 
           | 3. List of 3rd party integrations
           | 
           | Depending on the domain, 2. and 3. can be switched.
        
           | macintux wrote:
           | Many years ago, my employer was transitioning ERP systems
           | from...something CA-owned whose name escapes me and I'd
           | probably have pretty awful memories dredged up if I actually
           | went looking for it.
           | 
           | Anyway, we were lucky enough to have a very talented intern
           | who was assigned the task of understanding the database
           | schema behind the CA product in order to migrate our data to
           | the new system.
           | 
           | It was...horrific. Absolute nightmare of spaghetti. I'd like
           | to say I've seen worse, but I've never seen anything even in
           | the same ballpark.
           | 
           | I think we finally gave up after weeks of digging into it and
           | just started over with a mostly clean slate.
        
         | msluyter wrote:
         | To wit: I made it through a master's in CS without a database
         | class.
         | 
         | This reminds me of the famous Rob Pike quote:
         | 
         | "Data dominates. If you've chosen the right data structures and
         | organized things well, the algorithms will almost always be
         | self-evident. Data structures, not algorithms, are central to
         | programming."
         | 
         | I've often found that if I'm coding something and the code
         | starts looking increasingly gnarly, that rethinking the data
         | structures / data model will clean up the code.
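         | 
         | A toy sketch of that effect in Python, not taken from any real
         | codebase: the order records and both function names are made
         | up for illustration. The same per-customer totals are computed
         | twice, once over a flat list and once after reshaping the data
         | into a mapping keyed by customer.

```python
from collections import defaultdict

# Hypothetical order records as (customer, amount) pairs.
ORDERS = [("ann", 30), ("bob", 10), ("ann", 5), ("cat", 20), ("bob", 15)]


def totals_flat(orders):
    """Work directly on the flat list: rescan it once per distinct customer."""
    result = []
    seen = []
    for customer, _ in orders:
        if customer in seen:
            continue
        seen.append(customer)
        total = 0
        for other, amount in orders:
            if other == customer:
                total += amount
        result.append((customer, total))
    return result


def totals_grouped(orders):
    """Reshape into a mapping first; the algorithm becomes a single loop."""
    totals = defaultdict(int)
    for customer, amount in orders:
        totals[customer] += amount
    return dict(totals)
```

         | With the mapping, the nested rescan and the bookkeeping list
         | disappear, and what is left is close to self-evident.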
        
           | gautamdivgi wrote:
           | We had two books when I was doing my bachelors in CS many
           | moons ago (late 90's). One was by Silberschatz & Galvin and
           | the other by Ullman. The first one was used for relational
           | algebra, various normal forms and introducing table design,
           | keys, constraints, etc. The second was used to teach the
           | theory for internals on how a DB is implemented - B+ trees,
           | deadlock handling, etc, etc. That was a hard course and it
           | was mandatory.
           | 
           | I can see not electing to have databases as a course for
           | masters though. Masters is to allow you to specialize and be
           | more choice driven.
        
           | andrewl wrote:
           | On that topic, Fred Brooks, author of _The Mythical
           | Man-Month_, said "Show me your flowchart and conceal your
           | tables, and I shall continue to be mystified. Show me your
           | tables, and I won't usually need your flowchart; it'll be
           | obvious."
        
           | slver wrote:
           | "Data structures, not algorithms, are central to programming."
           | 
           | Can we unify this to simply "data structures are essential to
           | algorithms"? It's really weird to put these in opposition,
           | when an algorithm with no data and data without an algorithm
           | make no sense.
           | 
           | Data structures are encoded with use cases in mind, those use
           | cases at least at the very low level are their algorithms.
        
             | Twisol wrote:
             | I think programming is more than just algorithms. System
             | design and architecture have a real impact on the longevity
             | and maintainability of your system, and I'd argue that data
             | models (not _structures_ per se, the distinction being
             | logical vs. physical) are foundational to a good design. In
             | contrast, algorithms play a much more focused role.
             | 
             | Put differently:
             | 
             | > Data structures are encoded with use cases in mind, those
             | use cases at least at the very low level are their
             | algorithms.
             | 
             | As you've lampshaded, it's the use cases that are most
             | fundamental. The data model should be designed to serve
             | those use cases. The data structures and algorithms reflect
             | the physical reality of the logical data model.
        
           | macando wrote:
           | > "Data dominates. If you've chosen the right data structures
           | and organized things well, the algorithms will almost always
           | be self-evident. Data structures, not algorithms, are central
           | to programming."
           | 
           | I'm saving this quote.
           | 
           | > I've often found that if I'm coding something and the code
           | starts looking increasingly gnarly that rethinking the data
           | structures / data model will clean up the code.
           | 
           | I've faced this over and over again. Writing algorithms is
           | hard when the underlying data is not optimal. It simply
           | invites writing complicated code and workarounds.
        
           | AlphaSite wrote:
           | Although it says data structures, isn't it more important to
           | talk about the structure of the data than the choice of data
           | structure?
        
           | bshipp wrote:
           | Every one of my data scraping projects is littered with files
           | titled "database.db.bak1", "database.db.bak2"... for this
           | exact reason. "Oh, I scraped 1000 pages and realized I missed
           | an entire nested data structure... better blow away the DB
           | and try again."
        
         | achn wrote:
         | This is very true, but also a hugely unfortunate reality. It
         | amazes me that we have come this far with "object"
         | graph-oriented data models backed by relational DBs. They are
         | very seldom a good fit for one another.
        
       | rgbimbochamp wrote:
       | Plugging in Andy Pavlo's Database lectures @ CMU which are
       | completely free on Youtube. Great guy and great lectures.
        
         | alistairw wrote:
         | Agreed. I finished watching them at the start of this year and
         | it's significantly helped me in using and making choices about
         | databases. I've also since started implementing my own database
         | for learning off the back of them. Couldn't be more grateful
         | for those courses being public.
         | 
         | link if anyone is interested: https://youtu.be/oeYBdghaIjc
        
         | golergka wrote:
         | And hands down best music choice of all the CS lectures I've
         | ever seen.
        
         | tinmandespot wrote:
         | Seconded. I have been watching them over the past 2 weeks -
         | he's a brilliant guy and a great teacher.
        
       | jenkstom wrote:
       | We don't need "more theory". The vast majority of computer
       | science theory goes unapplied in solving real-world problems.
       | While I'm sure this is great for postgrad computer science
       | students, it's not very useful for the every day programmers out
       | there.
        
         | dudeman13 wrote:
         | >We don't need "more theory". The vast majority of computer
         | science theory goes unapplied in solving real-world problems
         | 
         | The sheer amount of rewriting that is being done every day
         | around the world because people didn't bother understanding
         | what they were doing (and how they should have thought more
         | before trying to make it work) disagrees with this.
        
         | ggambetta wrote:
         | ...and this is how we ended with NoSQL.
        
         | marcodiego wrote:
         | >We don't need "more theory".
         | 
         | >it's not very useful for the every day programmers out there
         | 
         | So, programmers' talent is not being used to its fullest. The
         | ones who can't understand basic theory can easily be
         | condemned to always be mere users of tools while never
         | improving them. They run a greater risk of becoming obsolete or
         | devalued.
        
       | anonymousDan wrote:
       | Anyone recommend a good book specifically on distributed
       | databases (not more general distributed systems stuff like e.g.
       | Kleppmann's DDIA)?
        
         | jkaptur wrote:
         | In case someone isn't aware, Designing Data-Intensive
         | Applications is a very good introduction to distributed
         | databases, even if it doesn't specialize in them.
        
           | anonymousDan wrote:
           | Yeah I don't disagree, I've just read it already :)
        
           | artificial wrote:
           | I'll second this. It's an excellent intro that's very
           | approachable.
        
         | convolvatron wrote:
         | I got a lot of value out of
         | 
         | Distributed Databases: Principles and Systems
         | Stefano Ceri, Giuseppe Pelagatti
         | McGraw-Hill, 1985
         | 
         | it adopts a very specific approach, and is obviously missing
         | some newer techniques, but it certainly leaves you with a
         | feeling of 'yeah, i can build this'
        
         | e12e wrote:
         | Not a book... But foundationdb documentation (and source) might
         | be worth looking at? https://apple.github.io/foundationdb/
        
         | pcthrowaway wrote:
         | Not a book, but the podcast series "Scaling Postgres" looks
         | really good.
        
         | throwaway823882 wrote:
         | Not a book, but I would add to the foreword: "Try not to use
         | them."
        
           | victor106 wrote:
           | Genuinely curious: What are some challenges with using
           | distributed databases?
        
             | biggestdummy wrote:
             | Tons of complexity behind this, but the simple problem is
             | source of truth. When you store data in multiple places, if
             | they become unsynchronized, which is the true answer?
             | Within a monolithic system, you don't have differing
             | clocks, network partitions that can be very weird, etc.
             | A related problem is race conditions: what if you are
             | reading one copy of the data while the other replica is
             | being updated? Most complexity, imo, arises from this
             | fundamental problem and the various techniques to deal
             | with it.
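             | 
             | A toy illustration in Python, under loud assumptions: the
             | Replica class, the clock_skew parameter and the
             | last-write-wins merge are all invented for this sketch and
             | do not describe any particular database. Two replicas
             | accept a write to the same key, and one node's clock runs
             | five seconds fast.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: str
    timestamp: float  # time as reported by the node that took the write


class Replica:
    """Hypothetical replica that stamps writes with its own skewed clock."""

    def __init__(self, name, clock_skew=0.0):
        self.name = name
        self.clock_skew = clock_skew
        self.store = {}

    def write(self, key, value, true_time):
        # true_time stands in for real elapsed time; the node only sees
        # true_time plus its own skew.
        self.store[key] = Versioned(value, true_time + self.clock_skew)

    def read(self, key):
        return self.store[key].value


def merge_last_write_wins(a, b, key):
    """Resolve a conflict by keeping the write with the larger timestamp."""
    winner = max(a.store[key], b.store[key], key=lambda v: v.timestamp)
    a.store[key] = winner
    b.store[key] = winner
    return winner.value


r1 = Replica("r1", clock_skew=5.0)  # this node's clock runs fast
r2 = Replica("r2")
r1.write("balance", "100", true_time=1.0)
r2.write("balance", "250", true_time=2.0)  # the later write in real time

before = (r1.read("balance"), r2.read("balance"))  # replicas disagree
after = merge_last_write_wins(r1, r2, "balance")   # the older write wins
```

             | Before the merge, a reader gets "100" or "250" depending
             | on which replica it hits. After it, both agree on "100",
             | even though "250" was written later in real time.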
        
           | bshipp wrote:
           | This is a valid comment that I'd expand to say "...without
           | exhausting all the tools available."
           | 
           | For example, if you're facing lengthy search queries and are
           | looking at partitioning the database to speed them up, many
           | databases include helpful internal tools that should be
           | attempted first. Proper indexing is a textbook all on its
           | own, and the appropriate design of full-text search
           | dictionaries and queries is another.
           | 
           | The one that shocked me a year or two ago was implementing
           | full-text search in a gigantic sqlite DB. I was preparing to
           | migrate it to postgresql or elasticsearch because my
           | traditional "ilike" queries, even indexed, were ridiculously
           | slow, and I was not aware FTS was an included library.
           | 
           | Although FTS pretty much doubled the size of the DB, a three
           | minute query dropped to a few seconds. Since read queries are
           | non-blocking in sqlite, this allowed me to continue using the
           | DB without needing a bigger, more complex solution.
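           | 
           | A minimal sketch of the difference, assuming Python's
           | built-in sqlite3 module and an SQLite build that includes
           | the FTS5 extension (the comment above does not say which FTS
           | version was used). The table names pages and pages_fts and
           | the sample rows are made up for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory DB with a plain table and an FTS5 index beside it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, body TEXT)")
    # Assumption: SQLite was compiled with FTS5; otherwise this statement fails.
    conn.execute("CREATE VIRTUAL TABLE pages_fts USING fts5(body)")
    rows = [
        (1, "Relational algebra and normal forms"),
        (2, "Full-text search in SQLite with FTS5"),
        (3, "B+ trees and deadlock handling"),
    ]
    conn.executemany("INSERT INTO pages (id, body) VALUES (?, ?)", rows)
    # Reuse pages.id as the FTS rowid so hits can be joined back to pages.
    conn.executemany("INSERT INTO pages_fts (rowid, body) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def search_like(conn, term):
    """Substring match; a leading wildcard cannot use an ordinary index."""
    cur = conn.execute(
        "SELECT id FROM pages WHERE body LIKE ? ORDER BY id", (f"%{term}%",)
    )
    return [row[0] for row in cur]


def search_fts(conn, term):
    """Token match answered from the full-text index."""
    cur = conn.execute(
        "SELECT rowid FROM pages_fts WHERE pages_fts MATCH ? ORDER BY rowid",
        (term,),
    )
    return [row[0] for row in cur]
```

           | Both functions return the same ids on this tiny data set.
           | The second copy of the text in pages_fts is where the extra
           | DB size comes from.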
        
         | decebalus1 wrote:
         | Database Internals: A Deep Dive into How Distributed Data
         | Systems Work by Alex Petrov
        
       | ChrisArchitect wrote:
       | (1995)
        
         | slver wrote:
         | Year programmers have ignored relational theory since:
        
       | senthil_rajasek wrote:
       | Previous hn post with comments
       | 
       | https://news.ycombinator.com/item?id=19726520
        
       | slyrus wrote:
       | Was lucky enough to take Serge's class when he was visiting UC
       | Berkeley back in the 90's. And yet I'm still writing SQL :(
       | Curious to know what the state of the art in solving the
       | then-thornier theoretical problems (negation mostly) is.
        
       | tingletech wrote:
       | I think this was my textbook when I took a relational database
       | class at UCSD Extension back in the 90s.
        
       ___________________________________________________________________
       (page generated 2021-04-14 23:00 UTC)