[HN Gopher] Kafka Is Not a Database
       ___________________________________________________________________
        
       Kafka Is Not a Database
        
       Author : andrioni
       Score  : 208 points
       Date   : 2020-12-08 15:52 UTC (7 hours ago)
        
 (HTM) web link (materialize.com)
 (TXT) w3m dump (materialize.com)
        
       | amai wrote:
       | Elasticsearch is also not a database.
        
         | charlieflowers wrote:
         | Could you elaborate on that? Because it sure seems like a DB to
         | me.
         | 
         | Not a relational one. And should not replace a typical CRUD
         | OLTP DB.
         | 
         | But it sure seems like a no-sql DB to me.
        
       | detay wrote:
        | Using any tool for the correct problem requires skill.
        
       | joking wrote:
       | neither has to be.
        
       | diehunde wrote:
       | Relevant to the discussion:
       | 
       | Martin Kleppmann | Kafka Summit SF 2018 Keynote (Is Kafka a
       | Database?) [1]
       | 
       | [1] https://www.youtube.com/watch?v=v2RJQELoM6Y
        
       | jdmichal wrote:
       | So, the problem really being addressed but not named is that
       | eventing systems give _eventual_ consistency. But sometimes that
       | 's not good enough. And it's OK to admit that and bring in
       | another technology when you need a stronger guarantee than that.
       | 
       | The example I was taught with was a booking system, where the
       | inventory management system-of-record was separate from the
       | search system. Search does not need 100% up-to-date inventory. A
       | delay between the last item being booked and it being removed
       | from the search results is acceptable. In fact, it has to be
        | acceptable, because it can happen anyway. If someone books
        | the last item right after another user hits the search
        | button... There's nothing the system can do about that.
       | 
       | When actually committing a booking, however, then that must be
       | atomically done within the inventory management system.
       | 
       | So, to bring it home, it's OK for the search system to be
       | eventually consistent against bookings, and read bookings off of
       | an event stream to update its internal tracking. However, the
       | bookings themselves cannot be eventually consistent without
       | risking a double-booking.
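        | 
        | A minimal sketch of the atomic half, assuming a hypothetical
        | Postgres `inventory` table (Python + psycopg2); the search
        | side would simply consume booking events later:
        | 
        |     import psycopg2
        | 
        |     conn = psycopg2.connect("dbname=bookings")
        |     # The conditional UPDATE commits atomically and only
        |     # succeeds while stock remains, so no double-booking.
        |     with conn, conn.cursor() as cur:
        |         cur.execute(
        |             "UPDATE inventory SET remaining = remaining - 1 "
        |             "WHERE item_id = %s AND remaining > 0",
        |             ("room-101",),
        |         )
        |         booked = cur.rowcount == 1  # False => sold out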
        
       | Cojen wrote:
       | Jim Gray disagrees:
       | https://arxiv.org/ftp/cs/papers/0701/0701158.pdf
        
         | georgewfraser wrote:
          | It's funny that you use that example; we actually cited it
          | in an earlier draft of this post. Despite the seemingly
          | opposite title "Queues are Databases", that note actually
          | makes many of the same arguments: that message brokers are
          | missing much of the functionality of database management
          | systems, and that this is a problem.
        
       | [deleted]
        
       | arthurcolle wrote:
       | I had an issue with RabbitMQ where I didn't know how my consumer
       | was going to use the data that I was writing to a queue yet (from
       | a producer that was listening on a SocketIO or WebSockets
       | stream), and I was kind of just going to figure it out in an hour
       | or something.
       | 
       | Eventually, my buffer ran out of memory and I couldn't write
       | anything else to it, and it was dropping lots of messages. I was
       | bummed. Is there a way to avoid this in Kafka?
        
         | atmosx wrote:
         | RabbitMQ and Kafka serve different needs.
         | 
          | RabbitMQ is the most widely used open source implementation
          | of the AMQP protocol. It is slower but can support complex
          | routing scenarios internally and handle situations where
          | at-least-once-delivery guarantees are important. RabbitMQ
          | supports on-disk persistent queues, which you can tune if
          | you like. Compared to Kafka, RabbitMQ is slow in terms of
          | the volume that can be managed per queue.
         | 
          | Kafka is fast because it is horizontally scalable and you
          | have parallel producers and consumers per topic. You can
          | tune the speed and move the needle where you need to
          | between consistency and availability. However, if you want
          | things like at-least-once delivery and such, you'll have to
          | use the building blocks Kafka gives you, but ultimately
          | you'll have to handle this on the application side.
         | 
          | Regarding storage, by default Kafka stores data for 7 days.
          | IIRC the NY Times stores articles from 1970 onwards on
          | Kafka clusters. The storage is horizontally scalable and
          | durable. This is a common use case. As many have pointed
          | out, the cluster setup depends highly on your needs. We
          | store data for 7 days in Kafka as well and it's in the
          | order of 500GB or more per node.
         | 
          | Looks like you have a configuration issue. You can
          | configure RabbitMQ to store queues on the hard disk, and
          | with a quick calculation you can make sure you have enough
          | space for 10 or 150 hours of data. I don't see any reason
          | to switch to Kafka, a different tool with different
          | characteristics, just because you need more storage.
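          | 
          | For what it's worth, retention in Kafka is just per-topic
          | configuration. A sketch with confluent-kafka's admin client
          | (topic name and sizing are made up):
          | 
          |     from confluent_kafka.admin import AdminClient, NewTopic
          | 
          |     admin = AdminClient({"bootstrap.servers": "localhost:9092"})
          |     # Keep 7 days of data (the Kafka default), on disk,
          |     # replicated across brokers.
          |     topic = NewTopic("weather-readings",
          |                      num_partitions=6, replication_factor=3,
          |                      config={"retention.ms":
          |                              str(7 * 24 * 60 * 60 * 1000)})
          |     admin.create_topics([topic])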
        
       | jgraettinger1 wrote:
       | This post doesn't mention the _actual_ answer, which is to:
       | 
        | 1) Write an event recording a _desire_ to checkout.
        | 2) Build a view of checkout decisions, which compares requests
        |    against inventory levels and produces checkout _results_.
        |    This is a stateful stream/stream join.
        | 3) Read out the checkout decision to respond to the user, or
        |    send them an email, or whatever.
       | 
       | CDC is great and all, too, but there are architectures where ^
       | makes more sense than sticking a database in front.
       | 
       | Admittedly working up highly available, stateful stream-stream
       | joins which aren't challenging to operate in production is...
       | hard, but getting better.
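        | 
        | A toy sketch of steps 1-3, assuming a single decider process
        | per partition (confluent-kafka; topic names and message
        | shapes are made up, and the fault tolerance that makes this
        | genuinely hard is ignored):
        | 
        |     import json
        |     from confluent_kafka import Consumer, Producer
        | 
        |     BROKER = {"bootstrap.servers": "localhost:9092"}
        | 
        |     consumer = Consumer({**BROKER,
        |                          "group.id": "checkout-decider",
        |                          "auto.offset.reset": "earliest"})
        |     producer = Producer(BROKER)
        |     consumer.subscribe(["checkout-requests"])
        | 
        |     inventory = {"sku-42": 3}  # rebuilt from a changelog IRL
        | 
        |     while True:
        |         msg = consumer.poll(1.0)
        |         if msg is None or msg.error():
        |             continue
        |         req = json.loads(msg.value())  # step 1: a *desire*
        |         ok = inventory.get(req["sku"], 0) > 0  # step 2: decide
        |         if ok:
        |             inventory[req["sku"]] -= 1
        |         producer.produce(  # step 3: a readable decision
        |             "checkout-results", key=req["id"],
        |             value=json.dumps({"id": req["id"], "approved": ok}))
        |         producer.flush()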
        
         | fuckadtech wrote:
         | Hard is an understatement. Particularly so if you are using
         | Kafka Streams to attempt to run a highly available, fault
         | tolerant, zero downtime, etc., service. The race condition and
         | compaction bugs in that library are not fun to debug.
        
       | EamonnMR wrote:
       | Kafka is a very nice communication channel. You can dump the
       | results into a database and query it if you need a database.
        
       | jkarneges wrote:
       | Another potential misuse of Kafka I've been wondering about is
       | how a single Kafka instance/cluster is often shared by multiple
       | microservices.
       | 
       | On one hand the ability to connect multiple microservices to a
        | central message broker is convenient, but on the other hand
       | this goes against the microservice philosophy of not sharing
       | subcomponents (databases, etc). I wonder where the lines should
       | be drawn.
        
         | ivalm wrote:
          | Wait, what? Isn't the whole point to have multiple
          | publishers/subscribers?
        
           | mrweasel wrote:
           | I think the point was about using a single cluster for
           | multiple topics, for different services.
           | 
            | Depending on the scenario I can see the point. If the
            | micro services are all part of the larger overall
            | solution, having a single cluster is perfectly fine.
            | Using the same cluster for multiple "products" is a
            | little like having one central database server for a
            | number of different solutions. You can do it, but it
            | potentially becomes a bottleneck or a central point where
            | your different solutions impact each other's performance.
        
             | jkarneges wrote:
             | I'd agree there is an arguable difference between sharing a
             | server vs sharing data within the server.
             | 
             | Bottleneck issues aside, letting two microservices connect
             | to the same Postgres cluster but access different
             | "databases" (collection of tables) within that cluster
             | could be considered an acceptable data separation.
             | Certainly with multi-tenant DBaaS systems there may be some
             | server sharing by unrelated microservices/customers.
             | Whereas letting two microservices access the same database
             | tables would probably be frowned upon.
             | 
             | Nevertheless, sharing the same Kafka topics between
             | microservices seems to be a common thing to do.
        
               | ivalm wrote:
               | > Whereas letting two microservices access the same
               | database tables would probably be frowned upon.
               | 
               | > Nevertheless, sharing the same Kafka topics between
               | microservices seems to be a common thing to do.
               | 
                | I think if it is part of one whole, isn't this fine?
                | You have one service that generates customer-facing
                | output, you may have another service that powers
                | analytics/dashboards, and you may have yet another
                | service that ETLs data into some data mart. Why
                | wouldn't they touch the same table/subscribe to the
                | same topic (since they just need read-only access to
                | the data)? Genuinely curious what the problem is,
                | except for bottleneck/performance; and if it's just a
                | bottleneck, then wouldn't scaling horizontally solve
                | it?
        
               | jkarneges wrote:
               | Sure. I believe microservice boundaries are more about
               | development agility rather than scalability. By limiting
               | each microservice to a minimal API surface and a "2
               | pizza" team, everyone can iterate faster. And if a
               | particular microservice is implemented as multiple sub-
               | services sharing database tables only known by the team
               | in charge, that seems fine.
        
         | mumblemumble wrote:
         | I would argue that, if it's being used properly, the message
         | broker itself is a service. It runs as a separate process, you
         | communicate with it over an API, and its subcomponents (e.g.,
         | the database) are encapsulated.
         | 
         | It's all about framing and perspective, of course. But that's
         | how I'd want to try and frame it from a system architecture
         | point of view.
        
           | wongarsu wrote:
            | By that same reasoning Postgres is its own microservice.
            | It runs as a separate process, you communicate with it
            | over a well defined API, and its subcomponents (data
            | store, query optimizer etc) are encapsulated.
           | 
           | With enough framing everything is possible, and in some
           | contexts it will even make sense.
        
             | mumblemumble wrote:
             | It has to do with how it functions in practice, IMO.
             | PostgreSQL itself is arguably a service, but the database
             | probably is not - you're probably crawling all over its
             | implementation details and data model.
             | 
             | You _could_ take a stand and say,  "All access is through
             | stored procedures. They are the API." And, if that API
              | operates at the same semantic level as a well-crafted REST
             | API, then you could make an argument that that particular
             | database is a microservice that just happens to have been
             | implemented on top of PostgreSQL. But I don't think I've
             | ever seen such a thing happen in the wild. It's much more
             | popular to use an ORM to get things nice and tightly
             | coupled.
        
               | selfhoster11 wrote:
               | Such implementations exist and have been discussed a few
               | days ago here on HN. There are also REST adapters for
               | Postgres: https://github.com/PostgREST/postgrest
        
         | onefuncman wrote:
          | This isn't any different from microservices getting a deathball
         | dependency on a user service or logging service or security
         | service or ...
         | 
         | you either don't allow microservices to consume from others'
         | topics, or you publish event schemas so they can still iterate
         | independently.
         | 
         | the move to a 2nd kafka cluster in my experience has always
         | been driven by isolation and fault tolerance concerns, not
         | scalability.
        
       | tacitusarc wrote:
       | I think because software engineers tend to excel at pattern
       | recognition, oftentimes solutions to different problems appear so
       | similar that it seems like with a small amount of abstraction,
       | they can be reused. But it's a trap!
       | 
       | Everything abstracted to the highest level is the same, but
       | problems aren't solved at the highest level.
       | 
       | The devil, as they say, is in the details.
        
         | UK-Al05 wrote:
          | This is a lack of abstraction. You can certainly fix this
          | in Kafka using various hacks, but that's reimplementing an
          | abstraction you get in a standard DB for free.
         | 
         | Funnily enough a list of events is pretty much what a
         | transaction log is in a standard db. Although the events have
         | more of a business meaning. In many ways event sourcing is
         | removing a lot of abstraction databases give you.
        
           | mrkeen wrote:
           | > a standard db
           | 
           | Yes, ACID works for one database. Many databases? Not so
           | much.
           | 
           | > Funnily enough a list of events is pretty much what a
           | transaction log is in a standard db.
           | 
           | When I "SELECT Balance WHERE user = 12345", I usually just
           | get back a balance, I don't get back the transaction log.
           | 
           | If nothing else, adopting the Kafka model gets your teammates
           | to append updates to your ledger, rather than changing values
           | in-place.
        
             | UK-Al05 wrote:
              | Yes, that's my point. You get half of a database: the
              | transaction log, not the up-to-date state.
        
       | hodgesrm wrote:
       | Is this really a thing? Do people _really_ try to use Kafka as
       | the system of record for financial transactions or similar data?
        
         | mrkeen wrote:
         | Hell yes. Best thing I ever did.
         | 
         | Updating balances using an RDBMS is like managing your finances
         | with pencils and erasers. Unless you somehow ban the UPDATE
         | statement.
         | 
         | Updating balances with Kafka is like working in pen. You
         | can't[1] change the ledger lines, you can only add corrections
         | after the fact.
         | 
         | [1] Yes, Kafka records can be updated/deleted depending on
         | configuration. But when your codebase is written around append-
         | only operations, in-place mutations are hard and corrections
         | are easy, so your fellow programmers fall into the 'pit of
         | success'.
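          | 
          | A minimal sketch of the pen-not-pencil style (confluent-
          | kafka; topic and field names are hypothetical):
          | 
          |     import json
          |     from confluent_kafka import Producer
          | 
          |     p = Producer({"bootstrap.servers": "localhost:9092"})
          | 
          |     def post(user, cents, reason):
          |         # Append-only: corrections are new entries; there
          |         # is no UPDATE to reach for.
          |         p.produce("ledger", key=user,
          |                   value=json.dumps({"amount": cents,
          |                                     "reason": reason}))
          | 
          |     post("user-12345", -5000, "purchase")
          |     post("user-12345", 5000, "correction: duplicate charge")
          |     p.flush()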
        
           | hodgesrm wrote:
           | I would just put the ledger in a database table if it's that
           | important and maintain the current state of the account in a
           | separate table. ACID transactions and database constraints
           | make this kind of consistency easier to achieve than many
           | alternatives. It's also easier to prove correctness since you
           | can run queries that return consistent results thanks to the
           | isolation guaranteed by ACID. (Modulo some corner cases that
           | are not hard to work around.)
           | 
           | Just my $0.02.
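            | 
            | A minimal sketch of that layout, assuming a hypothetical
            | schema (Python + psycopg2):
            | 
            |     import psycopg2
            | 
            |     conn = psycopg2.connect("dbname=bank")
            |     # Ledger entry and current-state change land together
            |     # or not at all: one ACID transaction.
            |     with conn, conn.cursor() as cur:
            |         cur.execute(
            |             "INSERT INTO ledger (account, amount)"
            |             " VALUES (%s, %s)", ("acct-1", -5000))
            |         cur.execute(
            |             "UPDATE balances SET balance = balance + %s"
            |             " WHERE account = %s", (-5000, "acct-1"))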
        
             | mrkeen wrote:
             | > ACID transactions and database constraints make this kind
             | of consistency easier to achieve than many alternatives.
             | 
             | If your company only runs one database.
             | 
             | > It's also easier to prove correctness
             | 
             | RDBMSs are wonderful and I don't consider them unreliable
             | at all. But I can't prove the correctness of my teammate's
             | actions. I want them to show their working by putting their
             | updates onto an append-only ledger.
        
               | jacques_chester wrote:
               | > _If your company only runs one database._
               | 
               | I think the same argument can be made with "only one
               | Kafka cluster" and "only one blockchain".
        
           | sixdimensional wrote:
            | Blockchain databases take this model to the extreme: a
            | database that is append-only and immutable, with
            | cryptographic guarantees of the ledger.
           | 
           | I'm not selling any particular tool but Amazon's QLDB is an
           | interesting example of a blockchain-based database. I am
           | interested to see how things like Kafka and this might come
           | together somehow.
           | 
           | In my opinion, we have evolved to a point where storage is
           | not a concern for temporal use cases - i.e. we can now store
           | every change in an immutable fashion. When you think about
           | this being an append only transaction log that you never have
           | to purge, and you make observable events on that log (which
           | is what most CDC systems do)... yeah it works. Now you have
           | every change, cryptographically secure, with events to
           | trigger downstream consumers and you can really rethink the
           | whole architecture of monolithic databases vs. data
           | platforms.
           | 
           | It is an exciting time we are in, in my opinion.
        
           | jacques_chester wrote:
           | The original sin of SQL is that it begins without memory.
           | 
           | The original sin of Kafka is that it begins with _only_
           | memory.
           | 
           | To me the right middle way is relational temporal tables[0].
           | You get both the memoryless query/update convenience and the
           | ability to travel through time.
           | 
           | [0] SQL:2011 introduced temporal data, but in a slightly
           | janky just-so-happens-to-fit-Oracle's-preference kind of way.
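            | 
            | For databases without SQL:2011 support, a rough emulation
            | of that middle way (hypothetical schema, Python +
            | psycopg2): an append-only history table plus a view that
            | derives the current state, so you keep memoryless queries
            | and gain time travel.
            | 
            |     import psycopg2
            | 
            |     conn = psycopg2.connect("dbname=app")
            |     with conn, conn.cursor() as cur:
            |         cur.execute("""
            |             CREATE TABLE IF NOT EXISTS account_history (
            |                 account    text NOT NULL,
            |                 balance    bigint NOT NULL,
            |                 valid_from timestamptz NOT NULL DEFAULT now()
            |             )""")
            |         # "Memoryless" view: latest version of each row.
            |         # Time travel is the same query restricted to
            |         # "valid_from <= <some past instant>".
            |         cur.execute("""
            |             CREATE OR REPLACE VIEW accounts AS
            |             SELECT DISTINCT ON (account) account, balance
            |             FROM account_history
            |             ORDER BY account, valid_from DESC""")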
        
         | [deleted]
        
         | fuckadtech wrote:
         | Yes, with great pain, backing a graph database. Powering
         | billions of dollars in revenue.
        
       | fouc wrote:
       | Any sufficiently complex software will end up implementing a
       | database.
        
         | dgb23 wrote:
         | The article links to this talk[0], which has a funny and
         | interesting sounding title: "Did you accidentally build a
         | database?"
         | 
         | [0] https://www.oreilly.com/library/view/strata-
         | hadoop/978149194...
        
           | revertts wrote:
           | That link's a 3min clip for non-subscribers, but the full
           | talk is here https://www.youtube.com/watch?v=Bz2EXg0Fy98
        
         | quickthrower2 wrote:
         | I'd qualify that as "complex infrastructure software"
        
       | UK-Al05 wrote:
        | One way around this is to make sure your Kafka command
        | streams are processed in order, serially, partitioned by an
        | id where you want the concurrency control.
        | 
        | Normally you only want concurrency control within certain
        | boundaries.
        | 
        | By figuring out the minimum transaction and concurrency
        | boundaries, you can eke out quite a bit of performance.
        
         | loopz wrote:
         | Sure, but that defeats the quest for horizontal scalability.
         | You _can_ build highly performant systems based on serial
          | execution, but I'm not sure this is an area where Kafka
          | particularly excels.
        
           | UK-Al05 wrote:
           | That's why you partition by some id. Say stock SKU id for
           | stock control. Then you can handle other SKUs in parallel.
            | It's only in serial for a single SKU. That's probably the
            | maximum performance potential you're going to get in a
            | traditional db anyway.
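            | 
            | A sketch of the keying trick (confluent-kafka; topic and
            | payload are made up): all events for one SKU land on one
            | partition and are therefore consumed in order.
            | 
            |     import json
            |     from confluent_kafka import Producer
            | 
            |     p = Producer({"bootstrap.servers": "localhost:9092"})
            |     # Same key => same partition => serial per SKU, while
            |     # other SKUs proceed in parallel on other partitions.
            |     p.produce("stock-commands", key="sku-42",
            |               value=json.dumps({"op": "reserve", "qty": 1}))
            |     p.flush()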
        
             | jacques_chester wrote:
             | This strikes me as mixing the physical and logical models.
        
             | edbrown23 wrote:
             | This definitely seems like the "Kafka" way to solve this
             | problem, but I fear there are implications to this
             | partitioning scheme I'd love to see answered. For example,
             | partition counts aren't infinite, and aren't easily
             | adjusted after the fact. So if you choose, say, 10
             | partitions originally, for a SKU space that is nearly
             | infinite, then in reality you can only handle 10 parallel
             | streams of work. Any SKU that is partitioned behind a bit
             | of slow work is then blocked by that work.
             | 
             | It's doable to repartition to 100 partitions or more, but
             | you basically need to replay the work kept in the log based
             | on 10 partitions onto the new 100 partitions, and that
             | operation gets more expensive over time. Then of course
             | you're basically stuck again once your traffic increases to
             | a high enough level that the original problem returns. If
             | the unit of horizontal scaling is the partition, but the
             | partition count can't be easily changed, consumers
             | eventually lose their horizontal scalability in Kafka, from
             | my perspective.
        
               | morelisp wrote:
               | On the other hand Kafka partitions are relatively cheap
               | on both broker and client side; 100 partitions does not
               | require 100 parallel consumers so over-provisioning is
               | not so risky.
        
       | soumyadeb wrote:
       | The architecture of dumping events into Kafka and creating
       | materialized views is a perfect choice for many use cases - e.g.
       | collecting clickstream data and building analytical reports.
       | 
        | If ACID is a prerequisite, then a lot of things won't qualify
        | as databases: not Mongo, Cassandra, Elasticsearch, etc. Not
        | even many data warehouses.
        
       | je42 wrote:
        | Ok. I admit using Kafka as a DB is not straightforward, but
        | just stating that it doesn't provide ACID functionality is
        | not enough.
        | 
        | The example they give is very simplistic. With the correct
        | design of Kafka topics and events, the problem in the example
        | can be fixed.
       | 
       | And according to oracle https://www.oracle.com/database/what-is-
       | database/ :
       | 
       | > A database is an organized collection of structured
       | information, or data, typically stored electronically in a
       | computer system.
       | 
       | So Kafka clearly fits that definition.
        
         | satisfaction wrote:
          | I'm guilty of using it as a DB: my home weather station
          | writes to Kafka topics. Sometimes the Postgres instance is
          | down for months; no problem letting the Kafka topic store
          | the data until I get around to rebooting Postgres and
          | restarting the connector.
        
         | dathinab wrote:
          | Didn't some newspaper use Kafka to store the articles they
          | released, in order, or something similar? (I think it was
          | the New York Times, maybe??)
         | 
         | Honestly as long as you don't use it as a general purpose
         | database, it might very well be the best choice for your use-
         | case.
        
           | je42 wrote:
            | Indeed. Know the tools and apply the one that has the
            | fewest cons and most pros ;).
            | 
            | Also, a good starting point for knowing how to use a
            | streaming store like Kafka as a DB is the "Database
            | inside-out" video:
            | https://www.youtube.com/watch?v=fU9hR3kiOK0
        
           | barnabask wrote:
           | 2017 article: https://open.nytimes.com/publishing-with-
           | apache-kafka-at-the...
        
         | halbritt wrote:
          | The author presumes that every use case requires a
          | transactional database. ACID is nice when it's needed, but
          | it generally isn't, especially in many streaming-data
          | applications for which Kafka is most suitable.
        
       | based2 wrote:
       | https://www.postgresql.org/docs/10/rules-materializedviews.h...
        
       | Kalium wrote:
       | As recently as last year, I worked for a company where the Chief
       | Architect, in his infinite wisdom, had decided that a database
       | was a silly legacy thing. The future looked like Kafka streams,
       | with each service being a function against Kafka streams, and
       | data retention set to infinite.
       | 
       | Predictably, this setup ran into an interesting assortment of
       | issues. There were no real transactions, no ensured consistency,
       | and no referential integrity. There was also no authentication or
       | authorization, because a default-configured deployment of Kafka
       | from Confluent happily neglects such trivial details.
       | 
       | To say this was a vast mess would be to put it lightly. It was a
       | nightmare to code against once you left the fantasy world of
       | functional programming nirvana and encountered real requirements.
       | It meant pushing a whole series of concerns that isolation
       | addresses into application code... or not addressing them at all.
       | Teams routinely relied on one another's internal kafka streams.
       | It was a GDPR nightmare.
       | 
       | Kafka Connect was deployed to bridge between Kafka and some real
       | databases. This was its own mess.
       | 
       | Kafka, I have learned, is a very powerful tool. And like all
       | shiny new tools, deeply prone to misuse.
        
         | sixdimensional wrote:
          | It sounds like this would have made a better proof of
          | concept than a commitment to the architecture.
         | 
         | The idea on the face of it is not per se a bad one, quite
         | interesting, but the implementations are perhaps not there yet
         | to back such an idea.
         | 
         | It's important to know as an architect when your vision for the
         | architecture is outpacing reality, to know your time horizon,
         | and match the vision with the tools that help you implement the
         | actual use cases you have in your hand right now.
         | 
          | It sounds like this person might have had an interesting
          | idea but not a working system. In another light, this could
          | have been a good idea if all the technology had been in
          | place to support it... but the timing and implementation
          | don't sound like they were right, perhaps.
         | 
         | The old saying "use the right tool for the job" comes to mind,
         | but that can be hard to see when the tools are changing so
         | fast, and there is a risk to going too far onto the bleeding
         | edge. Perhaps the saying should have been, "use the rightest
         | tool you can find at the time, that gives you some room to
         | grow, for the job"...
        
           | Kalium wrote:
           | It was definitely an interesting proof of concept that needed
           | some refinement. The core idea was functional services
           | against nicely organized data streams on a flat network.
           | Which is a really cool approach that works quite well for a
           | lot of things.
           | 
           | Several of these points fell apart when credit card handling
           | and PCI-DSS entered the picture.
        
         | JMTQp8lwXL wrote:
         | Instead of single architects, companies need architect boards.
         | And they can vote on these ideas before a single individual
         | becomes a single point of failure.
         | 
          | Expecting 1 person to make 100% correct decisions all the
          | time is too much to expect of one person. People go down
          | rabbit holes and they have weird takeaways, like replacing
          | all the databases with queues.
        
           | Kalium wrote:
           | I agree in abstract, but in practice it's quite difficult to
           | set up a successful democratic architecture board. You need
           | teams or departments that all have architects, and an
           | engineering organization where both managers and engineers
           | accept a degree of centralized technical leadership. Getting
           | there is, in my opinion, the work of years. It's especially
           | challenging because spinning up such a board requires a
           | person who can run it single-handedly.
           | 
           | In this particular company, the Chief Architect in theory had
           | a group around him. They did nothing to check his poor
           | decisions, and from the outside seemed primarily interested
           | in living in the functional programming nirvana he promised
           | them.
        
             | sixdimensional wrote:
              | Thinking in terms of building physical buildings...
             | architects are often the visionaries but engineers are the
             | realists. You know, architects come up with these crazy
             | incredible building designs based on their engineering
             | understanding, but ultimately it has to be vetted, proven
             | and implemented by engineers.
             | 
             | I have, however, always wondered why people should be seen
             | as "only architects" and "only engineers". While the
             | separation of duties is critical to ensure the overall
             | construction is sound, people can be visionary engineers,
             | and people can be knowledgeable in both how to do something
             | in real terms as well as dreaming on how to go beyond.
        
         | EamonnMR wrote:
          | At our company we use a ton of services that operate
          | essentially as functions on a Kafka stream (well, they tend
          | to read/write in batches for efficiency), but we write
          | event streams we want to query into a regular database for
          | later querying. It works out very well. The idea of our
          | poor Kafka cluster having to field queries in addition to
          | the load of acting as a transport layer is frightening. The
          | 'superpower' Kafka gives you is the ability to turn back
          | time if something goes wrong and the ability to orchestrate
          | really big pipelines. You have to build or buy a fair bit
          | of tooling to make it work though.
        
           | Kalium wrote:
           | This particular org did its level best to think of Kafka as
           | the regular database for queries.
        
       | shay_ker wrote:
       | This is maybe a silly question, but what's the difference between
       | the timely dataflow that Materialize uses and Spark's execution
       | engine? From my understanding they're doing very similar things -
       | break down a sequence of functions on a stream of data,
       | parallelize them on several machines, and then gather the
       | results.
       | 
       | I understand that the feature set of timely dataflow is more
       | flexible than Spark - I just don't understand why (I couldn't
       | figure it out from the paper, academic papers really go over my
       | head).
        
         | frankmcsherry wrote:
         | There are a few differences, the main one between Spark and
         | timely dataflow is that TD operators can be stateful, and so
         | can respond to new rounds of input data in time proportional to
         | the new input data, rather than that plus accumulated state.
         | 
         | So, streaming one new record in and seeing how this changes the
         | results of a multi-way join with many other large relations can
         | happen in milliseconds in TD, vs batch systems which will re-
         | read the large inputs as well.
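          | 
          | A toy illustration of the stateful-operator point (plain
          | Python, nothing like TD's actual API): each side of the
          | join keeps an index, so a new record costs work
          | proportional to its matches, not to the accumulated inputs.
          | 
          |     from collections import defaultdict
          | 
          |     left, right = defaultdict(list), defaultdict(list)
          | 
          |     def on_left(key, value):
          |         left[key].append(value)          # retained state
          |         return [(value, r) for r in right[key]]  # delta out
          | 
          |     def on_right(key, value):
          |         right[key].append(value)
          |         return [(lv, value) for lv in left[key]]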
         | 
         | This isn't a fundamentally new difference; Flink had this
         | difference from Spark as far back as 2014. There are other
         | differences between Flink and TD that have to do with state
         | sharing and iteration, but I'd crack open the papers and check
         | out the obligatory "related work" sections each should have.
         | 
         | For example, here's the first para of the Related Work section
         | from the Naiad paper:
         | 
         | > Dataflow Recent systems such as CIEL [30], Spark [42], Spark
         | Streaming [43], and Optimus [19] extend acyclic batch dataflow
         | [15, 18] to allow dynamic modification of the dataflow graph,
         | and thus support iteration and incremental computation without
         | adding cycles to the dataflow. By adopting a batch-computation
         | model, these systems inherit powerful existing techniques
         | including fault tolerance with parallel recovery; in exchange
         | each requires centralized modifications to the dataflow graph,
         | which introduce substantial overhead that Naiad avoids. For
         | example, Spark Streaming can process incremental updates in
         | around one second, while in Section 6 we show that Naiad can
         | iterate and perform incremental updates in tens of
         | milliseconds.
        
         | mamon wrote:
         | There's no difference really. All "Big Data" (tm) tools are
         | trying to capitalize on the hype, so Kafka adds database
         | capabilities, while Spark adds Streaming. At some point they
         | will reach feature parity.
        
       | vladsanchez wrote:
        | I want to upvote this more than once. So many facts condensed
        | into a small essay. Good job!
       | 
       | Money quote: "Event-sourced architectures like these suffer many
       | such isolation anomalies, which constantly gaslight users with
       | "time travel" behavior that we're all familiar with."
        
         | rowland66 wrote:
         | Maybe I am just old because I had to Google what gaslighting
         | meant, but as best I can tell getting gaslighted by your system
         | architecture is really stretching the meaning of the term
         | gaslighting to the point of absurdity.
        
       | theptip wrote:
       | This is a bit dumbed down, and ignores the domain terminology
       | required to properly discuss the trade-offs here (which is
       | puzzling given that it links to a post by Aphyr, where you can
       | find incredibly thorough discussions around isolation levels and
       | anomalies).
       | 
       | > The fundamental problem with using Kafka as your primary data
       | store is it provides no isolation.
       | 
       | This is false. I can only assume the author doesn't know about
       | the Kafka transactions feature?
       | 
       | To be specific, Kafka's transaction machinery offers read-
       | committed isolation, and you get read-uncommitted by default if
       | you don't opt-in to use that transaction machinery (the docs:
       | https://kafka.apache.org/0110/javadoc/index.html?org/apache/...).
       | Depending on your workload, read-committed might be sufficient
       | for correctness, in which case you can absolutely use Kafka as
       | your database.
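        | 
        | For the unfamiliar, a minimal sketch of that machinery
        | (confluent-kafka; topic names are made up):
        | 
        |     from confluent_kafka import Consumer, Producer
        | 
        |     producer = Producer({
        |         "bootstrap.servers": "localhost:9092",
        |         "transactional.id": "checkout-writer-1",
        |     })
        |     producer.init_transactions()
        |     producer.begin_transaction()
        |     # Both writes become visible atomically, or neither does.
        |     producer.produce("orders", key="order-1", value=b"...")
        |     producer.produce("inventory", key="sku-42", value=b"...")
        |     producer.commit_transaction()  # or abort_transaction()
        | 
        |     # Readers opt in to read-committed; writes from aborted
        |     # transactions are filtered out.
        |     consumer = Consumer({
        |         "bootstrap.servers": "localhost:9092",
        |         "group.id": "readers",
        |         "isolation.level": "read_committed",
        |     })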
       | 
        | Of course, proving that your application is sound with just
        | read-committed isolation can be challenging, not to mention
        | testing that your application continues to be sound as new
        | features are added.
       | 
       | Because of that, in general I think that the underlying point of
       | this article is probably correct, in that you probably shouldn't
       | use Kafka as your database -- but for certain applications / use-
       | cases it's a completely valid system design choice.
       | 
       | More generally this is an area that many applications get wrong
       | by using the wrong isolation levels, because most frameworks
       | encourage incorrect implementations by their unsafe defaults;
       | e.g. see the classic "Feral concurrency control" paper
       | http://www.bailis.org/papers/feral-sigmod2015.pdf. So I think the
       | general message of "don't use Kafka as your DB unless you know
       | enough about consistency to convince yourself that read-committed
       | isolation is and will always be sufficient for your usecase"
       | would be more appropriate (though it's certainly a less snappy
       | title).
        
         | georgewfraser wrote:
         | "Read-committed isolation" is not a meaningful implementation
         | of transactions. If you can't do read, then a write, while
         | guaranteeing the database didn't change in between, then you
         | don't really have transactions.
        
           | [deleted]
        
           | LgWoodenBadger wrote:
           | This sounds like "serializable" which is (in my experience)
           | rarely useful for a meaningful system.
        
           | theptip wrote:
           | Depends on your use-case; if it's meaningless, why is it
           | implemented in all the leading SQL DBs? It's the default in
           | Postgres...
           | 
           | https://www.postgresql.org/docs/9.5/transaction-iso.html
           | 
           | If you're arguing that in practice this isn't enough
           | isolation, then sure, that's what I said in my post; most
           | applications need more than the default isolation levels. I
           | feel like you're making an absolutist point (just like the
           | original article) where my point was that the domain is
           | actually more nuanced, and absolutes just obscure the
           | technical complexity.
        
       | megachucky wrote:
        | Here is my answer to whether Kafka is a database:
       | 
       | (the answer is yes, but you should still not try to replace every
       | other database)
       | 
       | https://www.kai-waehner.de/blog/2020/03/12/can-apache-kafka-...
       | 
       | Would be curious what the people here think about my post, too.
       | Disclaimer: I work for Confluent - and I am also happy about
       | critical feedback.
        
       | fredliu wrote:
        | Kafka is essentially a commit log, which is at the core of
        | any traditional database engine. Streaming is just turning
        | the guts of the DB inside out (mostly for scalability
        | reasons), while a DB is a wrapped-up commit log that provides
        | higher-level functionality (ACID, transactions, etc.). It's
        | two sides of the same coin, yin and yang of the same thing...
        | But on the practical side of things, yes, if what you need is
        | indeed what's described in this article, your life would be
        | easier with a traditional DB.
        
       | hasanic wrote:
        | I mean, duh? Did Apache Kafka ever make the claim that it is
        | a database?
       | 
       | Other things that are not a database: Apache Traffic Server,
       | Apache Mahout, Apache Jakarta, Apache ActiveMQ... hundreds of
       | these exist.
        
         | dang wrote:
         | The title sounds generic, but the article makes it clear that
         | it's responding to specific proposed uses of Kafka.
        
       | Spivak wrote:
        | I feel like the inventory thing is a bit of a straw-man,
        | because the situation is set out in such a way that you need
        | transactions for it to work. If you find yourself wishing you
        | had a global write-lock on a topic, then of course it won't
        | work. Modeling your data for Kafka is work just the same as
        | it is for MySQL. Of course it might not be the best tool for
        | the job, but you should at least give it a fair shake.
       | 
       | You should be able to post "buy" messages to a topic without fear
       | that it messes up your data integrity. Who cares if two people
       | are fighting over the last item? You have a durable log. Post
       | both "buys" and wait for the "confirm" message from a consumer
       | that's reading the log at that point in time, validates, and
       | confirms or rejects the buys. At the point that the buy reaches a
       | consumer there is enough information to know for sure whether
       | it's valid or not. Both of the buy events happened and should be
       | recorded whether they can be fulfilled or not.
        
         | jacques_chester wrote:
         | > _Who cares if two people are fighting over the last item?_
         | 
         | The two people, at least. Customers tend to be a bit
         | underwhelmed by "well, the CAP theorem..." as a customer
         | service script.
        
           | Spivak wrote:
            | None of this shows up to the user any differently than
            | with a relational database. No CAP theorem at all.
           | 
           | Kafka:
           | 
           | User clicks buy and it shows "processing" which behind the
           | scenes posts the buy message and waits for a "confirmed"
           | message. When it's confirmed user is directed to success! If
           | someone else posts the buy before them they get back a
           | "failed: sold out" message.
           | 
           | Relational:
           | 
           | User clicks buy and it shows "processing" which behind the
           | scenes tries to get a lock on the db, looks at inventory,
           | updates it if there's still one available, and creates a row
           | in the purchases table. If all this works the user is
           | directed to success. If by the time the lock was acquired the
           | inventory was zero the server returns "failure: sold out".
        
             | jacques_chester wrote:
             | The CAP theorem line was smart-arsery.
             | 
             | The thing here is that the database can update the cart and
             | the inventory in one logical step, to the exclusion of
             | others.
             | 
             | The Kafka approach doesn't guarantee that out of the box,
             | leading to the creation of de facto locking protocols
             | (write cart intent, read cart intent, write inventory
             | intent ...). A traditional database does that for you with
             | selectable levels of guarantees.
        
               | Spivak wrote:
               | You're absolutely right but I think the advantage is that
               | to the users of the database it doesn't really feel like
               | there's any locking going on and it's this implicit thing
               | that's constructed from the history.
               | 
                | Like for a hypothetical ticket selling platform you
                | just get a log like
                | 
                |     00:00 RESERVE AAA FOR xxx, 5 min
                |     00:02 BUY     BBB FOR qqq
                |     00:15 RESERVE AAA FOR yyy, 5 min
                |     00:23 CONFIRM AAA FOR xxx
                |     00:25 CONFIRM BBB FOR qqq
                |     00:27 REJECT  AAA FOR yyy, "already reserved"
                |     05:16 RESERVE AAA FOR zzz, 5 min
                |     05:34 CONFIRM AAA FOR zzz
               | 
                | So although there's "locking" going on here, enforced
                | by the consumer sending the confirms, the producer
                | just sends intents/events and sees whether they're
                | successful or not, and both producer and consumer are
                | stateless.
               | 
               | I guess it just depends on which model you think is more
               | a PITA to work with.
        
               | AndyPa32 wrote:
                | Yeah, if you are lucky enough that your cart and
                | inventory are stored in the same database.
        
               | jacques_chester wrote:
                | Of course. That's why Materialize are interesting:
                | they are apparently able to give a greatly improved
                | compute-to-result ratio compared to the previous
                | state of the art.
        
       | j-pb wrote:
       | Tbh, It's a weird blog post coming from the materialize folks,
       | considering they know better.
       | 
       | The "event sourced" arch they sketched is missing pieces. Normaly
       | you'd have single writer instances that are locked to the
       | corresponding kafka partition, which ensure strong transactional
       | guarantees, IF you need them.
       | 
       | Throwing shade for maketings sake is something that they should
       | be above.
       | 
       | I mean c'mon, I'd argue that Postgres enhanced with Materialize
       | isn't a database anymore either, but in a good sense!
       | 
       | It's building material. A hybrid between MQ, DB, backend logic &
       | frontend logic.
       | 
       | The reduction in application logic and the increase in
       | reliability you can get from reactive systems is insane.
       | 
       | SQL is declarative, reactive Materialize streams are declarative
       | on a whole new level.
       | 
        | Once that tech makes it into other parts of computing like
        | the frontend, development will be so much better: less code,
        | fewer bugs, a lot more fun.
       | 
       | Imagine that your react component could simply declare all the
       | data it needs from a db, and the system will figure out all the
       | caching and rerendering.
       | 
       | So yeah, they have awesome tech with many advantages, so I don't
       | get why they bad-mouth other architectures.
        
         | dkhenry wrote:
         | I am not terribly surprised. The materialize team was
         | previously at CockroachDB which also had a habit of putting out
         | marketing material like this.
        
           | _jal wrote:
           | I ignored them for a long time because of it. I just assumed
           | they were lightweights being annoying because they didn't
           | have anything else.
           | 
           | It serves as anti-marketing, at least to me.
        
           | j-pb wrote:
           | Maybe I'm biased because I'm such a huge Frank McSherry
           | fanboy. The differential dataflow work he does in rust is
           | simply awesome, and he writes great papers too!
           | 
            | Little-known fun fact: the Rust type- and borrow-checker
            | uses a datalog engine internally to express typing rules,
            | and that engine was written and improved by Frank
            | McSherry. So whenever you hit compile on a Rust program,
            | you're using a tiny bit of Materialize tech.
        
         | dwelch2344 wrote:
          | Totally agree with this. "Kafka is terrible if you use it
          | with poor practices" should be the title of this. It's
          | totally clickbait targeting the "Database Inside Out"
          | articles / concept, which, if you read the most common
          | three-page high-level version, explicitly states you should
          | be materializing into other data sources / structures /
          | systems / etc.
         | 
         | To date, I've never written code that reads from a Kafka topic
         | that wasn't taking data and transforming + materializing it
         | into domain intelligence
        
         | georgewfraser wrote:
         | We're trying to address a real problem that is happening in our
         | industry: VPs of eng and principal engineers at startups are
         | adopting the "Kappa Architecture" / "Turning the Database
         | Inside Out", without realizing how much functionality from
         | traditional database systems they are leaving behind. This has
         | led to a barrage of consistency bugs in everything from food-
         | delivery apps to the "unread message count" in LinkedIn. We're
         | at the peak of the hype cycle for Kafka, and it's being used in
         | all kinds of places it doesn't belong. For 99% of companies, a
         | traditional DBMS is the right foundation.
        
         | arjunnarayan wrote:
         | Hi! I'm one of the two authors here. At Materialize, we're
         | definitely of the 'we are a bunch of voices, we are people
         | rather than corp-speak, and you get our largely unfiltered
         | takes' flavor. This is my (and George's from Fivetran) take. In
         | particular this is not Frank's take, as you attribute below :)
         | 
         | > SQL is declarative, reactive Materialize streams are
         | declarative on a whole new level.
         | 
         | Thank you for the kind words about our tech, I'm flattered!
         | That said, this dream is downstream of Kafka. Most of our
         | quibbles with the Kafka-as-database architecture are to do with
         | the fact that that architecture neglects the work that needs to
         | be done _upstream_ of Kafka.
         | 
         | That work is best done with an OLTP database. Funnily enough,
         | neither of us are building OLTP databases, but this piece
         | largely is a defense of OLTP databases (if you're curious, yes,
         | I'd recommend CockroachDB), and their virtues at that head of
         | the data pipeline.
         | 
          | Kafka has its place - and when it's used downstream of CDC
          | from said OLTP database (using, e.g., Debezium), we could
          | not be happier with it (and we say so).
         | 
          | The best example is foreign key checks. Kafka-as-database
          | is not good if you ever need to enforce foreign key checks
          | (which translates to checking a denormalization of your
          | source data _transactionally_ with deciding whether to
          | admit or deny an event). This is something that you may not
          | need in your data pipeline on day 1, but adding it in later
          | is a trivial schema change with an OLTP database, and
          | exceedingly difficult with a Kafka-based event-sourced
          | architecture.
         | 
         | > Normally you'd have single writer instances that are locked
         | to the corresponding Kafka partition, which ensure strong
         | transactional guarantees, IF you need them.
         | 
         | This still does not deal with the use-case of needing to add a
         | foreign key check. You'd have to:
         | 
         | 1. Log "intents to write" rather than writes themselves in
         | Topic A 2. Have a separate denormalization computed and kept in
         | a separate Topic B, which can be read from. This
         | denormalization needs to be read until the intent propagates
         | from Topic A. 3. Convert those intents into commits. 4. Deal
         | with all the failure cases in a distributed system, e.g.
         | cleaning up abandoned intents, etc.
         | 
         | If you use an OLTP database, and generate events into Kafka via
         | CDC, you get the best of both worlds. And hopefully, yes, have
         | a reactive declarative stack downstream of that as well!
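          | 
          | A minimal sketch of that pattern, with a hypothetical
          | schema (Python + psycopg2): the foreign key check rides
          | along with the write in one transaction, and a CDC tool
          | such as Debezium then forwards only committed rows to
          | Kafka.
          | 
          |     import psycopg2
          | 
          |     conn = psycopg2.connect("dbname=shop")
          |     with conn, conn.cursor() as cur:
          |         # Rejected right here if cust-7 doesn't exist,
          |         # assuming a FK from orders to customers.
          |         cur.execute(
          |             "INSERT INTO orders (order_id, customer_id)"
          |             " VALUES (%s, %s)", ("order-1", "cust-7"))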
        
           | skybrian wrote:
           | It seems like Kafka could be used to record requests and
           | responses (upstream of the database) and approved changes
           | (downstream of the database), but isn't good for the approval
           | process itself?
        
           | skyde wrote:
           | > If you use an OLTP database, and generate events into Kafka
           | via CDC, you get the best of both worlds.
           | 
            | 100% agree. This is the way to go: instead of rolling
            | your own transaction support, you get ACID for free from
            | the DB and use Kafka to archive changes and subscribe to
            | them.
        
           | vvern wrote:
           | > 1. Log "intents to write" rather than writes themselves in
           | Topic A 2. Have a separate denormalization computed and kept
           | in a separate Topic B, which can be read from. This
           | denormalization needs to be read until the intent propagates
           | from Topic A. 3. Convert those intents into commits. 4. Deal
           | with all the failure cases in a distributed system, e.g.
           | cleaning up abandoned intents, etc.
           | 
           | People do do this. I have done this. I wish I had been more
           | principled with the error paths. It got there _eventually_.
           | 
           | It was a lot of code and complexity to ship a feature which
           | in retrospect could have been nearly trivial with a
           | transactional database. I'd say months rather than days. I
           | won't get those years of my life back.
           | 
            | The products were built on top of Kafka, Cassandra, and
           | Elasticsearch where, over time, there was a desire to
           | maintain some amount of referential integrity. The only
           | reason we bought into this architecture at the time was
           | horizontal scalability (not even multi-region). Kafka, sagas,
           | 2PC at the "application layer" can work, but you're going to
           | spend a heck of a lot on engineering.
           | 
           | It was this experience that drove me to Cockroach and I've
           | been spreading the good word ever since.
           | 
           | > If you use an OLTP database, and generate events into Kafka
           | via CDC, you get the best of both worlds.
           | 
           | This is the next chapter in the gospel of the distributed
           | transaction.
        
             | gunnarmorling wrote:
             | >> If you use an OLTP database, and generate events into
             | Kafka via CDC, you get the best of both worlds.
             | 
             | > This is the next chapter in the gospel of the distributed
             | transaction.
             | 
              | Actually, it's the opposite. CDC helps to _avoid_
              | distributed transactions; apps write to a single
              | database only, and other resources (Kafka, other
              | databases, etc.) based on that are updated
              | asynchronously, eventually consistent.
        
               | vvern wrote:
               | I mean, I hear you that people do that and it does allow
               | you to avoid needing distributed transactions. When
               | you're stuck with an underlying NoSQL database without
               | transactions, this is the best thing you can do. This is
               | the whole "invert the database" idea that ends up with
               | [1].
               | 
               | I'm arguing that that usage pattern was painful and that
               | if you can have horizontally scalable transaction
               | processing and a stream of events due to committed
               | transactions downstream, you can use that to power
               | correct and easy to reason about materializations as well
               | as asynchronous task processing.
               | 
               | Transact to change state. React to state changes.
               | Prosper.
               | 
               | [1] https://youtu.be/5ZjhNTM8XU8?t=2075
        
       | AndrewKemendo wrote:
       | Alternatively, from Jay Krebs [1], a much more thorough and
       | nuanced discussion that is probably the best write-up on this
       | topic.
       | 
       | "So is it crazy to do this? The answer is no, there's nothing
       | crazy about storing data in Kafka: it works well for this because
       | it was designed to do it. Data in Kafka is persisted to disk,
       | checksummed, and replicated for fault tolerance. Accumulating
       | more stored data doesn't make it slower. There are Kafka clusters
       | running in production with over a petabyte of stored data."
       | 
       | [1] https://www.confluent.io/blog/okay-store-data-apache-kafka/
        
         | pram wrote:
         | Log compaction definitely isn't problem-free. I'd say it was
         | crazy when he wrote that.
         | 
         | We ran a cluster with lots of compacted topics, hundreds of
         | terabytes of data. At the time it would make broker startup
         | insanely slow. An unclean startup could literally take _an
         | hour_ to go through all the compacted partitions. It was awful.
        
         | netn3twebw3b wrote:
         | Agreed. I feel like the tech community is afflicted with
         | collective functional fixedness, or some sort of essentialism.
         | 
         | At its core, it's all electron state in hardware. So long
         | as those limits are not incidentally exceeded, and you
         | validate outputs, who really cares what gets loaded?
         | 
         | We rip rare minerals from the ground and toss all of that
         | hardware at scale every 3-5 years, yet we get economical
         | about what software we install.
         | 
         | So long as it offers the necessary order of operations to do
         | the work, whatever.
         | 
         | https://en.m.wikipedia.org/wiki/Functional_fixedness
        
         | hodgesrm wrote:
         | Thanks, this is a great article. The money quote for me is:
         | 
         | > I think it makes more sense to think of your datacenter as a
         | giant database, in that database Kafka is the commit log, and
         | these various storage systems are kinds of derived indexes or
         | views.
         | 
         | My company supports analytic systems and we see this pattern
         | constantly. It's also sort of a Pat Helland view of the world
         | that subsumes a large fraction of data management under a
         | relatively simple idea. [1] What's interesting is that Pat
         | also sees it as a way to avoid sticky coordination problems.
         | 
         | [1] http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
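         | 
         | A toy illustration of that framing, with a plain Python
         | list standing in for the Kafka commit log and a dict as a
         | derived "index" (ops and keys are made up):
         | 
         |   log = [("put", "k1", "v1"),
         |          ("put", "k2", "v2"),
         |          ("del", "k1", None)]
         |   
         |   def derive_view(log):
         |       # Any downstream store can be rebuilt at any time by
         |       # replaying the log from offset 0.
         |       view = {}
         |       for op, key, value in log:
         |           if op == "put":
         |               view[key] = value
         |           else:
         |               view.pop(key, None)
         |       return view
         |   
         |   print(derive_view(log))  # {'k2': 'v2'}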
        
           | skyde wrote:
           | The problem is that the Kafka API and a commit log API
           | are very different.
           | 
           | If you wanted to literally use Kafka as your commit log,
           | the same way Amazon Aurora uses a distributed commit log,
           | you would find that a lot of features a commit log needs
           | are missing from Kafka and impossible to add.
        
         | georgewfraser wrote:
         | That post explains that there are scenarios where it makes
         | sense to store data permanently in Kafka. "Kafka is Not a
         | Database" makes a different point, which is that Kafka doesn't
         | solve any of the hard transaction-processing problems that
         | database systems do, so it's not an alternative to a
         | traditional DBMS. This is not a straw man---Confluent advocates
         | for "turning the database inside out" all over their marketing
         | materials and conference talks.
        
           | cfontes wrote:
           | It actually does, with exactly-once semantics. In fact,
           | I've been using it as the single source of truth in a
           | cash management system for almost 2 years without a
           | single transaction-related issue.
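           | 
           | For what it's worth, a minimal sketch of a transactional
           | write in Python with confluent-kafka (topic name and
           | payloads are made up):
           | 
           |   from confluent_kafka import Producer
           |   
           |   producer = Producer({
           |       "bootstrap.servers": "localhost:9092",
           |       "transactional.id": "cash-mgmt-1",
           |   })
           |   producer.init_transactions()
           |   
           |   producer.begin_transaction()
           |   producer.produce("ledger", b'{"acct": "a1", "amt": -100}')
           |   producer.produce("ledger", b'{"acct": "a2", "amt": 100}')
           |   # Both records become visible atomically to
           |   # read_committed consumers, or neither does.
           |   producer.commit_transaction()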
        
             | jashmatthews wrote:
             | How do you deal with side effects outside of the Kafka
             | cluster?
        
               | theptip wrote:
               | Short answer: you write your consumer's state into the
               | same DB as you're writing the side-effects to, in the
               | same transaction.
               | 
               | Long answer: say your consumer is a service with a SQL DB
               | -- if you want to process Event(offset=123), you need to
               | 1. start a transaction, 2. write a record in your DB
               | logging that you've consumed offset=123, 3. write your
               | data for the side-effect, 4. commit your transaction.
               | (Reverse 2 and 3 if you prefer; it shouldn't make a
               | difference). If your side-effect write fails (say your DB
               | goes down) then your transaction will be broken, your
               | side-effect won't be readable outside the transaction,
               | and the update to the consumer offset pointer also won't
               | get persisted. Next loop around on your consumer's event
               | loop, you'll start at the same offset and retry the same
               | transaction.
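               | 
               | A minimal sketch of that loop, with sqlite3 standing
               | in for the consumer's SQL DB (table names made up):
               | 
               |   import sqlite3
               |   
               |   conn = sqlite3.connect("consumer.db")
               |   conn.execute("CREATE TABLE IF NOT EXISTS offsets"
               |                " (topic TEXT, part INTEGER,"
               |                " next_off INTEGER,"
               |                " PRIMARY KEY (topic, part))")
               |   conn.execute("CREATE TABLE IF NOT EXISTS effects"
               |                " (id INTEGER PRIMARY KEY, body TEXT)")
               |   
               |   def handle(topic, part, offset, value):
               |       with conn:  # steps 1-4, one ACID transaction
               |           conn.execute(
               |               "INSERT OR REPLACE INTO offsets"
               |               " VALUES (?, ?, ?)",
               |               (topic, part, offset + 1))
               |           conn.execute(
               |               "INSERT INTO effects (body) VALUES (?)",
               |               (value,))
               |       # If either insert fails, neither commits; the
               |       # consumer re-reads the same offset and retries.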
        
               | arafsheikh wrote:
               | Persisting message offsets in the DB has its own
               | challenges. The apps become tightly coupled to a
               | specific Kafka cluster, which makes it difficult to
               | swap clusters in case of a failover event.
               | 
               | If you expect apps to persist offsets then it's important
               | to have a mechanism/process to safely reset the app state
               | in DB when the stored offset doesn't make sense.
        
               | wdb wrote:
               | Interesting, do you have any resources on how best to
               | handle side effects with message queues (e.g. GCP
               | Pub/Sub)? I'm trying to find out whether it's worth
               | the effort, or good practice, to allow replay-ability
               | of messages (like from backup) and get back to the
               | same state.
        
         | andrewem wrote:
         | *Kreps, not Krebs
        
         | gopalv wrote:
         | > Accumulating more stored data doesn't make it slower
         | 
         | That is a valid theory when we talk about readers that look
         | at recent data, or when you are trying to append data to
         | the existing system.
         | 
         | But in practice, the accumulation of cold data on a local disk
         | is where this starts to hurt, particularly if that has to serve
         | read traffic which starts from the beginning of time (i.e your
         | queries don't start with a timestamp range).
         | 
         | KSQL transforms do help reduce the depth of the traversal,
         | by building flatter versions of the data set, but you need
         | to repartition the same data on every lookup key you want -
         | so if you had a video game log trace, you'd need multiple
         | materializations for (user), (user, game), (game), etc.
         | 
         | And on the local storage part, EBS is expensive for just
         | holding cold data, but you then have to replicate it to
         | maintain availability during a node recovery - EBS is like
         | a 1.5x redundant store, better than a single node. I liked
         | the Druid segment model of shoving segments off to S3 while
         | still being able to read off them (i.e. not just streaming
         | to S3 as a dumping ground).
         | 
         | When Pravega came out, I liked it a lot for the same
         | reason - but it hasn't gained enough traction.
        
         | halbritt wrote:
         | I recommend his book, "I heart logs". It's a short read, but
         | changed my perspective.
        
       | tutfbhuf wrote:
       | Well, then you have never heard of ksqlDB. It adds SQL and DB
       | features to Kafka. It is backed by Confluent, the company
       | founded by the team that initially developed Kafka at
       | LinkedIn.
       | 
       | https://ksqldb.io
        
         | albertwang wrote:
         | We're very aware of ksqlDB. I would recommend this video from
         | last week where Matthias talks about some of the strengths and
         | weaknesses of ksql: https://youtu.be/KUQuegJ4do8
        
         | cirego wrote:
         | My understanding is that ksqlDB is a read-only interface on top
         | of streams and only helps people write better consumers. The
         | problems mentioned in the blog post relate to producers.
        
       | zaphar wrote:
       | If it stores data it's a database. Filesystems are databases,
       | MongoDB is a database. LevelDB is a database. Postgres and MySQL
       | are databases. Kafka is a database. They are all very different
       | in features and functionality though.
       | 
       | What the authors mean is that Kafka is not a traditional database
       | and doesn't solve the same problems that traditional databases
       | solve. Which is a useful distinction to make but is not the
       | distinction they make.
       | 
       | The reality is that "database" is now a very general term,
       | and for many use cases you can choose special-purpose
       | databases for what you need.
        
         | yumaikas wrote:
         | I think I'd differentiate between a database and a data store.
         | 
         | I'd argue that a filesystem is a data store, rather than a
         | database.
        
           | Hamuko wrote:
           | Some say that Microsoft Excel is the world's most popular
           | database engine.
        
             | Baeocystin wrote:
             | Honestly, I'd agree with that, too, at least in a
             | colloquial sense.
        
           | zaphar wrote:
           | What is your distinction? Is MongoDB a database? What
           | about LevelDB?
           | 
           | Whether we want it to be so or not, the term database is
           | much more encompassing than it used to be. You can try to
           | fight that change if you want to, but it means you'll be
           | speaking a different language than most of the rest of us
           | as a result.
        
             | IfOnlyYouKnew wrote:
             | You're using the term "traditional database" two levels up.
             | Whatever your definition of that is, it's their definition
             | of "database".
             | 
             | Your initial post says "I have a different definition of
             | 'database'", which is different than anyone else's, as you
             | acknowledge when you refer to that common definition as a
             | 'traditional database'.
             | 
             | Then, you fault them for making an argument that is invalid
             | for your non-standard definition of a database!
             | 
             | You would like Kafka, the author.
        
             | yumaikas wrote:
             | Honestly, where I'd draw the distinction between a database
             | and a datastore is the ability to do multi-statement
             | transactions. So, a filesystem? A datastore. Some support
             | fsync on a single file, but multi-file sync isn't usually
             | supported.
             | 
             | MongoDB and LevelDB both support transactions, so I'd
             | err on the side of calling them databases myself.
        
           | edgyquant wrote:
           | I'd say file systems were initially data stores, but once
           | they developed hierarchies they became more akin to a
           | database. I'm not sure there's a huge difference, but it
           | seems a database is a collection of data stores (though
           | there is probably a more technical and correct
           | definition).
        
       ___________________________________________________________________
       (page generated 2020-12-08 23:00 UTC)