[HN Gopher] Materialize Raises a $32M Series B
___________________________________________________________________

Materialize Raises a $32M Series B

Author : austinbirch
Score  : 156 points
Date   : 2020-12-02 15:55 UTC (7 hours ago)

(HTM) web link (materialize.com)
(TXT) w3m dump (materialize.com)

| jstrong wrote:
| congrats to Frank McSherry and the rest of the Materialize team!
| very impressed by your project.
| npiit wrote:
| I wonder if BSL becomes the new standard for open-source
| commercial products. It's a good trade-off between freedom and
| real-world business pressure.
| nickstinemates wrote:
| Doubt it. Lots of aversion to the license given its limited use
| and some ambiguous terms/education around the various windows.
| npiit wrote:
| I think it can be; I know a few other potentially successful
| examples, like CockroachDB and ZeroTier. The BSL license makes
| the entire project basically FOSS for you and me, but not for
| the big sharks. Which I guess is much better for the world
| compared to open-core and, of course, proprietary SaaS.
| [deleted]
| mrits wrote:
| The headline refers to "incrementally updated materialized
| views". How does a company get funding for a feature that has
| already existed in other DBs for at least a decade?
|
| E.g., Vertica refers to this as Live Aggregate Projections.
|
| It's a cool concept but comes with huge caveats. Keeping track
| of non-estimated cardinality for COUNT DISTINCT-type queries, as
| an example.
| benesch wrote:
| (Disclaimer: I'm one of the engineers at Materialize.)
|
| > How does a company get funding for a feature that has already
| existed in other DBs for at least a decade? ... It's a cool
| concept but comes with huge caveats.
|
| I think you answered your own question here. Incrementally-
| maintained views in existing database systems typically come
| with huge caveats. In Materialize, they largely don't.
|
| Most other systems place severe restrictions on the kinds of
| queries that can be incrementally maintained, limiting the
| queries to certain functions only, or aggregations only, or only
| queries without joins--or if they do support maintaining joins,
| often the joins must occur only on the involved tables' keys. In
| Materialize, by contrast, there are approximately no such
| restrictions. Want to incrementally maintain a five-way join
| where some of the join keys are expressions, not key columns? No
| problem.
|
| That's not to say there aren't _some_ caveats. We don't yet have
| a good story for incrementally maintaining queries that observe
| the current wall-clock time [0]. And our query optimizer is
| still young (optimization of streaming queries is a rather open
| research problem), so for some more complicated queries you may
| not get the resource utilization you want out of the box.
|
| But, for many queries of impressive complexity, Materialize can
| incrementally maintain results far faster than competing
| products--if those products can incrementally maintain those
| queries at all.
|
| The technology that makes Materialize special, in our opinion,
| is a novel incremental-compute framework called differential
| dataflow. There was an extensive HN discussion on the subject a
| while back that you might be interested in [1].
|
| [0]: https://github.com/MaterializeInc/materialize/issues/2439
|
| [1]: https://news.ycombinator.com/item?id=22359769
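|
| To make that concrete, here is a sketch (with made-up tables;
| any resemblance to a real schema is accidental) of the kind of
| view Materialize will happily keep up to date:
|
|     CREATE MATERIALIZED VIEW regional_revenue AS
|     SELECT r.name AS region,
|            SUM(li.quantity * li.unit_price) AS revenue
|     FROM orders o
|     JOIN line_items li ON li.order_id = o.id
|     -- join key is an expression, not a key column:
|     JOIN customers c ON lower(c.email) = lower(o.customer_email)
|     JOIN products p ON p.id = li.product_id
|     JOIN regions r ON r.id = c.region_id
|     GROUP BY r.name;
|
| Every change flowing in from the base data updates
| `regional_revenue` incrementally, rather than triggering a
| recomputation of the five-way join.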
| Fede_V wrote:
| This is one of my favorite types of HN comments: admits the bias
| upfront, offers a meaningful technical answer, and links to
| relevant documents for a deeper dive. Thank you so much!
| mrits wrote:
| Thanks for the explanation. I'm going to look more into this as
| I'm working on a new service on top of Vertica. There is a lot I
| don't like about Vertica, and I don't see alternatives such as
| Snowflake as much of an improvement.
| jamesblonde wrote:
| What about the other big problem ignored here: does your
| streaming platform separate compute and storage?
|
| Because GCP Dataflow does. Flink doesn't. Dataflow allows you to
| elastically scale the compute you need (as Snowflake and
| Databricks do). If you can't do that, materialized views will be
| a more niche feature for bigger 24x7 deployments with
| predictable workloads.
| albertwang wrote:
| As George points out above, we haven't added our native
| persistence layer yet. Consistency guarantees are something we
| care a lot about, so for many scenarios we leverage the upstream
| datastore (often Kafka).
|
| But to answer your question: yes, our intention is to support
| separate cloud-native storage layers.
| jacques_chester wrote:
| My dim and distant recollection is that Beam and/or GCP Dataflow
| require someone to implement PCollections and PTransforms to get
| the benefit of that magic. That's not a trivial exercise,
| compared to writing SQL.
| frankmcsherry wrote:
| Hi, I work at Materialize.
|
| You can read about Vertica's "Live Aggregate Projections" here:
|
| https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/An...
|
| In particular, there are important constraints like (among
| others)
|
| > The projections can reference only one table.
|
| In Materialize you can spin up just about any SQL92 query, join
| eight relations together, have correlated subqueries, count
| distinct if you want. It is then all maintained incrementally.
|
| The lack of caveats is the main difference from the existing
| systems.
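|
| For instance, both of these are fair game as incrementally
| maintained views (table and column names invented for the
| example):
|
|     -- exact COUNT(DISTINCT ...), no sketches or approximation:
|     CREATE MATERIALIZED VIEW daily_actives AS
|     SELECT day, COUNT(DISTINCT user_id) AS actives
|     FROM events GROUP BY day;
|
|     -- correlated subquery:
|     CREATE MATERIALIZED VIEW last_order_per_customer AS
|     SELECT c.id,
|            (SELECT MAX(o.created_at) FROM orders o
|             WHERE o.customer_id = c.id) AS last_order_at
|     FROM customers c;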
| jacques_chester wrote:
| > _The headline refers to "incrementally updated materialized
| views". How does a company get funding for a feature that has
| already existed in other DBs for at least a decade?_
|
| They're getting funding for doing it _much_ more efficiently.
|
| I read into the background papers when it first popped up. This
| is legitimate, deep computer science that other DBs don't yet
| have.
| hnmullany wrote:
| Materialize is the real deal - completely different architecture
| under the hood. The origin projects are Timely Dataflow & Naiad.
|
| https://docs.rs/timely/0.11.1/timely/
| acjohnson55 wrote:
| I'm so psyched about Materialize.
|
| An old coworker explained to me how his previous company used
| dbt to create many different projections of messy data to serve
| many applications, rather than trying to come up with the One
| Canonical Representation. It truly blew my mind in terms of
| thinking about how to model data within a business.
|
| The huge limitation with this vision is that it only works in
| places where you can tolerate some pretty significant staleness.
| So the promise of this approach excludes most OLTP applications.
| I simply assumed it wouldn't be reasonable to create something
| that allows for unconstrained SQL-based transformations in real
| time, and that no one was working on this. Oh well.
|
| But several months back, I discovered Materialize and it was an
| "oh shit" moment. Someone was actually doing this, and in a very
| first-principles-driven approach. I'm really excited to see how
| this project evolves.
| coinwitcher wrote:
| This is interesting given what AWS just announced (AWS Glue
| Elastic Views):
|
| https://news.ycombinator.com/item?id=25267734
| georgewfraser wrote:
| Materialize has tackled the hardest problem in data warehousing,
| materialized views, which has _never really been solved_, and
| built a solution on a completely new architecture. This solution
| is useful by itself, but I'm also eagerly watching how their
| roadmap [1] plays out, as they go back and build out features
| like persistence and start to look more like a full-fledged data
| warehouse, but one with the first correct implementation of
| materialized views.
|
| [1] https://materialize.com/blog-roadmap/
| adamnemecek wrote:
| How were previous implementations of materialized views
| deficient?
| ahupp wrote:
| Here's a nice writeup of Materialize:
|
| https://lucperkins.dev/blog/new-db-tech-1/#materialize
|
| Not really mentioned here, but in standard Postgres it can be
| quite expensive to update the view, so you can only do it
| periodically. Materialize keeps it up to date continuously.
| georgewfraser wrote:
| Joins were unavailable or subject to extreme limitations. Or
| just plain wrong!
| [deleted]
| jameslk wrote:
| Isn't this pretty similar to what Dremio does?
| benesch wrote:
| Dremio is a batch processor, not a stream processor. The
| fundamental difference is that a batch processor needs to
| recompute a query from scratch whenever the input data changes,
| while a stream processor can incrementally update the existing
| query result based on the change to the input.
|
| This can make a huge difference when making small changes to
| large datasets. Materialize can incrementally compute small
| changes to very complicated queries in just a few milliseconds,
| while with batch processors you're looking at latency in the
| hundreds of milliseconds, seconds, or minutes, depending on the
| size of the data.
|
| Another way of looking at it is that in batch processors,
| latency scales with the size of the total data, while in stream
| processors, latency scales with the size of the _updates_ to the
| data.
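|
| The difference shows up directly in how you use the two: with a
| batch engine you re-run the aggregation on a schedule, while
| with Materialize you declare it once and then read it like a
| table (hypothetical table names):
|
|     -- batch: each refresh rescans all of `events`
|     SELECT region, COUNT(*) FROM events GROUP BY region;
|
|     -- streaming: defined once, updated on every input change
|     CREATE MATERIALIZED VIEW events_by_region AS
|     SELECT region, COUNT(*) FROM events GROUP BY region;
|
|     SELECT * FROM events_by_region;  -- cheap read of maintained state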
| jameslk wrote:
| I see, thank you for the explanation!
| btown wrote:
| For a primer on materialized views, and one of the key
| rationales for Materialize's existence, there's no better
| presentation than Martin Kleppmann's "Turning the Database
| Inside-Out" (2015). (At my company it's required viewing for
| engineers across our stack, because every data structure _is_ a
| materialized view, no matter where on the frontend or backend
| that data structure lives.)
|
| https://www.confluent.io/blog/turning-the-database-inside-ou...
|
| Confluent is building an incredible business helping companies
| build these types of systems on top of Kafka, Samza, and
| architectural principles originally developed at LinkedIn. But
| its pitch is more along the lines of "if you'd like this query
| to be answered, or this recommender system to be deployed for
| every user, we can reliably code a data pipeline to do so at
| LinkedIn scale" than "you can run this query right away against
| our OLAP warehouse without knowing about distributed systems."
| (If it's more nuanced than this, please correct me!)
|
| On the other hand, Materialize could allow businesses to realize
| this architecture, with its vast benefits to millisecond-scale
| data freshness and analytical flexibility, simply by writing SQL
| queries as if it were a traditional system. As its capabilities
| expand beyond parity with SQL (though I agree that's absolutely
| the best place for them to start and optimize), there are
| tremendous wins here that could power the next generation of
| real-time systems.
|
| EDIT: some clarifications and additional examples
| Liron wrote:
| I also wrote a primer on why the world needs Materialize [1]. It
| had a big discussion on HN [2], and Materialize's cofounder said
| it was part of his motivation [3].
|
| [1] https://medium.com/@lironshapira/data-denormalization-is-bro...
|
| [2] https://news.ycombinator.com/item?id=12613586
|
| [3] https://twitter.com/narayanarjun/status/1241450203095465986
| quodlibetor wrote:
| Ha! Your blog post was one of the reasons that I trusted in the
| future of Materialize enough to decide to work here!
|
| I agree, that is exactly the problem that I, in particular,
| think we are solving.
| dataplayer wrote:
| What exactly are "materialized views"?
| jacques_chester wrote:
| Suppose you have normalized your data schema, up to at least
| 3NF, perhaps even further, to 4NF, 5NF or (as Codd intended)
| BCNF.
|
| Great! You are now largely liberated from introducing many kinds
| of anomaly at insertion time. And you'll often only need to
| write once for each datum (modulo implementation details like
| write amplification), because a normalised schema has "a place
| for everything and everything in its place".
|
| Now comes time to query the data. You write some joins, and all
| is well. But a few things start to happen. One is that writing
| joins over and over becomes laborious. What you'd really like is
| some denormalised intermediary views, which transform the fully-
| normalised base schema into something that's more convenient to
| query. You can also use this to create an isolation layer
| between the base schema and any consumers, which will make
| future schema changes easier and possibly improve security.
|
| The logical endpoint of doing so is the Data Warehouse
| (particularly in the Kimball/star schema/dimensional modelling
| style). You project your normalised data, which you have high
| confidence in, into a completely different shape that is
| optimised for fast summarisation and exploration. You use this
| as a read-only database, because it massively duplicates a lot
| of information that could otherwise have been derived via query
| (for example, instead of a single "date" field, you have fields
| for day of week, day of month, day of year, week of year,
| whether it's a holiday ... I've built tables which include
| columns like "days until major conference X" and "days since
| last quarterly release").
|
| Now we reach the first problem. It's too slow! Projecting that
| data from the normalised schema requires a lot of storage and
| compute. You realise after some head-scratching that your goal
| all along was to pay that cost upfront so that you can reap the
| benefits at query time. What you want is a _view_ that has the
| physical characteristics of a _table_. Meaning you want to write
| out the results of the query, but still treat it like a view.
| You've "materialized" the view.
|
| Now the second problem. Who, or what, does that projection?
| Right now that role is filled by ETL: "Extract, Transform and
| Load". Extract from the normalised system, transform it into the
| denormalised version, then load that into a data warehouse. Most
| places do this on a regular cadence, such as nightly, because it
| just takes buckets and buckets of work to regenerate the output
| every time.
|
| Now enters Materialize, with a secret weapon: timely dataflow.
| The basic outcome is that instead of re-running an _entire view
| query_ to regenerate the materialized view, it can, from a given
| datum, determine exactly what will change in the materialized
| view and _only_ update that. That makes such views potentially
| thousands of times cheaper. You could even run the normalised
| schema and the denormalised projections on the same physical set
| of data -- no need for the overhead and complexity of ETL, no
| need to run two database systems, no need to _wait_ (without the
| added complexity of a full streaming platform).
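|
| To make the "denormalised projection" idea concrete, a sketch
| (table and column names invented): the normalised schema stores
| one `sold_on` date, and the projection pre-computes the calendar
| breakdown:
|
|     CREATE MATERIALIZED VIEW sales_denorm AS
|     SELECT s.id, s.amount, s.sold_on,
|            EXTRACT(dow  FROM s.sold_on) AS day_of_week,
|            EXTRACT(day  FROM s.sold_on) AS day_of_month,
|            EXTRACT(doy  FROM s.sold_on) AS day_of_year,
|            EXTRACT(week FROM s.sold_on) AS week_of_year
|     FROM sales s;
|
| With nightly ETL you rebuild this wholesale; with incremental
| maintenance, each new row in `sales` adds exactly one row here.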
| gen220 wrote:
| That's a great description! Does Materialize describe how they
| implement timely dataflow?
|
| At my current company, we have built some systems like this,
| where a downstream table is essentially a function of a dozen
| upstream tables.
|
| Whenever one of the upstream tables changes, its primary key is
| published to a queue, some worker translates this upstream
| primary key into a set of downstream primary keys, and publishes
| these downstream primary keys to a compacted queue.
|
| The compacted queue is read by another worker, which
| "recomputes" each dirty key, one at a time; this involves
| fetching the latest-and-greatest version of each upstream table.
|
| This last worker is the bottleneck, but it's optimized by
| per-key caching, so we only fetch the latest-and-greatest
| version once per update. It can also be safely and arbitrarily
| parallelized, since the stream it reads from is partitioned on
| key.
| albertwang wrote:
| Here's a 15-minute introduction to Timely Dataflow by Frank, our
| co-founder: https://www.youtube.com/watch?v=yOnPmVf4YWo
| scott_s wrote:
| > Does Materialize describe how they implement timely dataflow?
|
| It's open source
| (https://github.com/TimelyDataflow/timely-dataflow), and also
| extensively written about, both in academic research papers and
| in documentation for the project itself. The GitHub repo has
| pointers to all of that. See also differential dataflow
| (https://github.com/timelydataflow/differential-dataflow).
| ako wrote:
| It's a query whose results you save in a cache database table,
| so that the next time it is queried, you can provide the results
| from the cache.
|
| Typically, in a traditional RDBMS, the query is defined as a SQL
| view, which you either have to refresh manually or have
| refreshed periodically.
|
| Using streaming systems like Kafka, it's possible to
| continuously update the cached results based on the incoming
| data, so the result is a near-realtime, up-to-date query result.
|
| Writing the stream processing to update the materialized view
| can be complex; using SQL, as Materialize enables you to do,
| makes it a lot more productive.
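|
| As a sketch of what that looks like end to end (broker, topic,
| and schema-registry addresses are placeholders; see Materialize's
| docs for the exact source syntax):
|
|     CREATE SOURCE page_views
|     FROM KAFKA BROKER 'kafka:9092' TOPIC 'page_views'
|     FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://registry:8081';
|
|     CREATE MATERIALIZED VIEW views_per_page AS
|     SELECT page_id, COUNT(*) AS views
|     FROM page_views GROUP BY page_id;
|
| Each new Kafka record nudges `views_per_page` rather than
| triggering a rescan, which is the "continuous update" part.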
| derefr wrote:
| Let's start with views. A database view is a "stored query" that
| presents itself as a table, which you can further query against.
|
| If you have a view "bar":
|
|     CREATE VIEW bar AS $$
|       SELECT x * 2 AS a, y + 1 AS b FROM foo
|     $$
|
| and then you `SELECT a FROM bar`, then the "question" you're
| really asking is just:
|
|     SELECT a FROM
|       (SELECT x * 2 AS a, y + 1 AS b FROM foo)
|
| -- which, with efficient query planning, boils down to
|
|     SELECT x * 2 AS a FROM foo
|
| It's especially important to note that the `y + 1` expression
| from the view definition isn't computed in this query. The inner
| query from the view isn't "compiled" -- forced to be in some
| shape -- but rather sits there in symbolic form, "pasted" into
| your query, where the query planner can then manipulate and
| optimize/streamline it further, to suit the needs of the outer
| query.
|
| -----
|
| To _materialize_ something is to turn it from symbolic-
| expression form into "hard" data -- a result-set of in-memory
| row-tuples. Materialization is the "enumeration" in a Streams
| abstraction, or the "thunk" in a lazy-evaluation language. It's
| the master screw that forces all the activity dependent on it --
| activity that would otherwise stay abstract -- to "really
| happen."
|
| Databases _don't_ materialize anything unless they're forced to.
| If you do a query like
|
|     SELECT false
|     FROM (SELECT * FROM foo WHERE x = 1)
|
| ...no work (especially no IO) actually happens, because no data
| from the inner query needs to be _materialized_ to resolve the
| outer query.
|
| Streaming data out of the DB to the user requires serialization
| [= putting the data in a certain wire format], and serialization
| requires materialization [= having the data available in memory
| in order to read and re-format it]. So whatever final shape the
| data returned from your outermost query has when it "leaves" the
| DB, _that_ data will always get materialized. But other
| processes internal to the DB may sometimes require data to be
| materialized as well.
|
| Materialization is costly -- it's usually the only thing forcing
| the DB to actually read the data on disk, for any columns it
| wasn't filtering by. Many of the optimizations in RDBMSes --
| like the elimination of that `y + 1` above -- have the goal of
| avoiding materialization, and the disk reads / memory
| allocations / etc. that materialization requires.
|
| -----
|
| Those definitions out of the way: a "materialized view" is
| something that acts similar to a view (i.e. it is constructed in
| terms of a stored query, and presents itself as a queriable
| table) but which -- unlike a regular view -- has been pre-
| materialized. The query for a matview is still stored, but at
| some point in advance of querying, the RDBMS actually _runs_
| that query, fully materializes the result-set from it, and then
| caches it.
|
| So, basically, a materialized view is a view with a cached
| result-set.
|
| Like any cache, this result-set cache increases read-time
| efficiency in the case where the original computation was
| costly. (There's no point in "upgrading" a view into a matview
| if your queries against the plain view were already cheap enough
| for your needs.)
|
| But like any cache, it needs to be maintained, and can become
| out-of-sync with its source.
|
| Although materialized views are part of the SQL standard, not
| all SQL RDBMSes implement them. MySQL/MariaDB does not, for
| example. (Which is why you'll find that much of the software
| world just pretends matviews don't exist when designing their DB
| architectures. If it ever needs to run on MySQL, it can't use
| matviews.)
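|
| In the RDBMSes that do implement them, the "upgrade" is one
| keyword plus a refresh command (Postgres syntax, continuing the
| `foo`/`bar` example above):
|
|     CREATE MATERIALIZED VIEW bar_cached AS
|       SELECT x * 2 AS a, y + 1 AS b FROM foo;
|
|     SELECT a FROM bar_cached;          -- served from the cached result-set
|
|     REFRESH MATERIALIZED VIEW bar_cached;  -- rebuild the cache from scratch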
| The naive approach that some other RDBMSes (e.g. Postgres) take
| to materialized views is to only offer manual, full-pass
| recalculation of the cached result-set, via some explicit
| command (`REFRESH MATERIALIZED VIEW foo`). This works with
| "small data"; but at scale, this approach can be so time-
| consuming for large and complex backing queries that by the time
| the cache is rebuilt, it's already out-of-date again!
|
| Because there are RDBMSes that either don't have matviews, or
| don't have _scalable_ matviews, many application developers just
| avoid the RDBMS's built-in matview abstraction and build their
| own. Thus, another large swathe of the world's database
| architecture will either use cron jobs to regularly
| run+materialize a query and then dump its results back into a
| table in the same DB; or it will define on-INSERT/UPDATE/DELETE
| triggers on "primary" tables that transform and upsert data into
| "secondary" denormalized tables. These are both approaches to
| "simulating" matviews, portably, on an RDBMS substrate that
| isn't guaranteed to have them.
|
| Other RDBMSes (e.g. Oracle, SQL Server, etc.) _do_ have scalable
| materialized views, a.k.a. "incrementally materialized" views.
| These work less like a view with a cache, and more like a
| secondary table with write-triggers on primary tables to
| populate it -- but all handled under-the-covers by the RDBMS
| itself. You just define the matview, and the RDBMS sees the
| data-dependencies and sets up the write-through data flow.
|
| Incrementally-materialized views are great for what they're
| designed for (reporting, mostly), but they aren't intended to be
| the bedrock for an entire architecture. Building matviews on top
| of matviews on top of matviews gets expensive fast, because even
| fancy enterprise RDBMSes like Oracle don't _realize_, when
| populating table X, that writing to X will in turn write to
| matview Y, which will in turn "fan out" to matviews {A,B,C,D},
| etc. These RDBMSes' matviews were never intended to support
| complex "dataflow graphs" of updates like this, and so there's
| too much overhead (e.g. read-write contention on index locks) to
| actually make these setups practical. And it's very hard for
| these DBMSes to change this, as their matviews' caches are
| fundamentally reliant on _database table_ storage engines, which
| just aren't the right ADT to hold data with this sort of
| lifecycle.
|
| -----
|
| Materialize is an "RDBMS" (though it's not, really) engineered
| from the ground up to make these sorts of dataflow graphs of
| matviews-on-matviews-on-matviews practical, by doing its caching
| completely differently.
|
| Materialize looks like a SQL RDBMS from the outside, but
| Materialize _is not_ a database -- not really. (Materialize has
| no tables. You can't "put" data in it!) Instead, Materialize is
| a data _streaming_ platform, which caches any intermediate
| materialized data it's forced to construct during the streaming
| process, so that other consumers can work off those same
| intermediate representations without recomputing the data.
|
| If you've ever worked with Akka's Streams, or Elixir's Flows, or
| for that matter with Apache Beam (nee Google Dataflow),
| Materialize is that same kind of pipeline -- but where all the
| plumbing work of creating intermediate representations --
| normally a procedural map/reduce/partition kind of thing -- is
| done by defining SQL matviews; and where the final output isn't
| a fixed output of the pipeline, but rather comes from running an
| arbitrary SQL query against any arbitrary matview defined in the
| system.
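|
| A sketch of such a matview-on-matview cascade (hypothetical
| tables) -- exactly the shape that chokes trigger-style
| incremental matviews, but that a dataflow engine maintains as
| one graph:
|
|     CREATE MATERIALIZED VIEW daily_totals AS
|     SELECT day, SUM(amount) AS total
|     FROM payments GROUP BY day;
|
|     CREATE MATERIALIZED VIEW weekly_totals AS
|     SELECT date_trunc('week', day) AS week, SUM(total) AS total
|     FROM daily_totals GROUP BY date_trunc('week', day);
|
|     CREATE MATERIALIZED VIEW big_weeks AS
|     SELECT week FROM weekly_totals WHERE total > 1000000;
|
| One new `payments` row ripples through all three layers as a
| single small delta, rather than as three cascading table
| rewrites.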
| dragonwriter wrote:
| > Most RDBMSes (e.g. Postgres) only offer manual (`REFRESH
| MATERIALIZED VIEW foo`) full-pass recalculation of the cached
| result-set for matviews.
|
| "Most" here seems very much wrong, at least of major products:
| Oracle has an option for on-commit (rather than manual) and
| incremental/incremental-if-possible (FAST/FORCE) refresh, so it
| is limited to neither only-manual nor only-full-pass
| recalculation. SQL Server indexed views (their matview solution)
| are automatically incrementally updated as base tables change;
| they don't even have an option for manual full-pass
| recalculation, AFAICT. DB2 materialized query tables (their
| matview solution) have an option for immediate (on-commit)
| refresh (not sure if the algorithm here is always full-pass, but
| it's at a minimum not always manual). Firebird and MySQL/MariaDB
| don't have any support for materialized views at all (though of
| course you can manually simulate them with additional tables
| updated by triggers). Postgres seems to be the only major RDBMS
| with both materialized view _support_ and the limitation of only
| on-demand, full-pass recalculation of matviews (for that matter,
| except maybe DB2 with the full-pass limitation, it seems to be
| the only one with _either_ the only-manual _or_ the only-full-
| pass limitation).
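|
| Oracle's incremental path looks roughly like this (simplified
| from its docs; the materialized view log is what makes FAST
| refresh possible):
|
|     CREATE MATERIALIZED VIEW LOG ON orders
|       WITH SEQUENCE, ROWID (status) INCLUDING NEW VALUES;
|
|     CREATE MATERIALIZED VIEW order_counts
|       REFRESH FAST ON COMMIT
|     AS SELECT status, COUNT(*) AS n
|        FROM orders GROUP BY status;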
| jacques_chester wrote:
| I think it's true that many databases offer incremental updates,
| and it's incorrect to say that manual refreshes were the state
| of the art.
|
| The important point is that Materialize can do it for almost any
| query, very efficiently, compared to existing options. That
| opens a lot of possibilities.
| dragonwriter wrote:
| > The important point is that Materialize can do it for almost
| any query, very efficiently, compared to existing options. That
| opens a lot of possibilities.
|
| Yes, this does seem like a very big deal.
| derefr wrote:
| You're right; I updated my comment.
| dragonwriter wrote:
| That was a fantastic and illuminating update, thank you.
| jacques_chester wrote:
| This is an outstanding explanation. Much better than mine.
| bluejekyll wrote:
| "In computing, a materialized view is a database object that
| contains the results of a query. For example, it may be a local
| copy of data located remotely, or may be a subset of the rows
| and/or columns of a table or join result, or may be a summary
| using an aggregate function."
|
| https://en.m.wikipedia.org/wiki/Materialized_view
| findjashua wrote:
| Updated results of a query - e.g. if you do some aggregation or
| filtering on a table, or join two tables, or anything of the
| sort - a materialized view will give you the updated results of
| the query in a separate table.
| temuze wrote:
| I'm glad more people are tackling this problem. There still
| isn't a good solution for real-time aggregation of data at large
| scale.
|
| At a previous company, we dealt with huge data streams (~1TB of
| data/minute) and our customers expected real-time aggregations.
|
| Building an in-house solution for this was incredibly difficult
| because each customer's data differed wildly. For example:
|
| - Customer A's shards might have so much cardinality that memory
| becomes an issue.
|
| - Customer B's shards might have so much throughput that CPU
| becomes a constraint. Sometimes a single aggregation may have so
| much throughput that you need to artificially increase the
| cardinality and aggregate the aggregations!
|
| This makes the optimal sharding strategy very complex. Ideally,
| you want to bin-pack memory-constrained aggregations with CPU-
| constrained aggregations. In my opinion, the ideal approach
| involves detecting the cardinality of each shard and bin-packing
| accordingly.
| jstrong wrote:
| I've always found that when you are solving a concrete problem,
| like you were, it's vastly easier than the case of a general-
| purpose database, because you can make all the tradeoffs that
| benefit your exact use case. But it sounds like that's not what
| you experienced. Was it just how heterogeneous the clients'
| needs were? I guess what I'm saying is: if you are capable of
| handling 1TB/minute, it seems like you're plenty able to (and
| would want to) design the system yourself -- but I'm interested
| in what I'm missing about this.
| mwcampbell wrote:
| > All of this comes in a single binary that is easy to install,
| easy to use, and easy to deploy.
|
| And it looks like they chose a sensible license for that binary
| [1], so they're not giving too much away.
|
| I wonder, though, if they could have made this work as a
| bootstrapped business, so they would answer only to customers,
| not to investors chasing growth at all costs.
|
| [1]: https://materialize.com/download/
| offtop5 wrote:
| Bootstrapping is fun until you can't make payroll.
|
| If your goal is an exit, and you can raise this much, why not.
| adamnemecek wrote:
| This is a big win for Rust.
| [deleted]
| haggy wrote:
| Can you point me at documentation for the fault tolerance of the
| system? A huge issue for streaming systems (and largely
| unsolved, AFAIK) is being able to guarantee that counts aren't
| duplicated when things fail. How does Materialize handle the
| relevant failure scenarios in order to prevent inaccurate
| counts/sums/etc.?
| [deleted]
| jgraettinger1 wrote:
| This has been a solved problem for a few years now. The basic
| trick is to publish "pending" messages to the broker, which are
| ACK'd by a later-written message only after the transaction and
| all its effects have been committed to stable storage
| (somewhere). Meanwhile, you also capture consumption state (e.g.
| offsets) into the same database and transaction within which
| you're updating the materialization results of a streaming
| computation.
|
| Here's [1] a nice blog post from the Kafka folks on how they
| approached it.
|
| Gazette [2] (I'm the primary architect) also solves it, with
| some different trade-offs: a "thicker" client, but with no
| head-of-line blocking and reduced end-to-end latency.
|
| Estuary Flow [3], built on Gazette, leverages this to provide
| exactly-once, incremental map/reduce and materializations into
| arbitrary databases.
|
| [1]: https://www.confluent.io/blog/exactly-once-semantics-are-pos...
|
| [2]: https://gazette.readthedocs.io/en/latest/architecture-exactl...
|
| [3]: https://estuary.readthedocs.io/en/latest/README.html
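|
| The offsets-in-the-same-transaction trick, as a sketch (table
| and column names are invented):
|
|     -- A consumer has folded a batch from topic 'page_views',
|     -- partition 3, covering offsets 100-199, into its results.
|     -- Results and offsets commit (or abort) together:
|     BEGIN;
|     UPDATE view_counts SET n = n + 17 WHERE page = 'home';
|     UPDATE consumer_offsets SET next_offset = 200
|      WHERE topic = 'page_views' AND part = 3;
|     COMMIT;
|
| On restart the consumer resumes from `consumer_offsets`, and
| because the offset advance commits atomically with the results,
| a batch is either fully applied and acknowledged or not applied
| at all -- never double-counted.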
| haggy wrote:
| Interesting! I'm going to read into the info you linked. Thanks
| for the info!
| frankmcsherry wrote:
| Hi! I work at Materialize.
|
| I think the right starter take is that Materialize is a
| deterministic compute engine, one that relies on other
| infrastructure to act as the source of truth for your data. It
| can pull data out of your RDBMS's binlog, out of Debezium events
| you've put into Kafka, out of local files, etc.
|
| On failure and restart, Materialize leans on the ability to
| return to the assumed source of truth, again an RDBMS + CDC or
| perhaps Kafka. I don't recommend thinking about Materialize as a
| place to sink your streaming events _at the moment_ (there is
| movement in that direction, because the operational overhead of
| things like Kafka is real).
|
| The main difference is that unlike an OLTP system, Materialize
| doesn't have to make and persist non-deterministic choices about
| e.g. which transactions commit and which do not. That makes
| fault-tolerance a _performance_ feature rather than a
| _correctness_ feature, at which point there are a few other
| options as well (e.g. active-active).
|
| Hope this helps!
| [deleted]
| beoberha wrote:
| Late to the post, but if anyone wants a good primer on
| Materialize (beyond what their actual engineers and a cofounder
| are saying in the comments), check out the Materialize
| Quarantine Database Lecture:
| https://db.cs.cmu.edu/events/db-seminar-spring-2020-db-group...
| mavelikara wrote:
| The actual talk seems to be here:
| https://www.youtube.com/watch?v=9XTg09W5USM
| pgt wrote:
| Materialize can help us manifest The Web After Tomorrow [^1].
|
| My previous comments arguing why DDF (differential dataflow) is
| so crucial to the future of the Web:
|
| > "There is a big upset coming in the UX world as we converge
| toward a generalized implementation of the "diff & patch"
| pattern which underlies Git, React, compiler optimization, scene
| rendering, and query optimization." --
| https://news.ycombinator.com/item?id=21683385, also with links
| to prior art like Adapton and Incremental.
|
| > "DD (Differential Dataflow) is commercialized in Materialize"
| -- https://news.ycombinator.com/item?id=24846119
|
| > "Materialize exists to efficiently solve the view maintenance
| problem" -- https://news.ycombinator.com/item?id=22888396
|
| [^1]: https://tonsky.me/blog/the-web-after-tomorrow/
| cocoflunchy wrote:
| Thanks for this -- I'm glad to see I'm not the only one tired of
| writing everything twice (once in the frontend and once in the
| backend). I'll revisit the links later.
___________________________________________________________________
(page generated 2020-12-02 23:00 UTC)