[HN Gopher] InfluxDB is betting on Rust and Apache Arrow for next-gen data store
___________________________________________________________________
Author : mhall119
Score  : 105 points
Date   : 2020-11-10 18:26 UTC (4 hours ago)
web link: www.influxdata.com

| thamer wrote:
| I'm using Apache Arrow to store CSV-like data and it works
| amazingly well.
|
| The datasets I work with contain a few billion records with 5-10
| fields each, almost all 64-bit longs. I originally started with
| CSV and soon switched to a binary format with fixed-size records
| (mmap'd), which gave great performance improvements. But the
| flexibility, the size gains from columnar compression, and Arrow's
| much greater performance on queries that touch a single column or
| a small number of them won me over.
|
| For anyone who has to process even a few million records locally,
| I would highly recommend it.
| stevesimmons wrote:
| Arrow + Parquet is brilliant!
|
| Right now I'm writing tools in Python (Python!) to analyse several
| 100TB datasets in S3. Each dataset is made up of 1000+ 6GB Parquet
| files (tables UNLOADed from an AWS Redshift db). Parquet's
| columnar compression gives a 15x reduction in on-disk size.
| Parquet also stores chunk metadata at the end of each file,
| allowing reads to skip over most data that isn't relevant.
|
| And once in memory, the Arrow format gives zero-copy compatibility
| with Numpy and Pandas.
|
| If you try this with Python, make sure you use the latest 2.0.0
| version of PyArrow [1]. Two other interesting libraries for
| manipulating PyArrow Tables and ChunkedArrays are fletcher [2] and
| graphique [3].
|
| [1] I use: conda install -c conda-forge pyarrow python-snappy
|
| [2] https://github.com/xhochy/fletcher
|
| [3] https://github.com/coady/graphique
| pauldix wrote:
| Yeah, Parquet is awesome.
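[The columnar win described above can be sketched in plain Python. This is a conceptual illustration only, with made-up record shapes — not PyArrow and not anyone's actual code: summing one field of fixed-size row-oriented records strides over every record, while a columnar layout reads a single contiguous array.]

```python
import struct

# Row-oriented layout: five 64-bit longs per record, packed back to
# back, like the mmap'd fixed-size-record binary format described above.
records = [(i, i * 2, i * 3, i * 4, i * 5) for i in range(1000)]
row_blob = b"".join(struct.pack("<5q", *r) for r in records)

# Summing field 2 from the row layout strides over every 40-byte record.
row_sum = sum(
    struct.unpack_from("<q", row_blob, i * 40 + 16)[0]
    for i in range(1000)
)

# Columnar layout: each field is its own contiguous array, so the same
# query touches one dense buffer and skips the other four fields.
columns = [list(col) for col in zip(*records)]
col_sum = sum(columns[2])

assert row_sum == col_sum == sum(3 * i for i in range(1000))
```

[Same answer either way, but the columnar scan reads a fifth of the bytes — and a real columnar format compresses each column on top of that.]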
| One of the things we really want to do here is to push DataFusion
| (the Rust-based SQL execution engine) to work on Parquet files,
| pushing down predicates and other operations so it can work on the
| data while it's still compressed.
|
| You pay such a high overhead marshalling that data into an Arrow
| RecordBatch. The best thing ever is to work with the Parquet file
| directly and not even decompress the chunks you don't need. Of
| course, this assumes that you're writing summary statistics as
| part of the metadata, which we plan to do.
| nevi-me wrote:
| There's rudimentary statistics support, but I've found that it
| accounts for a lot of the write time (I wrote some tests last
| weekend; I can ping your team when I put them on GH).
|
| Improving our stats writing could yield a lot of benefits. I'll
| open JIRAs for this in the next few days.
| gizmodo59 wrote:
| Parquet + Arrow reminds me of a fast SQL engine on the data lake
| called Dremio.
|
| https://www.dremio.com/webinars/apache-arrow-calcite-parquet...
|
| They also have an OSS version on GitHub.
| pauldix wrote:
| They're heavily into Arrow. A few years ago they contributed
| Gandiva, an LLVM expression compiler for super fast processing.
| https://arrow.apache.org/blog/2018/12/05/gandiva-donation/
|
| It's one of the reasons I like being all in on Arrow. Why do
| everything ourselves when a ton of other smart people are working
| on this too?
| nevi-me wrote:
| Talking about Gandiva, something that's open for the taking:
| https://issues.apache.org/jira/browse/ARROW-5315 (creating
| Gandiva bindings for Rust).
|
| I think DataFusion is mature enough that we could plug Gandiva
| into it.
|
| Disclaimer: I work on the Arrow Rust implementation
| pauldix wrote:
| Ah yes, of course, hi Nevi :). Thank you again for all your work
| on the Rust implementation. We're obviously big fans.
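[The chunk-skipping idea discussed above — per-chunk summary statistics consulted before decompressing anything — can be sketched in plain Python. This is a conceptual illustration, not Parquet's actual file format or DataFusion's code: a predicate is checked against each chunk's recorded min/max, and chunks that can't match are never decompressed.]

```python
import zlib

# Write side: compress each chunk and record min/max summary
# statistics, analogous to the chunk metadata Parquet stores at the
# end of each file.
def write_chunks(values, chunk_size=4):
    chunks = []
    for i in range(0, len(values), chunk_size):
        part = values[i:i + chunk_size]
        raw = b"".join(v.to_bytes(8, "little", signed=True) for v in part)
        chunks.append({"min": min(part), "max": max(part),
                       "data": zlib.compress(raw)})
    return chunks

# Read side: a predicate like `value >= lo` consults only the stats;
# chunks whose max falls below the bound are skipped entirely.
def scan_ge(chunks, lo):
    out = []
    for c in chunks:
        if c["max"] < lo:
            continue  # pruned using statistics alone, never decompressed
        raw = zlib.decompress(c["data"])
        out.extend(int.from_bytes(raw[j:j + 8], "little", signed=True)
                   for j in range(0, len(raw), 8))
    return [v for v in out if v >= lo]

chunks = write_chunks(list(range(100)))
assert scan_ge(chunks, 90) == list(range(90, 100))
```

[Here a query for `value >= 90` decompresses only the last 3 of 25 chunks; the write-side cost of computing the stats is exactly the overhead nevi-me mentions.]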
|
| Gandiva bindings are definitely something we should look into, but
| I'm guessing there's much lower-hanging fruit within DataFusion in
| terms of optimizing, particularly for our use case.
| nevi-me wrote:
| Thanks Paul :)
|
| I think with compute functions/kernels, we're sitting under a
| grapevine, so we'll be able to add a lot to Arrow without yet
| needing Gandiva bindings. The Rust stdsimd [0] work will also
| enable us to make better use of SIMD in ~a year from now (I hope).
|
| [0] https://github.com/rust-lang/stdsimd
| polskibus wrote:
| Did you have to implement your format-to-Arrow conversion/
| abstraction layer manually, or was it already available? Could you
| give some pointers on how to roll out one's own binary-format-to-
| Arrow query engine? Why didn't you use Parquet for storage?
| mhall119 wrote:
| InfluxDB IOx will use Parquet files for persistent storage
| pauldix wrote:
| InfluxDB creator here. I've actually been working on this one
| myself and am excited to answer any questions. The project is
| InfluxDB IOx (short for iron oxide, pronounced eye-ox). Among
| other things, it's an in-memory columnar database with object
| storage as the persistence layer.
| Yoric wrote:
| What's it best suited for?
| rubiquity wrote:
| In that sense, InfluxDB will start to have a decent amount of
| overlap with something like Snowflake or Redshift.
| mhall119 wrote:
| Snowflake was actually specifically mentioned in the blog post as
| an example of who else is doing this
| pauldix wrote:
| Yes, this will likely be the case, but it's not a specific goal.
| We're focusing our efforts right now on the InfluxDB time series
| use cases we see most. More general data warehousing work isn't
| our focus, but we expect to pick up quite a bit of that along the
| way as we develop and as the underlying Arrow tools develop.
| sciurus wrote:
| InfluxData is arguably playing catch-up with Thanos, Cortex, and
| other scale-out Prometheus backends for the metrics use case.
| Given that, I wonder why they decided to write a new storage
| backend from scratch instead of building on the work Thanos and
| Cortex have done. Those two competing projects are successfully
| sharing a lot of code that allows all data to be stored in object
| storage like S3.
|
| https://grafana.com/blog/2020/07/29/how-blocks-storage-in-co...
| throwawaytsdb wrote:
| I agree, they appear to be playing catch-up on many fronts.
| Notably, with Cortex, Tempo, and Loki, Grafana Labs seem to have
| pulled way ahead in advancing a successful open-source cloud
| observability strategy.
|
| InfluxData have a long history of writing (and rewriting) their
| own storage engines, so choosing to do it again is unsurprising.
| I guess this sort of hints that the current TSM/TSI have probably
| reached their performance and scalability limits and will be EOL
| before too long.
|
| What I find interesting is that this project is already almost a
| year old and has only six contributors (two of whom look like
| external contractors). It seems more like a fun side project than
| the future core of the database that is supposed to be deployed
| into production next year.
| pauldix wrote:
| I think the best new projects are created by a small, focused
| team. Adding too many people too early actually slows things
| down. But, of course, I'm biased.
|
| The thing about this getting to production next year is that
| we're doing it in our cloud, which is a services-based system
| where we can bring all sorts of operational tooling to bear:
| out-of-band backups, usage of cloud-native services, shadow
| serving, red/green deploys, and so on. Basically, it's easier to
| deploy a service to production once you've built a suite of
| operational tools that make it possible to do so reliably while
| testing under production workloads that don't actually face the
| customer.
|
| As for us rewriting the core of the database, that's true.
| But I think you're unrealistic about what the data systems of
| closed-source SaaS providers look like as they advance through
| orders of magnitude of scale. Hint: they rewrite and redo their
| systems.
|
| As for Grafana, MetricsTank was their first, Cortex wasn't
| developed there, and Loki and Tempo look like interesting
| projects.
|
| None of those things has the exact same goal as InfluxDB. And
| InfluxDB isn't meant to be an open source DataDog. That's not our
| thing. We want to be a platform for building time series
| applications across many use cases, some of which might be modern
| observability. It also doesn't preclude you from pairing InfluxDB
| with those other tools.
| mhall119 wrote:
| To be clear, InfluxDB IOx will be using Apache Arrow, DataFusion
| and Parquet, and contributing to those upstream projects, not
| creating our own thing.
| nemothekid wrote:
| > _What I find interesting is that this project is already almost
| a year old and only has six contributors_
|
| I don't think this is a fair criticism. Postgres only has 7 core
| contributors - and InfluxDB is far less complex, targeting a much
| simpler use case.
| throwawaytsdb wrote:
| PostgreSQL might have only seven people on the "core team", but
| the active committers list is much longer:
| https://wiki.postgresql.org/wiki/Committers
|
| Furthermore, hundreds of people have been responsible for its
| development over the years:
| https://www.postgresql.org/community/contributors/
|
| As an additional, possibly more relevant example, the Cortex
| project has 158 contributors:
| https://github.com/cortexproject/cortex
|
| My larger point is that, in contrast with the new project,
| InfluxDB currently has ~400 contributors. I'm certain that many
| dozens of those were involved in getting the current storage
| engine to a stable place. And now that hard work is on a path to
| being deprecated by a move to a completely new language and set
| of underlying technologies.
|
| Taking the project from a handful of contributors to a
| production-ready technology within an existing ecosystem is a
| non-trivial task. I'm sure it will come together eventually, but
| the commitment to ship it "early next year" seems unlikely to me.
| pauldix wrote:
| We'll be producing builds early next year. Those won't be
| anything we're recommending for production. Our goal is to have
| an early alpha in our own cloud environment by the end of Q1. I
| stress alpha. But we'll also have a bunch of tooling around it
| (which we've already built for the other parts of our cloud
| product) to back up data out of band, monitor it, shadow-test it
| against production workloads, etc.
|
| We're also building on top of the work of a bunch of others who
| have built Arrow and libraries within the Rust ecosystem.
|
| When the open source version GAs, I don't really know. But we're
| doing this out in the open so people can see, comment, and maybe
| even contribute. Who knows, maybe after a few years you'll be a
| convert ;)
| Thaxll wrote:
| Also: https://github.com/VictoriaMetrics/VictoriaMetrics from the
| guy who did fasthttp (the fastest HTTP lib for Go).
| pauldix wrote:
| Those two systems are designed to work with Prometheus-style
| metrics, which are very specific: you have metrics, labels, float
| values, and millisecond epochs.
|
| I'm not totally sure how they index things, but I would guess
| that it's by time series, with an inverted-index-style mapping
| from the metric and label data to the underlying time series.
| This means they'll have the same problems working with
| high-cardinality data that I outlined in the blog post.
|
| InfluxDB aims to hit a broader audience than just metrics. We
| think the table model is great, particularly for event time
| series, which we want to be best in class for. A columnar
| database is better suited for analytics queries, and given the
| right structure it is every bit as good for metrics queries.
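[The inverted-index layout guessed at above can be sketched minimally in Python. This is an illustration of the general design, not VictoriaMetrics/Thanos/Cortex internals: each (label, value) pair keeps a posting list of series IDs, so a high-cardinality label contributes one posting list per unique value and dominates the index.]

```python
from collections import defaultdict

index = defaultdict(set)   # (label, value) -> set of series IDs
series = {}                # series ID -> its full label set

def add_series(series_id, labels):
    series[series_id] = labels
    for pair in labels.items():
        index[pair].add(series_id)

# A low-cardinality label ("region", 3 values) adds a few posting
# lists; a high-cardinality one ("request_id", unique per series) adds
# one posting list per series, which is what blows this design up.
for i in range(1000):
    add_series(i, {"metric": "http_latency",
                   "region": f"us-{i % 3}",
                   "request_id": f"req-{i}"})

# Query: intersect the posting lists of the requested label pairs.
def lookup(**labels):
    return set.intersection(*(index[p] for p in labels.items()))

assert len(lookup(metric="http_latency", region="us-0")) == 334
assert len(index) == 1 + 3 + 1000  # request_id alone is 1000 entries
```

[With a billion unique request IDs the index is as large as the data itself, which is the high-cardinality problem the comment refers to.]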
| candiddevmike wrote:
| What a ride. You're close to releasing Influx 2.0 without a clear
| migration strategy for your customers, and then you think it's a
| good idea to announce yet another storage rewrite? Why should
| customers stick with you guys when you have a track record of
| shipping half-baked software, rewriting it, and leaving people
| out in the cold?
| pauldix wrote:
| Users of InfluxDB 1.x can upgrade to 2.0 today. It's an in-place
| upgrade and just requires a quick command to make it work.
| Further, InfluxDB 2.0 also has the API for InfluxDB 1.x. We've
| been putting out InfluxDB 1.x releases while we've been
| developing 2.0. One of the reasons we waited so long to finalize
| the GA of 2.0 was that we had to make sure there was a clean
| migration path, and there is.
|
| For our cloud 1 customers, they'll be able to upgrade to our
| cloud 2 offering, but in the meantime their existing
| installations get the same 24x7 coverage and service we've been
| providing for years.
|
| As for how this will be deployed, it will be a seamless
| transition for our cloud customers when we do so. Data,
| monitoring, and analytics companies replace their backend data
| planes multiple times over the course of their lifetime.
|
| For our Enterprise customers, we'll provide an upgrade path, but
| not until this product is mature enough to ship as an on-premise
| binary, which only gets the chance to be upgraded a few times a
| year.
|
| The only difference here is that we're doing it as open source.
| They always do theirs behind closed doors. I'm sure most of our
| users and many of our customers prefer our open source approach.
| mhall119 wrote:
| InfluxDB 1.x users can upgrade to InfluxDB 2.0 pretty easily;
| there is an `upgrade` command that will convert your metadata
| from 1.x to 2.0.
|
| Your time series data doesn't even need to be touched; it'll
| "just work" after the upgrade.
| candiddevmike wrote:
| This is different from the previous `influxd migrate` command
| that never seemed to work, right?
| mhall119 wrote:
| For a while the 2.0 development branch was using a different
| storage file format than 1.x, which required migrating your time
| series data.
|
| But by the 2.0 Release Candidate that was reverted so that it
| uses the same file format as 1.x, and the 2.0 functionality was
| backported onto that, so the upgrade path from 1.x to 2.0 is much
| simpler now than it was going to be.
| jdub wrote:
| I didn't see it in the post, but a huge amount of this is due to
| Andy Grove's work on the Rust implementations of Apache Arrow and
| DataFusion.
|
| I imagine he's happy where he is, but I hope there's some
| opportunity for InfluxDB to give credit and support for his great
| work.
| pauldix wrote:
| It absolutely is, and we've been contributing back. An even
| bigger amount of this is based on Wes McKinney's work on Arrow.
| Andy is great, and he's been helpful as we've been working with
| DataFusion.
| gizmodo59 wrote:
| Another SQL engine on the data lake that heavily uses Arrow is
| Dremio.
|
| https://www.dremio.com/webinars/apache-arrow-calcite-parquet...
|
| https://github.com/dremio/dremio-oss
|
| If you have Parquet on S3, using an engine like Dremio can give
| you good results. Some key innovations in open source data
| analytics:
|
| Arrow - Columnar in-memory format
| Gandiva - LLVM-based execution kernel
| Arrow Flight - Wire protocol based on Arrow
| Project Nessie - A git-like workflow for data lakes
|
| https://arrow.apache.org/
| https://arrow.apache.org/docs/format/Flight.html
| https://arrow.apache.org/blog/2018/12/05/gandiva-donation/
| https://github.com/projectnessie/nessie
| StreamBright wrote:
| What service that uses Apache Arrow could I use to replace Athena
| / PrestoDB?
| gizmodo59 wrote:
| Looking at Dremio's website, they seem to be a good competitor to
| Presto/Athena for some use cases.
|
| Alternative solutions depend on your use case. If it's about
| querying S3 data, then Dremio/Athena/Presto are good.
| jaymebb wrote:
| Very cool! I'm curious how far Influx will then move toward being
| a general-purpose columnar database system outside of typical
| time series workloads - moving more into being a general-purpose
| OLAP DB for analytical "data science" workloads?
|
| Will there be any type of transactional guarantees (ACID) using
| MVCC or similar?
|
| Is the execution engine vectorised?
| pauldix wrote:
| Execution is vectorized, but that's really Arrow. We'd like this
| to be useful for general OLAP workloads, but the focus for the
| next year is definitely going to be on our bread-and-butter time
| series stuff.
|
| That being said, Arrow Flight will be a first-class RPC
| mechanism, which makes it quite nice for data science work, as
| you can get data into a dataframe in Pandas or R in a few lines
| of code and with almost zero serialization/deserialization
| overhead.
|
| This isn't meant to be a transactional system. It's more like a
| data processing system for data in object storage. I'm curious
| what your need is there for OLAP workloads - can you tell me
| more?
___________________________________________________________________
(page generated 2020-11-10 23:02 UTC)