[HN Gopher] ClickHouse Cost-Efficiency in Action: Analyzing 500B... ___________________________________________________________________ ClickHouse Cost-Efficiency in Action: Analyzing 500B Rows on an Intel NUC Author : hodgesrm Score : 127 points Date : 2020-01-06 17:23 UTC (5 hours ago) (HTM) web link (www.altinity.com) (TXT) w3m dump (www.altinity.com) | kresten wrote: | What's the tldr on why it is fast? | endymi0n wrote: | I'd argue that ClickHouse isn't even that fast (compared to comparable technology like Snowflake, Redshift or BigQuery), but that the ScyllaDB example is completely misleading. Scylla is probably one of the fastest OLTP datastores, yet they're benchmarking an analytics query -- which is pretty easy to crack for any columnar datastore. | | The actual point here is that you can execute millions of (different!) individual _queries_ per second on ScyllaDB, which beats any columnar datastore hands down. ClickHouse "cheated" here by translating the (unfortunate) benchmark setup into a single query that's extremely heavily optimized under the hood. | FridgeSeal wrote: | Why do you say CH isn't as fast as Snowflake? My experience is the opposite. | | Also, Snowflake is so painful to use and has a bunch of weird edge cases. It's also so expensive. | PeterZaitsev wrote: | Actually, while ClickHouse does not have all the features of Redshift, BigQuery, etc., it is usually much faster than them. It can be slower on some workloads on GPU-powered systems, when all the data fits in GPU memory, but that is not the use case it targets. | | ScyllaDB is amazing when it comes to OLTP performance, but not for analytical workloads. | | I think they took pretty mediocre analytical workload results and shared them as something outstanding. | caust1c wrote: | Tightly coupled storage and execution engines, along with an opinionated view on the actual storage format.
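For intuition on why a columnar layout wins on analytical scans, here is a toy Python sketch (illustrative only; the field names like `sensor_id` and `temp` are made up and this is not ClickHouse internals):

```python
# Toy illustration: the same data stored row-wise vs. column-wise.
# An aggregate over one field only needs to touch one contiguous
# array in the columnar layout.
rows = [
    {"sensor_id": 1, "temp": 20.5, "time": "2020-01-06T00:00"},
    {"sensor_id": 2, "temp": 19.0, "time": "2020-01-06T00:00"},
    {"sensor_id": 1, "temp": 21.0, "time": "2020-01-06T00:01"},
]

# Column-wise: one list per field, same information.
columns = {
    "sensor_id": [1, 2, 1],
    "temp": [20.5, 19.0, 21.0],
    "time": ["2020-01-06T00:00", "2020-01-06T00:00", "2020-01-06T00:01"],
}

# Row store: every row (all fields) must be read to answer max(temp).
row_max = max(r["temp"] for r in rows)

# Column store: only the temp column is read; adjacent similar values
# also compress far better, which is part of the storage savings.
col_max = max(columns["temp"])

print(row_max, col_max)  # 21.0 21.0
```

Real column stores add vectorized execution (SIMD over those contiguous arrays), which is what the SSE remark below refers to.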
| eecc wrote: | Precomputed materialized views | brobinson wrote: | Yeah, this post could have been titled "Why Materialized | Views are Awesome" | bdcravens wrote: | Clickhouse is also crazy fast without materialized views - | I've only done some PoC's against it, but in loading a | largish data set of raw invoice CSVs, I was very impressed | with the performance compared to our standard RDBMS. | polskibus wrote: | Heavily optimized column store, incl use of SSE instructions, | etc. Moreover, some architectural tradeoffs mentioned at | https://clickhouse.yandex/docs/en/introduction/features_cons... | SloopJon wrote: | I'd say the materialized view is the main thing: | | > Thus, we add the following materialized view ... At the end | we should have 1440 times fewer rows in the aggregate than the | source table. | | The cost of populating that view is amortized over the 17.5 | hours it took to load the data. | hodgesrm wrote: | Mat views are great as the article showed. I use them to get | query response down to milliseconds, as they vastly reduce | the amount of data ClickHouse must scan. | | That said, there are a lot of other tools: column storage, | vectorwise query, efficient compression including column | codecs, and skip indexes to name a few. If you only have a | few billion rows it's still possible to get sub-second query | results using brute force scans. | | Disclaimer: I work for Altinity, who wrote this article. | hodgesrm wrote: | p.s. Loading the view is low-cost compared to loading the | source data. On the NUC it's 40-60 minutes, so worst case | it's something like 1h / 17.5h = 5.71%. Also, you can still | query the source data. That is fast for individual sensors | as the examples showed. | atombender wrote: | Is ClickHouse good for event data when you want to do rollups? 
For example, say all my events are of the form: | | {event: "viewedArticle", article_id: 63534, user_id: 42, topic: "news", time: "2020-01-06"} | | I want to be able to build aggregations that show the number of "viewedArticle" events grouped by hour, grouped by topic, counting unique user_ids within each bucket. | | Or let's say I want the top K articles viewed each day, filtered by a topic. | | That's something that's trivial with Elasticsearch, which has a hierarchical aggregation DSL. Is ClickHouse good at this? | | Whenever I see time-series databases such as InfluxDB mentioned, they look like they're focused on measurements, not discrete rows. You can attach the event data as "labels", but this isn't efficient when the cardinality of each column is very high (e.g. article IDs or user IDs in the above example). | manigandham wrote: | Yes. Clickhouse is a column-oriented relational database among many others like MemSQL, Vertica, Redshift, BigQuery, Snowflake, Greenplum, etc. They're all focused on analytical queries over very large datasets using SQL. | | An aggregation with several `group by` statements is no challenge, and all of these databases also support approximate counting via HyperLogLog for faster results. | | Clickhouse has some unique features where each table can have a separate 'engine', including some that automatically apply aggregations. Start with a normal table though, since it'll be plenty fast enough for most use cases. | atombender wrote: | Thanks! Looks like the only downside is that, as it returns rows as results, you end up getting a lot of duplicate column data back and need to "nest" the nested buckets yourself.
| | For example, a result like:
|
|     topic;time;count
|     news;2020-01-01;44
|     news;2020-01-02;31
|
| | Now you have "news" repeated, and to group this into buckets for rendering summary tables and such (with subtotals at each level), you need to iterate through the flattened results and generate nested structures. This is something Elasticsearch gives you out of the box. | | Last I looked at Clickhouse, it had master/slave replication only, and if you want shards of data distributed across a cluster, it's something you need to manage manually? | manigandham wrote: | Right, relational databases only return flat tabular results, but that seems minor compared to the performance increase you gain. | | Clickhouse is fast but not as operationally friendly as the others. It's much more work once you go beyond a single node, so I'd suggest looking at those other options if you want something easier to operate, or use one of the cloud data warehouses like Bigquery or Snowflake to eliminate ops entirely. | jgrahamc wrote: | We use ClickHouse extensively and it's been great: | https://blog.cloudflare.com/http-analytics-for-6m-requests-p... | chupasaurus wrote: | You've lost a Russian IT meme in translation. | | We use the verb "_to use brakes_" to describe that something works slowly. There was a long history of threads about Java performance, until the meme was solidified after news from the 2005 DARPA Grand Challenge (a racing competition for autonomous robotic self-driving cars): a car named Tommy by Jefferson Team, which was running Java under Linux, didn't use its brakes before a turn and crashed into the wall at 70 mph, hence _Java runs fast_ and _Java doesn't use brakes_ were both described perfectly by the same sentence. | | Yandex used the meme to advertise Clickhouse to engineers. | | edit: formatting | | edit2: brakes instead of breaks, wtf | PeterZaitsev wrote: | I think this is a poor translation.
In Russian, "Tormozit'" may mean "to use brakes" when applied to a car, or just "to be slow" when applied to a program (or a person). "MySQL segodnia tormozit" would mean MySQL is acting slow today, not that it is using brakes. So that meme, I think, is best translated as "Clickhouse is never slow" or "Clickhouse never acts slow" | [deleted] | nkrisc wrote: | I think you mean "brakes," as I thought you meant "breaks" like in an iterator. Not that familiar with Java though, just assuming based on the reference to a vehicle failing to brake before a turn. | | I thought maybe it meant using break statements in your loops slows things down or something. | chupasaurus wrote: | Yeah, my bad. | ofek wrote: | Hey, Ofek from Datadog here! | | I recently implemented our ClickHouse integration [1], so if any of you would like to try it out, we would appreciate feedback. I really enjoyed learning about this database, and it has excellent docs :) | | Oh, fun fact, speaking of docs: this was the first integration of ours that we scrape docs for as part of the test suite. So when a new built-in metric is added, it will fail our CI until we support it [2]. We just did this again for Apache Airflow [3]. | | [1]: https://github.com/DataDog/integrations-core/pull/4957 | | [2]: https://github.com/DataDog/integrations-core/pull/5233 | | [3]: https://github.com/DataDog/integrations-core/pull/5311 | [deleted] | rossjudson wrote: | Curious about what Google does in this area? | | https://blog.acolyer.org/2019/09/11/procella/ | rossjudson wrote: | Push-down operator algebra, taken to an extreme... | | https://www.yellowbrick.com/ | codexon wrote: | One thing I haven't seen anyone note about ClickHouse, though, which would be really important to many for data durability: it does not use fsync anywhere at all. | FridgeSeal wrote: | I'm not understanding the implication of this, could you explain?
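The buffered-write vs. fsync distinction can be sketched in a few lines of Python (a generic POSIX illustration, not ClickHouse code):

```python
import os
import tempfile

# write() puts data in the OS page cache; fsync() forces it to the
# storage device. A database that never calls fsync can acknowledge
# a write that a power failure then erases, because the bytes only
# ever reached the cache.
fd, path = tempfile.mkstemp()
os.write(fd, b"row-1\n")  # buffered: may exist only in the page cache
os.fsync(fd)              # durable: flushed to stable storage
os.close(fd)

with open(path, "rb") as f:
    print(f.read())       # b'row-1\n'
os.remove(path)
```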
| DasIch wrote: | When you write to a file, you generally don't write to physical storage. Instead, the writes get buffered in memory and written to physical storage in batches. This substantially improves performance but creates a risk: if there is some sort of outage before the data is flushed to disk, you might lose data. | | In order to address that risk, you can explicitly force data to be written to disk by calling fsync. Databases generally do this to ensure durability and only signal success after fsync has succeeded and the data is safely stored. | | So ClickHouse not calling fsync implies that it might lose data in case of a power outage or a similar event. | caust1c wrote: | The implication is that ClickHouse can't easily support transactional queries. That's why it's an OLAP, not an OLTP, database (On-Line Analytical Processing vs. On-Line Transaction Processing). | gautamcgoel wrote: | Yes, I have the same model. I think I bought it for between $650 and $750. | caust1c wrote: | It's pretty clearly laid out in the docs. Hopefully anyone seriously considering using Clickhouse reads the docs thoroughly and understands what they're implementing. | codexon wrote: | What do you mean, clearly laid out? This is the only mention of fsync I could find through Google or their own search function. | | https://clickhouse.yandex/docs/en/operations/settings/settin... | caust1c wrote: | The title of the page might be a little snarky, but it's in the introduction that transactional queries are not supported: | | https://clickhouse.yandex/docs/en/introduction/features_cons... | | Sure, it's not specifically about `fsync`, but presumably this is what the consumer of the database actually wants to know. | DasIch wrote: | I can't find anything about this in the docs except [1]. I also can't find any issues in their bug tracker related to ClickHouse not using fsync [2]. | | I can however find code that actually calls fsync [3][4].
To be fair, I haven't read enough to determine how this does (or doesn't) affect durability. Nevertheless, I'm wondering: do you have a source for this claim? [1]: https://clickhouse.yandex/docs/en/operations/settings/settings/#fsync-metadata [2]: https://github.com/ClickHouse/ClickHouse/search?q=fsync&type=Issues [3]: https://github.com/ClickHouse/ClickHouse/blob/355b1e5594119e036a2d62988bfa42bc8b1a1687/dbms/src/IO/WriteBufferFromFileDescriptor.cpp#L113 [4]: https://github.com/ClickHouse/ClickHouse/blob/e765733a26cfc4cecc13c981686560338256a6b1/dbms/src/IO/WriteBufferAIO.cpp#L98 | patelh wrote: | Not exactly a good comparison if you don't generate the data the same way for the test setup. Your generated data is more compressible by ClickHouse, which skews the comparison. It would have been better not to change the test data if you wanted to do a comparison. | hodgesrm wrote: | The important difference is that we used a more realistic temperature profile, which as you say does affect compression _for that column_. Schema design (including sort order, compression, and codecs) for the remaining columns is just good ClickHouse practice. Much of the storage and I/O savings is in the date, time, and sensor_id columns. | | It's also useful to note that the materialized view results would be essentially the same no matter how you generate and store the data, because the materialized view down-samples temperature max/min to daily aggregates. The data are vastly smaller no matter how you generate them. | | The article illustrates that if you really had such an IoT app and designed it properly, you could run analytics with surprisingly few resources. I think that's a significant point. | patelh wrote: | Then you should provide results for both test datasets to make the point of using a more realistic approach. Materialized views are not news, nor are properly designed analytics applications.
For me, the importance is how ClickHouse is better and why. | delusional wrote: | That's what you wanted to show, but what you ended up showing is that if you have different data, then the query performance can be quite good. | | I get the desire to critique the temperature profile, but completely changing it makes the comparison worthless. From a data perspective it's like saying "if all the sensors just report 1 for temperature every reading, computing the min, max, and average is super fast". No shit, but that wasn't the task. | manigandham wrote: | It doesn't matter. ScyllaDB is a Cassandra clone, an advanced nested key/value database that stores data per-row and requires slow iteration to scan through an entire table. | | Column-oriented databases will always be much faster at analytical queries because of the difference in physical layout and vectorized processing. Scylla has very impressive OLTP performance but really shouldn't be compared to OLAP databases at all. That original 1B rows/sec blog post by them is kind of a strange benchmark to begin with. | sheeshkebab wrote: | I bet the results would be roughly the same even for the exact same dataset -- Scylla and other K/V data stores can't compete with columnar databases that are purpose-built for complex analytics queries. The many-orders-of-magnitude query performance differences (not to mention the storage and compute overhead) show it well enough. | | It was kind of a crummy use case for Scylla anyway (it's a transactional write store, not an analytics engine). | johnchristopher wrote: | Note: not your standard $250 NUC, it's a Canyon-something with much more oomph. | tkyjonathan wrote: | This is going to be your next cloud-agnostic data warehouse. | FridgeSeal wrote: | For what it's worth, I've used Clickhouse and Snowflake and I strongly prefer Clickhouse.
| | Performance was superior, the client libraries and built-in HTTP interface were a godsend, and it supported geospatial queries. I had perpetual issues with getting Snowflake to properly escape strings in CSV and to handle JSON in anything approaching a sensible way; there are claims that it integrates properly with Kafka as a consumer, but it most certainly does not. The UX is horrible to boot. | manigandham wrote: | What were the issues with JSON? Snowflake is the best cloud data warehouse when it comes to support for unstructured data, and far better than Redshift, Bigquery or the others. Snowflake also has geo support. | | It doesn't support streaming data though, so things like Kafka aren't a good fit yet. They have a connector, but it's basically a little app that automates consuming from Kafka, writing files to S3, then loading them into your tables. ___________________________________________________________________ (page generated 2020-01-06 23:00 UTC)