[HN Gopher] ClickHouse Cost-Efficiency in Action: Analyzing 500B...
       ___________________________________________________________________
        
       ClickHouse Cost-Efficiency in Action: Analyzing 500B Rows on an
       Intel NUC
        
       Author : hodgesrm
       Score  : 127 points
       Date   : 2020-01-06 17:23 UTC (5 hours ago)
        
 (HTM) web link (www.altinity.com)
 (TXT) w3m dump (www.altinity.com)
        
       | kresten wrote:
       | What's the tldr on why it is fast?
        
         | endymi0n wrote:
          | I'd argue that ClickHouse isn't even that fast (compared to
          | comparable technology like Snowflake, Redshift, or BigQuery);
          | rather, the ScyllaDB example is completely misleading. Scylla
          | is probably one of the fastest OLTP datastores, yet they're
          | benchmarking an analytics query -- which is pretty easy to
          | crack for any columnar datastore.
         | 
          | The actual point here is that you can execute millions of
          | (different!) individual _queries_ per second on ScyllaDB, which
          | beats any columnar datastore hands down. ClickHouse "cheated"
          | here by translating the (unfortunate) benchmark setup into a
          | single query that's extremely heavily optimized under the hood.
        
           | FridgeSeal wrote:
           | Why do you say CH isn't as fast as Snowflake? Because my
           | experience is the opposite.
           | 
           | Also, Snowflake is so painful to use and has a bunch of weird
           | edge cases. It's also so expensive.
        
           | PeterZaitsev wrote:
            | Actually, while ClickHouse does not have all the features of
            | Redshift, BigQuery, etc., it is usually much faster than they
            | are. It can be slower on some workloads on GPU-powered
            | systems, when all the data fits in GPU memory, but that is
            | not the use case it targets.
            | 
            | ScyllaDB is amazing when it comes to OLTP performance, but
            | not in analytical workloads.
            | 
            | I think they took pretty mediocre analytical workload results
            | and shared them as something outstanding.
        
         | caust1c wrote:
         | Tightly coupled storage and execution engines, along with an
         | opinionated view on the actual storage format.
        
         | eecc wrote:
         | Precomputed materialized views
        
           | brobinson wrote:
           | Yeah, this post could have been titled "Why Materialized
           | Views are Awesome"
        
             | bdcravens wrote:
              | Clickhouse is also crazy fast without materialized views -
              | I've only done some PoCs against it, but in loading a
              | largish data set of raw invoice CSVs, I was very impressed
              | with the performance compared to our standard RDBMS.
        
         | polskibus wrote:
         | Heavily optimized column store, incl use of SSE instructions,
         | etc. Moreover, some architectural tradeoffs mentioned at
         | https://clickhouse.yandex/docs/en/introduction/features_cons...
        
         | SloopJon wrote:
         | I'd say the materialized view is the main thing:
         | 
         | > Thus, we add the following materialized view ... At the end
         | we should have 1440 times fewer rows in the aggregate than the
         | source table.
         | 
         | The cost of populating that view is amortized over the 17.5
         | hours it took to load the data.
        
           | hodgesrm wrote:
           | Mat views are great as the article showed. I use them to get
           | query response down to milliseconds, as they vastly reduce
           | the amount of data ClickHouse must scan.
           | 
            | That said, there are a lot of other tools: column storage,
            | vectorized query execution, efficient compression including
            | column codecs, and skip indexes, to name a few. If you only
            | have a few billion rows it's still possible to get sub-second
            | query results using brute-force scans.
           | 
           | Disclaimer: I work for Altinity, who wrote this article.
        
             | hodgesrm wrote:
             | p.s. Loading the view is low-cost compared to loading the
             | source data. On the NUC it's 40-60 minutes, so worst case
             | it's something like 1h / 17.5h = 5.71%. Also, you can still
             | query the source data. That is fast for individual sensors
             | as the examples showed.
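The down-sampling arithmetic behind the "1440 times fewer rows" figure is easy to sketch. Below is a minimal plain-Python illustration (hypothetical per-minute readings for one sensor over one day; the names are made up, not from the article's schema) of what a daily max/min aggregate keeps:

```python
# One temperature reading per minute for one sensor over one day:
# 24 * 60 = 1440 source rows.
readings = [20.0 + (minute % 7) * 0.1 for minute in range(24 * 60)]

# The materialized view keeps a single daily aggregate row per sensor,
# i.e. 1440 times fewer rows than the source table.
daily_aggregate = {"temp_max": max(readings), "temp_min": min(readings)}
reduction_factor = len(readings)  # 1440 source rows -> 1 aggregate row

print(reduction_factor, daily_aggregate)
```

Queries against the aggregate then scan roughly 1/1440th of the data, which is where the millisecond response times come from.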
        
       | atombender wrote:
       | Is ClickHouse good for event data when you want to do rollups?
       | For example, say all my events are of the form:
       | 
       | {event: "viewedArticle", article_id: 63534, user_id: 42, topic:
       | "news", time: "2020-01-06"}
       | 
       | I want to be able to build aggregations which shows number of
       | "viewedArticle" events grouped by hour, grouped by topic,
       | counting unique user_ids within each bucket.
       | 
       | Or let's say I want the top K articles viewed each day, filtered
       | by a topic.
       | 
       | That's something that's trivial with Elasticsearch, which has a
       | hierarchical aggregation DSL. Is ClickHouse good at this?
       | 
       | Whenever I see time-series databases such as InfluxDB mentioned,
       | they look like they're focused on measurements, not discrete
       | rows. You can attach the event data as "labels", but this isn't
       | efficient when the cardinality of each column is very high (e.g.
       | article IDs or user IDs in the above example).
        
         | manigandham wrote:
         | Yes. Clickhouse is a column-oriented relational database among
         | many others like MemSQL, Vertica, Redshift, BigQuery,
         | Snowflake, Greenplum, etc. They're all focused on analytical
         | queries over very large datasets using SQL.
         | 
         | An aggregation with several `group by` statements is no
         | challenge and all of these databases also support approximate
         | counting via HyperLogLog for faster results.
         | 
         | Clickhouse has some unique features where each table can have a
         | separate 'engine' including some that automatically apply
         | aggregations. Start with a normal table though since it'll be
         | plenty fast enough for most use cases.
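As a rough illustration of the rollup asked about above (unique user_ids per hour and topic), here is a plain-Python equivalent of what a GROUP BY with a distinct count computes; the event shape comes from the question, the timestamps are made up:

```python
from collections import defaultdict

# Events shaped like the example in the question above.
events = [
    {"event": "viewedArticle", "article_id": 63534, "user_id": 42,
     "topic": "news", "time": "2020-01-06 10:15"},
    {"event": "viewedArticle", "article_id": 63534, "user_id": 43,
     "topic": "news", "time": "2020-01-06 10:40"},
    {"event": "viewedArticle", "article_id": 99, "user_id": 42,
     "topic": "sport", "time": "2020-01-06 11:05"},
]

# Equivalent of grouping by (hour, topic) and counting distinct user_ids;
# an exact count here, where HyperLogLog would give a fast approximation.
buckets = defaultdict(set)
for e in events:
    if e["event"] != "viewedArticle":
        continue
    hour = e["time"][:13]  # truncate "2020-01-06 10:15" -> "2020-01-06 10"
    buckets[(hour, e["topic"])].add(e["user_id"])

unique_users = {key: len(users) for key, users in buckets.items()}
print(unique_users)
```

A column store does the same thing over billions of rows by scanning only the time, topic, and user_id columns.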
        
           | atombender wrote:
            | Thanks! Looks like the only downside is that, since it
            | returns rows as results, you end up getting a lot of
            | duplicate column data back and need to build the nested
            | buckets yourself.
           | 
            | For example, a result like:
            | 
            |     topic;time;count
            |     news;2020-01-01;44
            |     news;2020-01-02;31
           | 
           | Now you have "news" repeated, and to group this into buckets
           | for rendering summary tables and such (with sub totals at
           | each level), you need to iterate through the flattened
           | results and generate nested structures. This is something
           | Elasticsearch gives you out of the box.
           | 
           | Last I looked at Clickhouse, it had master/slave replication
           | only, and if you want shards of data distributed across a
           | cluster it's something you need to manually manage?
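The re-nesting step described above is mechanical; a minimal sketch in plain Python (names are hypothetical) that folds flat `topic;time;count` rows back into per-topic buckets with subtotals:

```python
# Flat rows as a tabular SQL result would return them.
rows = [
    ("news", "2020-01-01", 44),
    ("news", "2020-01-02", 31),
    ("sport", "2020-01-01", 7),
]

# Fold into nested buckets with a running subtotal per topic --
# roughly the shape Elasticsearch's aggregation DSL hands back directly.
nested = {}
for topic, day, count in rows:
    bucket = nested.setdefault(topic, {"total": 0, "days": {}})
    bucket["days"][day] = count
    bucket["total"] += count

print(nested)
```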
        
             | manigandham wrote:
              | Right, relational databases only return flat tabular
              | results, but that seems minor compared to the performance
              | increase you gain.
             | 
              | Clickhouse is fast but not as operationally friendly as the
              | others. It's much more work once you go beyond a single
              | node, so I'd suggest looking at those other options if you
              | want something easier to operate, or use one of the cloud
              | data warehouses like BigQuery or Snowflake to eliminate ops
              | entirely.
        
       | jgrahamc wrote:
       | We use ClickHouse extensively and it's been great:
       | https://blog.cloudflare.com/http-analytics-for-6m-requests-p...
        
         | chupasaurus wrote:
         | You've lost a Russian IT meme in translation.
         | 
          | We use the verb "_to use brakes_" to describe something that
          | works slowly. There was a long history of threads about Java
          | performance, until the meme was solidified after the news from
          | the 2005 DARPA Grand Challenge (a racing competition for
          | autonomous robotic self-driving cars): a car named Tommy, by
          | the Jefferson Team, which was running Java under Linux, didn't
          | use its brakes before a turn and crashed into the wall at 70
          | mph, hence _Java runs fast_ and _Java doesn't use brakes_ were
          | both perfectly described by the same sentence.
         | 
         | Yandex used the meme to advertise Clickhouse for engineers.
         | 
         | edit: formatting
         | 
         | edit2: brakes instead of breaks, wtf
        
           | PeterZaitsev wrote:
            | I think this is a poor translation. In Russian, "tormozit'"
            | may mean "to use brakes" when applied to a car, or just "to
            | be slow" when applied to a program (or a person). "MySQL
            | segodnia tormozit" would mean MySQL is acting slow today, not
            | that it is using brakes. So I think that meme is best
            | translated as "Clickhouse is never slow" or "Clickhouse never
            | acts slow".
        
           | [deleted]
        
           | nkrisc wrote:
           | I think you mean "brakes," as I thought you meant "breaks"
           | like in an iterator. Not that familiar with Java though, just
           | assuming based on the reference to a vehicle failing to brake
           | before a turn.
           | 
           | I thought maybe it meant using break statements in your loops
           | slows things down or something.
        
             | chupasaurus wrote:
             | Yeah, my bad.
        
       | ofek wrote:
       | Hey, Ofek from Datadog here!
       | 
       | I recently implemented our ClickHouse integration [1], so if any
       | of you would like to try it out we would appreciate feedback. I
       | really enjoyed learning about this database, and it has excellent
       | docs :)
       | 
       | Oh fun fact, speaking of docs, this was the first integration of
       | ours that we scrape docs for as part of the test suite. So when a
       | new built-in metric is added it will fail our CI until we support
       | it [2]. We just did this again for Apache Airflow [3].
       | 
       | [1]: https://github.com/DataDog/integrations-core/pull/4957
       | 
       | [2]: https://github.com/DataDog/integrations-core/pull/5233
       | 
       | [3]: https://github.com/DataDog/integrations-core/pull/5311
        
       | [deleted]
        
       | rossjudson wrote:
       | Curious about what Google does in this area?
       | 
       | https://blog.acolyer.org/2019/09/11/procella/
        
       | rossjudson wrote:
       | Push-down operator algebra, taken to an extreme...
       | 
       | https://www.yellowbrick.com/
        
       | codexon wrote:
        | One thing I haven't seen anyone note about ClickHouse, though,
        | and which would be really important to many for data durability,
        | is that it does not use fsync anywhere at all.
        
         | FridgeSeal wrote:
         | I'm not understanding the implication of this, could you
         | explain?
        
           | DasIch wrote:
           | When you write to a file, you generally don't write to
           | physical storage. Instead the writes get buffered in memory
           | and written to physical storage in batches. This
           | substantially improves performance but creates a risk: If
           | there is some sort of outage before the data is flushed to
           | disk, you might lose data.
           | 
           | In order to address that risk, you can explicitly force data
           | to be written to disk by calling fsync. Databases generally
           | do this to ensure durability and only signal success after
           | fsync succeeded and the data is safely stored.
           | 
           | So ClickHouse not calling fsync implies that it might lose
           | data in case of a power outage or a similar event.
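The write-then-fsync pattern described above looks like this in Python (a generic sketch of durable writes, not ClickHouse's actual code path; the file name is made up):

```python
import os
import tempfile

# A durable write: write(), flush the userspace buffer, then fsync so the
# kernel pushes the bytes to the device before we report success.
path = os.path.join(tempfile.mkdtemp(), "wal.log")
with open(path, "wb") as f:
    f.write(b"row-data\n")
    f.flush()             # Python buffer -> kernel page cache
    os.fsync(f.fileno())  # kernel page cache -> physical storage

# Only after fsync returns is it safe to acknowledge the write as durable.
```

Skipping the fsync keeps writes fast but leaves the acknowledged data exposed to a power failure, which is the trade-off being discussed.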
        
           | caust1c wrote:
            | The implication is that clickhouse can't easily support
            | transactional queries. That's why it's an OLAP, not an OLTP,
            | database (OnLine Analytical Processing vs. OnLine Transaction
            | Processing).
        
         | gautamcgoel wrote:
         | Yes, I have the same model. I think I bought it for between
         | $650 and $750.
        
         | caust1c wrote:
         | It's pretty clearly laid out in the docs. Hopefully anyone
         | seriously considering using Clickhouse reads the docs
         | thoroughly and understands what they're implementing.
        
           | codexon wrote:
           | What do you mean clearly laid out? This is the only mention
           | of fsync I could find through google or their own search
           | function.
           | 
            | https://clickhouse.yandex/docs/en/operations/settings/settin...
        
             | caust1c wrote:
             | The title of the page might be a little snarky, but it's in
             | the introduction that transactional queries are not
             | supported:
             | 
              | https://clickhouse.yandex/docs/en/introduction/features_cons...
             | 
             | Sure it's not specifically about `fsync` but presumably
             | this is what the consumer of the database actually wants to
             | know.
        
         | DasIch wrote:
         | I can't find anything about this in the docs except[1]. I also
         | can't find any issues in their bug tracker related to
         | clickhouse not using fsync[2].
         | 
         | I can however find code that actually calls fsync[3][4]. To be
         | fair I haven't read enough to determine how this (doesn't)
         | affect durability. Nevertheless I'm wondering do you have a
         | source for this claim?                 [1]: https://clickhouse.
         | yandex/docs/en/operations/settings/settings/#fsync-metadata
         | [2]: https://github.com/ClickHouse/ClickHouse/search?q=fsync&ty
         | pe=Issues       [3]: https://github.com/ClickHouse/ClickHouse/b
         | lob/355b1e5594119e036a2d62988bfa42bc8b1a1687/dbms/src/IO/WriteB
         | ufferFromFileDescriptor.cpp#L113       [4]: https://github.com/
         | ClickHouse/ClickHouse/blob/e765733a26cfc4cecc13c981686560338256
         | a6b1/dbms/src/IO/WriteBufferAIO.cpp#L98
        
       | patelh wrote:
        | Not exactly a good comparison if you don't generate the data the
        | same way for the test setup. Your generated data is more
        | compressible by ClickHouse, and that skews the comparison. It
        | would have been better not to change the test data if you wanted
        | to do a comparison.
        
         | hodgesrm wrote:
          | The important difference is that we used a more realistic
          | temperature profile, which as you say does affect compression
          | _for that column_. Schema design (including sort order,
          | compression, and codecs) for the remaining columns is just good
          | ClickHouse practice. Much of the storage and I/O savings is in
          | the date, time, and sensor_id columns.
         | 
         | It's also useful to note that the materialized view results
         | would be essentially the same no matter how you generate and
         | store data because the materialized view down-samples
         | temperature max/min to daily aggregates. The data are vastly
         | smaller no matter how you generate them.
         | 
         | The article illustrates that if you really had such an IoT app
         | and designed it properly you could run analytics with
         | surprisingly few resources. I think that's a significant point.
        
           | patelh wrote:
           | Then you should provide results for both test datasets to
           | make the point of using a more realistic approach.
           | Materialized views are not news, nor is properly designed
           | analytics applications. For me the importance is how click
           | house is better and why.
        
           | delusional wrote:
           | That's what you wanted to show, but what you ended up showing
           | is that if you have different data, then the query
           | performance can be quite good.
           | 
           | I get the desire to critique the temperature profile, but
           | completely changing it makes the comparison worthless. From a
           | data perspective it's like saying "if all the sensors just
           | report 1 for temperature every reading, computing the min,
           | max, and average is super fast". No shit, that wasn't the
           | task though.
        
         | manigandham wrote:
         | It doesn't matter. ScyllaDB is a Cassandra clone, an advanced
         | nested key/value database that stores data per-row and requires
         | slow iteration to scan through an entire table.
         | 
         | Column-oriented databases will always be much faster at
         | analytical queries because of the difference in physical layout
          | and vectorized processing. Scylla has very impressive OLTP
         | performance but really shouldn't be compared to OLAP databases
         | at all. That original 1B rows/sec blog post by them is kind of
         | a strange benchmark to begin with.
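A toy sketch of the physical-layout point (hypothetical data): an aggregate over one field reads a single contiguous array in a column store, but must touch every record in a row store:

```python
# Row-oriented layout: whole records stored together. Summing one field
# still walks every record and picks the field out of each.
rows = [{"sensor_id": i, "temp": 20.0 + i % 5, "status": "ok"}
        for i in range(1000)]
row_sum = sum(r["temp"] for r in rows)

# Column-oriented layout: each field is its own contiguous array, so the
# same aggregate scans only the data it needs (and vectorizes well).
temp_column = [20.0 + i % 5 for i in range(1000)]
col_sum = sum(temp_column)

assert row_sum == col_sum  # same answer; very different I/O profile
```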
        
         | sheeshkebab wrote:
         | I bet results would be roughly the same even for the exact same
         | dataset - Scylla and other K/v data stores can't compete with
         | columnar databases that are purpose built for complex analytics
         | queries. the many orders of magnitude query performance
         | differences (not to count storage, compute overhead) show it
         | enough.
         | 
         | It was kind of a crummy use case for Scylla anyway (it's a
         | transactional write store, not an analytics engine)
        
       | johnchristopher wrote:
        | Note: not your standard $250 NUC, it's a Canyon-something with
        | much more oomph.
        
       | tkyjonathan wrote:
       | This is going to be your next cloud agnostic data warehouse.
        
         | FridgeSeal wrote:
          | For what it's worth, I've used Clickhouse and Snowflake and I
          | strongly prefer Clickhouse.
          | 
          | Performance was superior, the client libraries and built-in
          | HTTP interface were a godsend, and it supported geospatial
          | queries. I had perpetual issues with getting Snowflake to
          | properly escape strings in CSV and handle JSON in anything
          | approaching a sensible way. There are claims that it integrates
          | properly with Kafka as a consumer, but it most certainly does
          | not. The UX is horrible to boot.
        
           | manigandham wrote:
            | What were the issues with JSON? Snowflake is the best cloud
            | data warehouse when it comes to support for semi-structured
            | data, far better than Redshift, BigQuery, or the others.
            | Snowflake also has geo support.
           | 
           | It doesn't support streaming data though so things like Kafka
           | aren't a good fit yet. They have a connector but it's
           | basically a little app that automates consuming from Kafka,
           | writing files to S3, then loading them into your tables.
        
       ___________________________________________________________________
       (page generated 2020-01-06 23:00 UTC)