[HN Gopher] ClickHouse Cloud is now in Public Beta
       ___________________________________________________________________
        
       ClickHouse Cloud is now in Public Beta
        
       Author : taubek
       Score  : 251 points
       Date   : 2022-10-04 14:10 UTC (8 hours ago)
        
 (HTM) web link (clickhouse.com)
 (TXT) w3m dump (clickhouse.com)
        
       | gsanderson wrote:
       | Checking the pricing, $0.0125 per write unit. It says each write
       | "generates one or more write units". So ... $1.25 per 100 writes?
       | That can't be right. I wondered if it meant writes per second
       | (like how AWS DynamoDB or Azure CosmosDB work, with their unit-
       | based billing).
        
         | tbragin wrote:
         | With an analytical database like ClickHouse, you can write many
         | rows with a single INSERT statement (thousands of rows,
         | millions of rows, and more). In fact, this kind of batching is
         | recommended. Larger inserts will consume more write units than
         | smaller inserts. Check out our billing FAQ for some examples,
         | and we will be enhancing it with more detail as questions from
         | our users come in (we'll work on clarifying this specific
         | point): https://clickhouse.com/docs/en/manage/billing/ We also
         | provide a $300 credit free trial to try out your workload and
         | see how it translates to some of the usage dimensions. Finally,
         | this is a Beta product, so keep the feedback coming!
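
A minimal sketch of the batching idea described above, assuming rows arrive one at a time and are grouped client-side before each INSERT. The helper name `batched` and the batch size are illustrative only, and the thread does not say how batch size maps to write units.

```python
from typing import Iterable, Iterator, List, TypeVar

Row = TypeVar("Row")


def batched(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield lists of up to `batch_size` rows, preserving input order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


if __name__ == "__main__":
    # Hypothetical event rows; a real client would send each batch
    # as one INSERT instead of issuing one INSERT per row.
    events = [(i, f"event-{i}") for i in range(2500)]
    for batch in batched(events, 1000):
        print(f"would send one INSERT with {len(batch)} rows")
```

With 2,500 rows and a batch size of 1,000 this issues 3 INSERT statements rather than 2,500.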
        
           | gsanderson wrote:
           | Thanks, I agree batching inserts would indeed be a good idea
           | and it makes sense that's recommended. However that link you
           | mention (as of now) does not specify what a write unit is. So
           | if that could be clarified, that would be great. Since from
           | your reply it sounds like one INSERT would indeed (at a
           | minimum) incur one write unit. And thus 100 writes could
           | indeed cost $1.25. Which could get expensive, fast.
        
             | tbragin wrote:
             | An INSERT can consume less than one write unit, depending
             | on how many rows and bytes it writes to how many
             | partitions, columns, materialized views, data types, etc.
             | So, a "write unit", which corresponds to hundreds of low-
             | level write operations, typically translates to many
             | batched INSERTS. We are working to improve our examples in
             | the FAQ to clarify - thank you so much for asking the
             | question!
        
               | ok_dad wrote:
               | You know what would be really helpful? Some tooling
               | around measuring "write units", like a fake local
               | ClickHouseCloud API you could submit a write to and see
               | how many "write units" that particular write would take.
               | Of course, that would make pricing transparent and easy
               | and SaaS companies don't seem to like that. I challenge you
               | folks to actually figure out a way to prevent surprise
               | bills, until then all this "write unit" stuff is
               | bullshit.
        
       | singhrac wrote:
       | I'm pretty interested in running Clickhouse, but to poll the
       | room: is it easy to run without an infra person maintaining it?
       | We don't need any kind of persistence guarantee or backups, happy
       | to tear it down every day and refresh the data from sources, but
       | does it have complicated config? How well does it work on say, a
       | single core?
        
         | IanCal wrote:
         | Another voice to say it's really easy on a single node. I ran
         | it for a lot of work purely because of this.
        
         | danielbln wrote:
         | Much, much simpler than a lot of dbs I've worked with. It's
         | mostly fire and forget, easily doable without an infra person,
         | imo.
        
         | klysm wrote:
         | If you don't really care about the data then I'd say it's
         | probably pretty easy to run without an infra person right? If
         | you hit a problem just nuke it and try again?
        
         | fermuch wrote:
         | I've been running clickhouse for production as a solo dev for a
         | year now and it's been great so far. I've had to tune it a bit
         | for the machines I have, but everything works and it's
         | incredibly fast. Not even comparable to postgres. For
         | analytics, it goes way beyond what you'd expect, even under
         | heavy writing.
         | 
         | Even join-heavy queries work great.
        
         | jcuenod wrote:
         | I've been running it in a container on docker swarm (on a
         | celeron 3205U--2x1.5GHz cores). My CH container stores read
         | only data, and I periodically update the snapshot, which
         | involves deploying a newer image. This is very straightforward
         | but, because it's read only, I don't even need a persistent
         | volume.
         | 
         | I've found the documentation to be pretty good and there's a
         | really active telegram group where devs offer help (if you get
         | noticed, but that's not too hard).
        
       | chrisballinger wrote:
       | At first I was a bit confused about why The Onion's spin-off site
       | had a cloud offering, but that one is called ClickHole.
        
       | axlee wrote:
       | Unsolicited feedback: I do not receive any validation email when
       | creating an account, so I cannot log in. It is also impossible to
       | reset the password.
        
         | tbragin wrote:
         | So sorry about that! Could you send us an email at
         | support@clickhouse.com and we'll get to the bottom of it?
        
       | encoderer wrote:
       | I'm a little sad to see them embrace magic "insert unit" pricing,
       | instead of taking the approach Altinity uses where you are
       | renting an instance size that you can compare apples-to-apples
       | with running your own cluster on ec2.
        
         | aseipp wrote:
         | Honestly that's something they can probably offer separately,
         | but I really prefer this pricing for most use cases where I
         | want a database that is always available but has bursty
         | request/response patterns. This means I can have an analytical
         | database available for all my small services, websites, etc
         | without having to think too much about availability, support,
         | and a constant price overhead. But ClickHouse is so fast you
         | can get pretty far with a $10 VPS, I admit.
         | 
         | Probably the best comparison is CockroachDB Cloud. They have a
         | "serverless" offering based on unit pricing and a dedicated
         | offering based on provisioned servers + support/maintenance
         | overhead. I think that would be the ideal place to go long-
         | term, but I'm super excited for this current one. I love
         | ClickHouse and want to support them.
         | 
         | ClickHouse is also an interesting case because there's lots of
         | options to migrate clusters, use S3 as long-term storage, etc
         | to where I don't particularly feel locked into this offering if
         | I ever wanted to shift into my own.
        
         | hodgesrm wrote:
         | > I'm a little sad to see them embrace magic "insert unit"
         | pricing, instead of taking the approach Altinity uses where you
         | are renting an instance size that you can compare apples-to-
         | apples with running your own cluster on ec2.
         | 
         | Thanks for this comment. We'll be publishing a blog at Altinity
         | to compare both models. My view is that they both have their
         | place. The BigQuery pricing model is great for testing or
         | systems that don't have constant load. The Altinity model is
         | good for SaaS systems that run constantly and need the ability
         | to bound costs to ensure margins.
         | 
         | Having a selection of vendors that offer different economics
         | seems better for users than everyone competing for margin on
         | the same operational model.
         | 
         | Disclaimer: I'm CEO of Altinity.
        
         | izrailev wrote:
         | Serverless "pay for usage" is different than the fixed-size
         | dedicated cluster pricing model, and can be quite a bit
         | cheaper, especially with spiky query traffic. It should also be
         | more reliable, since you don't have to predict and provision
         | capacity for your peak usage ahead of time.
         | 
         | Disclaimer: I work for ClickHouse.
        
           | encoderer wrote:
           | Ok but it can also be quite a bit more expensive and your
           | bill is less predictable.
           | 
           | Disclaimer: I'm a customer of altinity.
        
             | izrailev wrote:
             | Start a free trial of ClickHouse Cloud
             | (https://clickhouse.cloud/signUp) and compare
             | price/performance with your current set up.
             | 
             | Pay-for-usage should be cheaper in most use cases, and you
             | can further limit your costs by reducing your scale-up
             | limits in the "Advance Scaling" setting (it will impact
             | your performance though -- best to just let the autoscaler
             | do its job...)
        
       | ian-whitestone wrote:
       | Outside of being open source, how does ClickHouse differ from
       | Snowflake/BigQuery? In what scenarios would I choose ClickHouse
       | over those existing solutions?
        
         | 62951413 wrote:
         | Druid and Pinot are more likely to be the peer group (e.g. see
         | https://leventov.medium.com/comparison-of-the-open-source-
         | ol...)
        
       | datalopers wrote:
       | Is this fully hosted or does it run in my cloud? Is this running
       | https://github.com/ClickHouse/ClickHouse or a closed-source
       | Clickhouse Cloud variant with added features?
        
         | kbiyer wrote:
         | It is a fully hosted offering and is based on ClickHouse core
         | v22.10
        
           | lsllc wrote:
           | Curious how CH are ultimately actually hosting this, on a
           | public cloud or is this their own cloud.
        
             | morelisp wrote:
             | Billing page says AWS for storage, so I'd assume the same
             | for compute.
             | 
             | If they're really doing elastic reads as suggested in
             | another comment I don't think this can be the standard CH
             | server (at least for reads) - or I've missed something very
             | exciting in recent versions.
        
               | datalopers wrote:
               | It feels like there's gotta be something proprietary
               | beyond just using an S3 disk setup, but maybe not? Is
               | simply having data_cache_enabled and cache_path pointing
               | to local NVMe SSD sufficient to achieve similar speeds?
               | 
               | * https://clickhouse.com/docs/en/guides/sre/configuring-s
               | 3-for...
               | 
               | *
               | https://clickhouse.com/docs/en/integrations/s3/s3-merge-
               | tree...
        
       | skadamat wrote:
       | For context, there are multiple Clickhouse companies. This is the
       | one that spun out of Yandex and took the Clickhouse name as their
       | namesake for the company!
       | 
       | Altinity has been in the space for a while offering Clickhouse
       | services as well: https://altinity.com/
       | 
       | EDIT: brazenly was the wrong word here :)
        
         | stingraycharles wrote:
         | Wait, wasn't Clickhouse itself spun out of Yandex anyway? I.e.
         | Yandex opensources Clickhouse, then later starts Clickhouse
         | (the company)?
        
           | kelp wrote:
           | That is literally what happened. There is nothing brazen
           | here.
        
         | AdamProut wrote:
         | I was keeping a tally of how many companies were offering
         | "ClickHouse as a service" at one point last year. I think I got
         | up to 7 or 8.
         | 
         | It will be interesting to watch this unfold from a code
         | licensing perspective. Will Clickhouse Inc. move to a more
         | restrictive license to block all these other ClickHouse
         | services?
        
         | sccxy wrote:
         | Strange name and domain.
         | 
         | It is included in many adblocker lists.
        
           | tylerhannan wrote:
           | And we request removal as and where we can.
           | 
           | https://github.com/StevenBlack/hosts/issues/1781
           | 
           | Unfortunately mvps.org no longer seems to reply to emails.
        
         | qaq wrote:
         | "brazenly" :)? Are you for real? Top 3 committers work there
         | including Alexey who created CH in the first place
         | 
         | alexey-milovidov 17,522 commits 5,648,537 ++ 5,580,633 --
         | 
         | alesapin 4,618 commits 1,024,262 ++ 932,501 --
         | 
         | KochetovNicolai 3,867 commits 377,420 ++ 314,035 --
        
         | hodgesrm wrote:
         | Hi! Thank you for mentioning my company Altinity. We love
         | ClickHouse and have been supporting customers on it for years.
         | 
         | That said I would like to defend ClickHouse and their use of
         | the logo. They acquired the IP and it's their right to use it
         | as they please. It's hard to imagine them _not_ using the
         | ClickHouse name. There's a crowd of companies using/supporting
         | ClickHouse so it's the obvious way to market themselves.
         | 
         | The interesting question is whether it's a good thing for users
         | to have a large company controlling ClickHouse rather than
         | having it in a foundation.
        
         | tylerhannan wrote:
         | It is an interesting perspective...
         | 
         | FWIW, here is Alexey's perspective on the topic directly for
         | those of you following along at home.
         | 
         | https://clickhouse.com/blog/introducing-click-house-inc
        
       | Jemaclus wrote:
       | We are using Clickhouse Cloud at our company and the speed at
       | which it serves up data is mind-blowing to our team, which had
       | been using older systems for almost 10 years. Congrats on the
       | public beta, and we can't wait to see what comes next!
        
       | andrewmutz wrote:
       | Clickhouse was spun out of Yandex, which is a Russian
       | corporation. Given existing geopolitical tensions, is there
       | anything to worry about there?
       | 
       | Does anyone know how much of the Clickhouse team (or ownership)
       | is still located in Russia?
        
         | u2315 wrote:
         | They support Ukraine, however, given the company spun out of
         | Yandex, the latter is most certainly financially benefitting
         | from their success, and is paying taxes that are funding the
         | war. AFAIR, Yandex also has 2 director seats on Clickhouse
         | board, although that could have changed.
        
           | MajimasEyepatch wrote:
           | > ClickHouse, Inc. is a Delaware company with headquarters in
           | the San Francisco Bay Area. We have no operations in Russia,
           | no Russian investors, and no Russian members of our Board of
           | Directors. We do, however, have an incredibly talented team
           | of Russian software engineers located in Amsterdam, and we
           | could not be more proud to call them colleagues.
           | 
           | From their "We Stand With Ukraine" page. [1]
           | 
           | [1] https://clickhouse.com/blog/we-stand-with-ukraine
        
         | 46Bit wrote:
         | They have been extremely clear in their support of Ukraine
         | https://clickhouse.com/blog/we-stand-with-ukraine
        
           | risyachka wrote:
           | It's just a bunch of text, doesn't show any support
           | whatsoever.
           | 
           | They can show support by donating to UA defence and showing
           | proof (important - not some neutral org). Otherwise it is not
           | support but a bunch of bs
        
             | ericb wrote:
             | In places like Russia, those words are very dangerous, and
             | could get you jailed or sent to the front. Even calling it
             | a "war" was/is a punishable offense.
             | 
             | So, yes, words, but the potential consequences of these
             | words have more significance than empty air.
        
         | kelp wrote:
         | You can see on their jobs page and 'our story' page, they are
         | mostly in the US and The Netherlands.
        
         | izrailev wrote:
         | https://clickhouse.com/blog/we-stand-with-ukraine
        
           | risyachka wrote:
           | Just a bunch of words. Everyone says they support Ukraine if
           | it benefits their business.
           | 
           | Show proof
        
             | danielbln wrote:
             | What kind of proof are you looking for here?
        
       | jcuenod wrote:
       | Just wanted to chime in and say that I'm using a self-hosted
       | instance of CH for a small hobby project and the performance is
       | awesome.
        
       | kawsper wrote:
       | We're currently using InfluxDB for some timeseries metrics, but
       | the 2.0 migration path has been terrible, even for simple
       | examples, so we're looking for something else.
       | 
       | Has anyone migrated from InfluxDB to ClickHouse?
        
         | monstrado wrote:
         | Disclaimer: I work at ClickHouse
         | 
         | At a previous company, I wrote a simple TCP server to receive
         | LineProtocol, parse it and write to ClickHouse. I was
         | absolutely blown away by how fast I could chart data in Grafana
         | [1]. The compression was stellar, as well...I was able to store
         | and chart years of history data. We basically just stopped
         | sending data to Influx and migrated everything over to the
         | ClickHouse backend.
         | 
         | [1] https://grafana.com/grafana/plugins/grafana-clickhouse-
         | datas...
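
A rough sketch of the parsing step mentioned above, assuming the simplest form of InfluxDB line protocol (`measurement,tag=value field=value timestamp`). It deliberately ignores escaping and string fields that contain spaces or commas, and it is not the commenter's actual server. The function names are illustrative.

```python
from typing import Dict, Optional, Tuple, Union

FieldValue = Union[float, int, bool, str]


def parse_field_value(raw: str) -> FieldValue:
    """Convert one raw field value into a Python value (simplified rules)."""
    if raw.endswith("i") and raw[:-1].lstrip("-").isdigit():
        return int(raw[:-1])  # integers carry an "i" suffix, e.g. 4i
    if raw in ("t", "T", "true", "True", "TRUE"):
        return True
    if raw in ("f", "F", "false", "False", "FALSE"):
        return False
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return raw[1:-1]  # quoted string without embedded spaces
    return float(raw)


def parse_line(
    line: str,
) -> Tuple[str, Dict[str, str], Dict[str, FieldValue], Optional[int]]:
    """Split one line into (measurement, tags, fields, timestamp)."""
    parts = line.strip().split(" ")
    if len(parts) not in (2, 3):
        raise ValueError(f"unsupported line: {line!r}")
    head, field_part = parts[0], parts[1]
    timestamp = int(parts[2]) if len(parts) == 3 else None
    measurement, *tag_items = head.split(",")
    tags = dict(item.split("=", 1) for item in tag_items)
    fields = {
        key: parse_field_value(value)
        for key, value in (item.split("=", 1) for item in field_part.split(","))
    }
    return measurement, tags, fields, timestamp


if __name__ == "__main__":
    sample = "cpu,host=server01,region=us-west usage=0.64,cores=4i 1465839830100400200"
    print(parse_line(sample))
```

Parsed rows like these could then be buffered and written to ClickHouse in batches rather than one at a time.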
        
         | danielvf wrote:
         | Clickhouse is so stupidly, mindbogglingly fast at what it does
         | that you can often replace much more specialized databases or
         | specialized schemas / queries with brute force Clickhouse and
         | come out far ahead. It really depends on the size of what you
         | are doing. I've used Clickhouse for timeseries data in the tens
         | of millions events per day and it worked well.
        
       | lambdadmitry wrote:
       | It's interesting how diligently they wiped out any and all
       | mentions of Yandex from their website, to the extent that Google
       | struggles to find meaningful mentions apart from a few exceptions
       | like [1]. Quite peculiar considering that just two years ago,
       | according to that presentation, they were still considering
       | themselves a Yandex project, even the link in that tweet is
       | https://clickhouse.yandex .
       | 
       | For those unaware, Yandex is a Russian internet megacorp, think
       | Google but in cahoots with the authoritarian government:
       | cherrypicked news coverage friendly to the government,
       | effectively a monopoly across multiple verticals (eg they bought
       | out Uber in Russia), etc. In 2020, the year that deck seems to be
       | from, they were already censoring their news feed [2] and
       | tweaking search ranking to promote pro-government results [3] for
       | years.
       | 
       | [1]: https://presentations.clickhouse.com/meetup40/introduction/
       | 
       | [2]: https://meduza.io/feature/2022/05/05/my-zamuchilis-borotsya
       | in Russian, but google translate does a reasonable job
       | 
       | [3]: https://www.svoboda.org/a/30580605.html same
        
         | ceejayoz wrote:
         | > think Google but in cahoots with the authoritarian government
         | 
         | Sure, but the split appears to have been very successful in this
         | regard.
         | 
         | https://clickhouse.com/blog/we-stand-with-ukraine
        
           | lambdadmitry wrote:
           | Yes, it seems the split was triggered by the war, but the
           | censorship/promotion/collaboration with the government [0]
           | were not worth breaking the ties.
           | 
           | A cynical view would be that it's only now that the
           | association became bad for business.
           | 
           | [0]: By the way, Yandex was showing Crimea as unqualifiedly
           | Russian territory since 2014, until the war, when they just
           | removed borders between countries completely
           | 
           | Edit: qualified the remark
        
             | kgeist wrote:
             | >By the way, Yandex was showing Crimea as unqualifiedly
             | Russian territory since 2014
             | 
             | Google and Apple used to show it as part of Russia to
             | Russian users as well.
             | 
             | https://www.google.com/amp/s/techcrunch.com/2019/11/27/appl
             | e...
        
       | vorillaz wrote:
       | Happy ClickHouse user here. This is one amazing piece of software,
       | to be honest. For anyone who has ever wanted to parse, analyse and
       | query billions of time series entries, ClickHouse is the way to
       | go.
       | 
       | The cloud offering seems like an amazing product for companies
       | that can afford it. I am not sure if I have the billing right, but
       | for 5M inserts per month the total bill would be $62K.
        
         | IanCal wrote:
         | A write unit is not the same thing as a single insert, if for
         | that cost you've multiplied it up.
        
         | teacpde wrote:
         | That's quite a high price tag per insert. Do you have to write
         | a large amount of data per insert?
        
         | jcims wrote:
         | I hate the part of my brain that has allowed the name to
         | interfere with my interest in even looking at it.
        
       | edf13 wrote:
       | Slightly aside...
       | 
       | OLAP/Column dbs generally work well for large bulk inserts with
       | many analytical queries...
       | 
       | How do clickhouse (or other column dbs) work with larger inserts
       | (e.g. log data type inserts)?
        
         | sharms wrote:
         | Uber is using it for several PB of log data at scale:
         | https://www.uber.com/en-IN/blog/logging/
        
       | wizwit999 wrote:
       | Clickhouse is great but the ops and scaling make it notoriously
       | difficult to self host.
       | 
       | If you have a lot of log data and want something open source and
       | serverless you can self host, check out Matano
       | (https://github.com/matanolabs/matano).
        
         | [deleted]
        
       | didip wrote:
       | The 2 things that make me hesitant about ClickHouse are:
       | 
       | 1. Data rebalancing is not automatic.
       | 
       | 2. It doesn't really have a concept of cold-tier pushed to S3
       | directly, so cluster management is not simple for a small team.
       | 
       | Other than that, ClickHouse looks super amazing.
        
         | zX41ZdbW wrote:
         | ClickHouse can push cold data directly to S3, so S3 will be
         | used as cold storage and the local filesystem as hot storage.
         | 
         | Another approach is to store everything in S3 with local
         | caching; it is much easier and somewhat more efficient.
         | 
         | ClickHouse Cloud covers these concerns.
        
           | didip wrote:
           | Ah this must be a new feature? If you could share the docs
           | related to this, I'd really appreciate it.
        
       | base wrote:
       | Is there an easy way to have ClickHouse Cloud ingest data from
       | MySQL hosted in Amazon RDS?
        
         | thomoco wrote:
         | There are a few options for migrating or synchronizing data
         | from MySQL - I'd recommend starting with this page in the
         | ClickHouse Docs - there is a nice video there that explains
         | some of those options:
         | 
         | https://clickhouse.com/docs/en/integrations/migration/
         | 
         | Depending on what you are trying to achieve, you could use
         | clickhouse-local with the MySQL engine to move data, or could
         | use an ETL/ELT tool to migrate/sync
        
           | hodgesrm wrote:
           | Good point. If you just want to copy data, far and away the
           | easiest way to transfer it is using MySQLDatabaseEngine.
           | You can even copy table definitions. Watch for issues with
           | Decimal datatypes if you do this.
        
             | morelisp wrote:
             | Be careful with this engine, it's easy to accidentally
             | expose the password as you only need table read permissions
             | if it wasn't set up using an external credential file.
             | 
             | https://github.com/ClickHouse/ClickHouse/issues/3311
             | 
             | I also had some pretty bad join performance (CH table
             | joined to MySQL table), the quick solution to both of these
             | is that we instead use the table function
             | (https://clickhouse.com/docs/en/sql-reference/table-
             | functions...) to copy the data periodically.
        
               | hodgesrm wrote:
               | Use Named Collections to protect credentials. They are
               | very handy. Here's an article that discusses use in the
               | JDBC Bridge.
               | 
               | https://altinity.com/blog/connecting-clickhouse-to-
               | external-...
               | 
               | Disclaimer: I work for Altinity.
        
         | hodgesrm wrote:
         | Check out the Altinity Sink Connector for ClickHouse [0]. This
         | is advancing quite quickly and already has prod deployments.
         | Please feel free to try it out.
         | 
         | [0] https://github.com/Altinity/clickhouse-sink-connector
        
       | epberry wrote:
       | Great stuff! I know the ClickHouse team and they are world class.
       | If you're more familiar with a relational database some things
       | will feel weird. For example, you should not insert 1 row at a
       | time, there are over 1,000 built-in functions, and the default
       | connection is often over HTTPS, but not 443.
        
       | zX41ZdbW wrote:
       | I have (an extremely boring, but quite hands-on) video about
       | various ways of ClickHouse optimizations on top of external
       | storage: https://www.youtube.com/watch?v=rK2BsaaaOCA (starting
       | from around 40:00).
        
       | datalopers wrote:
       | Clickhouse is an amazing product but this pricing looks
       | excessive.
       | 
       | A single node instance with a fast disk is more than sufficient
       | for most needs: https://hub.docker.com/r/clickhouse/clickhouse-
       | server
       | 
       | If you need a cluster, https://github.com/Altinity/clickhouse-
       | operator makes things easy
        
         | derefr wrote:
         | We're a company that has been using the private beta of
         | ClickHouse Cloud.
         | 
         | IMHO, the unique value-prop of this offering is that it
         | elastically scales reads (compute), like
         | Redshift/Snowflake/BigQuery/other cloud data warehouses do,
         | while also "being Clickhouse", and so giving you very
         | traditional SQL querying capabilities (where those others all
         | ask you to bend your SQL to fit the DB, sometimes in pretty
         | ridiculous ways.)
         | 
         | I would suggest not thinking of this offering as "ClickHouse,
         | but in the cloud"; but rather, thinking of this as "a cloud
         | data warehouse, but rather than using its own proprietary query
         | engine, it's just ClickHouse."
         | 
         | If you haven't evaluated other cloud data warehouses as
         | alternatives to solving your problem (and found them wanting
         | for one reason or another), then you're likely not in the niche
         | that would see a positive profit margin on using ClickHouse
         | Cloud.
        
         | beoberha wrote:
         | I'm sure there are tons of people very willing to pay for this
         | if it means not having to run their own k8s clusters. Managed
         | offerings like this are expensive, but worth every penny for
         | some.
        
       | thomoco wrote:
       | I wanted to note that ClickHouse Cloud results are now also being
       | reported in the public ClickBench results:
       | https://benchmark.clickhouse.com/
       | 
       | Good to see transparent comparisons available now for Cloud
       | performance vs. self-hosted or bare metal results as well as
       | results from our peers. The ClickHouse team will continue to
       | optimize further - as scale and performance is a relentless
       | pursuit here at ClickHouse, and something we expect to be
       | performed transparently and in a reproducible manner. Public
       | benchmarking benefits all of us in the tech industry as we learn
       | from each other in sharing the best techniques for attaining high
       | performance within a cloud architecture
       | 
       | Full disclosure: I do work for ClickHouse, although have also
       | been a past member of SPEC in developing and advocating for
       | public, standardized benchmarks
        
         | morelisp wrote:
         | Can you clarify what a "write unit" is? Naively it sounds like
         | it might be blocks x partitions x replicas that actually hit
         | disk. (Which is also probably not very clear to people not
         | already using CH, but I have at least middling knowledge of
         | CH's i/o patterns and I have no clue what a "write unit" is
         | from the page's description.)
        
           | tylerhannan wrote:
           | It's Tyler from ClickHouse.
           | 
           | Check out the response below that has a reference to some of
           | our billing FAQs.
        
             | morelisp wrote:
             | Right, that link covers read units which is also what I
             | expected - essentially the number of files I have to touch
             | - but I still have no clue about write units.
             | 
             | Is one block on one non-partitioned non-distributed table
             | one write unit? What about one insert that's two blocks on
             | such a table? What about one block on a null engine with
             | two MVs listening to insert into two non-partitioned non-
             | distributed tables? What if the table is a replacing
             | mergetree, do I incur WUs for compactions? etc.
             | 
             | My worry is that it is essentially 1 WU = 1 new part file,
             | which I understand makes sense to bill on but is
             | tremendously opaque for users - at least I have no
             | clue how often we roll new part files, instead I'm focused
             | on total network and disk i/o performance on one side and
             | client query latency on the other.
        
               | qoega wrote:
               | I can assure you that 1 WU is not 1 part. Not even close.
               | You can check it using trial credits with your data.
               | 
               | For example, I just checked that uploading the 1.1 GB
               | example table (cell_towers, with 14 columns) cost me 0.38
               | write units.
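
For a rough sense of scale, the figure above can be converted to dollars. This is only a sketch: it assumes the $0.0125-per-write-unit list price quoted at the top of the thread still applies and that cost is linear in write units, neither of which is confirmed in this discussion. The function and constant names are made up for illustration.

```python
from decimal import Decimal

# Assumption: list price per write unit, as quoted earlier in the thread.
PRICE_PER_WRITE_UNIT = Decimal("0.0125")


def write_cost(write_units: Decimal) -> Decimal:
    """Dollar cost of a load, assuming cost is linear in write units."""
    return write_units * PRICE_PER_WRITE_UNIT


# qoega's measurement: a 1.1 GB table consumed 0.38 write units.
cost = write_cost(Decimal("0.38"))
cost_per_gb = cost / Decimal("1.1")
print(f"load cost: ${cost}, roughly ${cost_per_gb:.4f} per GB")
```

Under these assumptions, 100 write units would cost $1.25, which is the number that prompted the original question; the open issue in the thread is how many write units a given INSERT actually consumes.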
        
               | morelisp wrote:
               | Then I'm even more confused, because the pricing page
               | clearly says write operations consume _at least_ one WU.
        
               | aseipp wrote:
               | Where does it say that? The pricing page says on "Writes"
               | in the info tooltip: "Each write operation (INSERT,
               | DELETE, etc) consumes write units depending on the number
               | of rows, columns, and partitions it writes to."
               | 
               | This doesn't imply to me that each individual INSERT
               | costs 1 WU, but that it could be fractional. I guess it
               | depends on how you read it?
        
               | morelisp wrote:
               | The tooltip has been changed since my comment was posted;
               | it's now not incorrect, but it still doesn't really tell
               | me more useful information.
               | 
               | (See https://news.ycombinator.com/item?id=33081099 for
               | the original wording.)
        
               | yashap wrote:
               | With analytical column store DBs the standard is to do
               | massive batch writes of thousands to millions of
               | records at a time, vs. inserting individual records.
               | Inserting individual records is basically always crazy
               | inefficient with column stores. So a single write is
               | generally for thousands to millions of records.
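
The batching point can be sketched without a database. The snippet below only counts how many INSERT statements a client would issue for the same rows at different batch sizes; it does not talk to ClickHouse, the batch sizes are arbitrary, and how write units map onto those statements is exactly what the thread leaves unspecified. The helper names are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def count_inserts(rows: Iterable[Row], size: int) -> int:
    """Number of INSERT statements needed when sending `size` rows at a time."""
    return sum(1 for _ in batches(rows, size))


rows = [(i, f"event-{i}") for i in range(10_000)]
row_at_a_time = count_inserts(rows, 1)   # one statement per row
batched = count_inserts(rows, 5_000)     # a few large statements
print(row_at_a_time, batched)
```

In a real client, each yielded chunk would be sent as a single INSERT; the per-statement overhead is then paid a handful of times instead of once per row.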
        
               | morelisp wrote:
               | Buddy, if you look just a _couple_ posts up you'll see
               | me comment on how ClickHouse's actual disk format works.
               | You don't need to explain batching to me.
               | 
               | Nonetheless you can't insert a-whole-file-and-just-that-
               | file in _less than one write_.
        
             | eloff wrote:
             | It doesn't mention anything about what a write unit is,
             | except to say you can reduce write units by batching
             | inserts (that part I guessed already.)
             | 
             | There's no way to think about what an actual write unit
             | means. You could measure the costs on a sample workload,
             | but that's far from ideal. Some transparency here would be
             | nice.
             | 
             | I understand the answer is complicated, based on hairy
             | implementation details, and subject to change. Give me the
             | complexity and let me interpret it according to my needs.
        
               | tylerhannan wrote:
               | Absolutely.
               | 
               | Working on updating the FAQ and tooltips now and sharing
               | your feedback. <3
        
         | latchkey wrote:
         | Wow, I hadn't heard of StarRocks before... seems like an
         | interesting competitor.
         | 
         | https://starrocks.io/blog/clickhouse_or_starrocks
        
         | ignoramous wrote:
         | Interesting set of results. Ignoring _ClickHouse_, _StarRocks_
         | seems to be better in almost all metrics.
         | 
         | I was curious to compare MonetDB, DuckDB, ClickHouse-Local,
         | Elasticsearch, DataFusion, QuestDB, Timescale, and Athena.
         | Amazingly, MonetDB shows up better than DuckDB in all metrics
         | (except storage size), and Athena holds its own and fares
         | admirably well, esp. given that it is _stateless_. Timescale
         | and QuestDB, meanwhile, did not come out as well as I hoped
         | they would.
         | 
         | https://benchmark.clickhouse.com/#eyJzeXN0ZW0iOnsiQXRoZW5hIC...
         | 
         | It'd be interesting to see how rockset, starburst
         | (presto/trino), and tiledb fare, if and when they get added to
         | the benchmark.
        
           | mytherin wrote:
           | The particular way in which the data is loaded into DuckDB
           | and the particular machine configuration on which it is run
           | triggers a problem in DuckDB related to memory management.
           | Essentially the standard Linux memory allocator does not like
           | our allocation pattern when doing this load, which causes the
           | system to run out-of-memory despite freeing more memory than
           | we allocate. More info is provided here [1].
           | 
           | As it is right now the benchmark is not particularly
           | representative of DuckDB's performance. Check back in a few
           | months :)
           | 
           | [1] https://github.com/duckdb/duckdb/issues/3969#issuecomment
           | -11...
        
             | ignoramous wrote:
             | Thanks. Btw, we use DuckDB (via Node/Deno) for analytics
             | (on Parquet/JSON), and so I must point out that despite the
             | dizzying variation among various language bindings (cpp and
             | python seem more complete), the pace of progress, given the
             | team size, is god-like. It has been super rewarding to
             | follow the project. Also, thanks for permissively licensing
             | it (unlike most other source-available databases).
             | 
             | Goes without saying, if there are cost advantages to be had
             | due to DuckDB's unique strengths, then _serverless_ DuckDB
             | Cloud couldn't come here soon enough.
        
         | avereveard wrote:
         | Is lower time the right metric here? Seems normalizing per
         | price would make a more useful metric for big data as long as
         | the response time is reasonable
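
One way to read this suggestion is as a ranking by cost per benchmark run, restricted to systems that stay under a latency budget. The sketch below does that with entirely made-up systems, timings, and prices; none of the numbers come from ClickBench, and the per-second billing model is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    system: str
    total_seconds: float     # hypothetical time to run the query set
    dollars_per_hour: float  # hypothetical hourly price of the setup


def cost_per_run(r: Result) -> float:
    """Dollars spent running the query set once, billed by the second."""
    return r.total_seconds / 3600.0 * r.dollars_per_hour


def cheapest_within(results: List[Result], max_seconds: float) -> Optional[Result]:
    """Cheapest system whose total time stays under the latency budget."""
    acceptable = [r for r in results if r.total_seconds <= max_seconds]
    return min(acceptable, key=cost_per_run, default=None)


# Made-up inputs, for illustration only.
results = [
    Result("system-a", total_seconds=120.0, dollars_per_hour=4.0),
    Result("system-b", total_seconds=200.0, dollars_per_hour=1.5),
    Result("system-c", total_seconds=900.0, dollars_per_hour=0.2),
]
fastest = min(results, key=lambda r: r.total_seconds)
best_value = cheapest_within(results, max_seconds=300.0)
print(fastest.system, best_value.system)
```

With these inputs the fastest system and the best-value system differ, which is the situation a pure time ranking cannot show.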
        
           | thomoco wrote:
           | Yes, ClickBench results are presented as Relative Time, where
           | lower is better. You can read more on the specifics of
           | ClickBench methodology in the GitHub repository here:
           | https://github.com/ClickHouse/ClickBench/
           | 
           | There are other responses from ClickHouse in the comments on
           | the pricing, so I'll defer to their expertise on that topic
           | there. Thank you for your feedback and ideas, as normalizing
           | a price-based benchmark is an interesting concept (and where
           | ClickHouse would expect to lead also given the architecture
           | and efficiency)
        
           | tbragin wrote:
           | This benchmark focuses on analytical query latency for
           | representative analytical queries, so yes - lower number is
           | better.
        
         | carlineng wrote:
         | To help understand the results of the benchmark, I find it
         | helpful to look at how the benchmark is constructed, and what
         | it tests for. From the README:
         | 
         | "The dataset is represented by one flat table. This is not
         | representative of classical data warehouses, which use a
         | normalized star or snowflake data model. The systems for
         | classical data warehouses may get an unfair disadvantage on
         | this benchmark."
         | 
         | Taking a look at the queries [0], it looks like it mostly
         | consists of full table scans with filters, aggregations, and
         | sorts. Since it's a single table, there are no joins.
         | 
         | [0]:
         | https://github.com/ClickHouse/ClickBench/blob/main/snowflake...
        
       | MentallyRetired wrote:
       | Some unsolicited feedback:
       | 
       | 1) I had no idea what Clickhouse was for the first 30 seconds
       | looking at the homepage. I now understand it to be a database of
       | some sort. I shouldn't have seen the words "performance" and
       | "cloud" and "serverless" before seeing the word database, right?
       | I'm starting off confused. There shouldn't be an assumption that
       | I know what you all do.
       | 
       | 2) I have no idea what a column oriented database is. I've been a
       | developer for 29 years (mostly frontend but I do a lot of full
       | stack too). If I need an explainer, a lot of devs will.
       | 
       | Aside from that, it looks like a nice offering and I wish you all
       | the best!
        
         | TotoHorner wrote:
         | Column oriented database really isn't esoteric knowledge.
         | You should check out DDIA, especially if you're doing full
         | stack
        
         | tylerhannan wrote:
         | Thanks for the well wishes!
         | 
         | And thanks for the honest feedback.
         | 
         | It's always an interesting balance of promoting a new thing
         | (Cloud) and explaining an existing thing. This might be
         | helpful.
         | 
         | https://clickhouse.com/docs/en/home
         | 
         | (note: I work at ClickHouse)
        
           | tylerhannan wrote:
           | And https://clickhouse.com/docs/en/intro/ is a bit lower
           | level.
        
         | lkrubner wrote:
         | "column oriented database"
         | 
         | We all have our specialties, and that is fine. It is a common
         | pattern that a developer gets comfortable with a particular
         | tech stack, and then uses it for many years without seeing the
         | need for much else. Some developers use Ruby on Rails plus
         | Postgres for everything, others use C# and .NET and SQL Server.
         | It's fine, if that's all you need.
         | 
         | Still, this is the year 2022. Cassandra, to take one example,
         | was released in 2008. For everyone who has needed these fast-
         | read databases, they've been much discussed for 14 years,
         | including here on Hacker News, and on every other tech forum.
         | At this point I think a company can simply assume that most
         | developers will have some idea what a column database is.
        
           | ipaddr wrote:
           | You never explained what a column database is. One row and
           | unlimited columns?
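
Since the question goes unanswered in the thread: a column-oriented store holds the same table as a row store, but keeps the values of each column together instead of keeping each row together. The toy example below shows the two layouts in memory; the column names and data are invented, and it says nothing about how ClickHouse actually lays out data on disk.

```python
from typing import Any, Dict, List

# Row-oriented: each record is stored as a unit.
rows: List[Dict[str, Any]] = [
    {"user_id": 1, "country": "NL", "duration_ms": 120},
    {"user_id": 2, "country": "US", "duration_ms": 340},
    {"user_id": 3, "country": "NL", "duration_ms": 95},
]


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot a list of records into one list of values per column."""
    if not rows:
        return {}
    columns: Dict[str, List[Any]] = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


# Column-oriented: each column's values are stored contiguously.
columns = to_columns(rows)

# An aggregate over one column walks every record in the row layout...
total_row_store = sum(row["duration_ms"] for row in rows)
# ...but reads a single list in the column layout.
total_column_store = sum(columns["duration_ms"])
print(total_row_store, total_column_store)
```

Analytical queries usually aggregate a few columns over many rows, so reading only those columns is the main appeal; the flip side is that writing one record touches every column, which is why such systems favor large batched inserts.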
        
         | cinbun8 wrote:
         | If you don't know what a column oriented DB is, that page is
         | probably not for you.
        
           | pqdbr wrote:
           | Care to elaborate? I've been a fullstack Rails dev for 12
           | years and I had no idea either, just like OP. Why alienate
           | potential users from the get-go?
        
           | bagels wrote:
           | I know what a column oriented DB is, but ClickHouse was not
           | on my radar before.
           | 
           | The pitch on the landing page is that ClickHouse Cloud is
           | great if you love ClickHouse. If you don't know what
           | ClickHouse is, you have to do some work to find out.
        
       | risyachka wrote:
       | Note: the final beneficiaries are Russians and there is a high
       | probability that taxes from ClickHouse Cloud will go to the
       | Russian budget to sponsor war.
       | 
       | ClickHouse Inc. is a subsidiary of ClickHouse B.V. in the
       | Netherlands and is controlled by a bunch of Russians.
        
         | hunterb123 wrote:
         | Under that logic all of SV sponsors war in the middle east.
         | 
         | Can we leave the Russophobia in the political threads at
         | least?
        
         | DandyDev wrote:
         | You have left multiple comments now on this post with
         | disparaging remarks about the ClickHouse team without any
         | evidence to back up your claims.
         | 
         | Not super classy if you ask me.
        
       ___________________________________________________________________
       (page generated 2022-10-04 23:00 UTC)