[HN Gopher] No, QuestDB is not Faster than ClickHouse
___________________________________________________________________
No, QuestDB is not Faster than ClickHouse
Author : krnaveen14
Score : 89 points
Date : 2022-06-16 16:21 UTC (6 hours ago)
(HTM) web link (telegra.ph)
(TXT) w3m dump (telegra.ph)
| bluestreak wrote:
| Our article in question can be found here:
| https://questdb.io/blog/2022/05/26/query-benchmark-questdb-v...
|
| The intent of the article was to showcase the JIT-optimised WHERE clause and we did not use any indexes on QuestDB.
| [deleted]
| [deleted]
| PeterZaitsev wrote:
| If your intent is to showcase the new optimization in the product, it is best to compare it to your own old version.
| olluk wrote:
| Comparison with the old version is actually in the article for the patient reader. It could go to the top but I don't think it will make a difference. At the end of the day it is the article at the official QuestDB website which gives the reader a spoiler about the bias.
|
| I am intrigued what Timescale is going to publish next.
| qoega wrote:
| Agree. And for a blog post it can even have a story like: "We compared with ClickHouse and we were 10x slower, then we looked at this case and made it 100x faster. Thank you, benchmark and ClickHouse developers that showed us a use case where we could do better."
|
| For me benchmarking is usual - "Why does this query take so long? We need to improve it. Sometimes 1000x."
| untitaker_ wrote:
| Right? How do the folks at QuestDB know that their new JIT engine is actually responsible for those performance improvements? My understanding is that, index or not, data is still sorted by time in QuestDB, which is exactly what the ClickHouse engineers are replicating in the new schema.
| bluestreak wrote:
| The query ClickHouse picked on does not actually leverage time order. Perhaps ClickHouse vendors on this thread can comment on the relevance of the date partitioning for this query.
| My best guess is that it might help the execution logic to create data chunks for parallel scan.
|
| QuestDB does also use partitions for this purpose but we also calculate chunks dynamically based on available CPU to distribute load across cores more evenly.
| datalopers wrote:
| Please don't post competitor benchmarks until you can hire someone who has a slight clue what they're doing. All you're demonstrating is the sheer incompetency at QuestDB.
| bluestreak wrote:
| I am in fact very proud of my team, who worked very hard on both implementation and the article. It is disappointing to read unfounded insults where we made every effort to be fair.
| hodgesrm wrote:
| I appreciate your benchmark and was interested to learn about how QuestDB processes TSBS queries efficiently. I work extensively with ClickHouse and it's always enlightening to learn about how other databases achieve high performance. Your descriptions of the internals are clear and easy to follow, especially since you included comparisons with older versions of QuestDB.
|
| That said, I think I can understand how some users might be a little put off by the comparisons. Your article effectively says "ClickHouse is really slow" without giving readers any easy way to judge what was happening under the covers. I was personally a bit frustrated not to have the time to set up TSBS and dig into what was going on. I therefore appreciated Geoff's effort to look up the results and show that the default index choices didn't make a lot of sense for this particular case. That does not detract from QuestDB's performance, at least from my perspective.
|
| Anyway congratulations on the performance improvement. As a famous character in Star Wars said, "we will watch your career with great interest."
|
| edit: correct typo
| Dzugaru wrote:
| "while QuestDB utilizes its full indexing strategy to read just a tiny fraction of the actual data"
|
| Can you please elaborate on this?
| bluestreak wrote:
| Full disclosure: I am CTO of QuestDB and I took part in JIT implementation. The quote above is not mine, it was written by ClickHouse staff. The "utilizes its full indexing strategy" statement is false and is news to me.
| Dzugaru wrote:
| So you do a full scan and it's ~50 CPU cycles per row (48 CPUs at 4 GHz), correct? This is possible I guess? And in this case ClickHouse is wrong.
| pepemon wrote:
| So, QuestDB is faster or not? I'm puzzled now!
| olluk wrote:
| Looks like QuestDB is faster if you don't optimize your table storage for 1 query.
|
| But if you are okay with only a limited number of columns being scanned faster than others, ClickHouse comes first.
| datalopers wrote:
| PeterZaitsev wrote:
| I wonder what "every effort to be fair" means? The first thing you could have done is reach out to the ClickHouse community to ask for optimization suggestions.
| bluestreak wrote:
| "fair" means that we are comparing apples to apples. Ad-hoc, unindexed predicate, compiled by QuestDB into AVX2 assembly (using AsmJIT) vs the same predicate compiled by ClickHouse (I'm assuming by LLVM). One can perhaps view this as comparing SIMD-based scans from both databases. Perhaps we generate better assembly, which incidentally offers better IO.
|
| We all understand that creating a very specific index might improve specific query performance. Great, ClickHouse geared the entire table storage model to be ultra specific for latitude search. What if you search by longitude, or another column? Back to the beginning.
|
| JIT-compiled predicates offer arbitrary query optimisation with zero impact on ingestion. This is sometimes useful.
|
| What would you offer assuming that we reached out, other than creating an index?
|
| ClickHouse does better than we do in other areas. It JITs more complicated expressions, such as some date functions. It optimises count() queries specifically. For example we collect "found" rowed_ids in an array. ClickHouse does not, specifically for count(). We still have work to do. On the other hand we ingested this very dataset about 5x quicker than ClickHouse, which we left out because the article is not about "QuestDB is faster than ClickHouse".
| olluk wrote:
| What if the purpose of the article is to compare queries without indexes?
| jsnell wrote:
| Doesn't matter, since that clearly wasn't the purpose of the article. After all, they were totally happy to add an index for another competing DB as long as they happened to win that comparison. Then they crow about how they beat having an index.
|
| Pretty sleazy.
| xenator wrote:
| So, maybe do not create specific scenarios for corner cases and then generalize the outcome? And write articles about common scenarios that are important for people who will use the technology on a daily basis.
| olluk wrote:
| My personal view is that having fast queries without indexes is quite a general outcome.
| avianlyric wrote:
| What an extremely unfair comment. Having read QuestDB's blog, it's quite clear they've taken great pains to point out that a single specific benchmark isn't the be all and end all of DB analysis.
|
| They quite clearly start out by saying they're only looking to demonstrate the impact of a specific new DB feature they've created, and are using benchmarks that illustrate the difference. They make zero claims that QuestDB is faster than ClickHouse overall, and quite carefully point out that prospective users need to run their own benchmarks on their own data to figure out what DB will work for them.
| dimgl wrote:
| > They make zero claims that QuestDB is faster than ClickHouse overall
|
| Are you sure? Just one look at their website says differently.
|
| https://questdb.io/time-series-benchmark-suite/
|
| I don't use these tools. I just wanted to point out that what you're saying is disingenuous.
| thegeomaster wrote:
| Sounds like they didn't re-do the QuestDB benchmark with the same change to the indexes, and so their claim is that ClickHouse is 27x faster with a specific index than QuestDB without that index. Which is not a fair comparison.
|
| Also, the tone of the post sounds really arrogant. They try to hide it a bit, I feel, but it just seeps through.
| axlee wrote:
| I didn't really read it as arrogant, more as annoyed about a mischaracterization that was disparaging their product.
| SOLAR_FIELDS wrote:
| It's also part of a longer trend of saber rattling between these vendors - there's a history of these types of posts also from TimescaleDB:
| https://news.ycombinator.com/item?id=29096541
| qoega wrote:
| There is a small list of vendors that do not forbid running benchmarks with their systems.
| https://cube.dev/blog/dewitt-clause-or-can-you-benchmark-a-d...
|
| That is why there is a small subset of vendors that are being 'attacked' by these comparisons.
| bombcar wrote:
| More and more we start to see _why_ these prohibitions are in place.
| Dzugaru wrote:
| Well, I don't know how QuestDB works, and I couldn't find anything in the original benchmark, but probably they already have some sort of (geo)index in place? It's really strange to search geo-data by scanning the whole surface of the Earth. The point that ClickHouse outperforms this by just sorting on one axis (and even not using any fancy 2D indices) is reasonable.
| olluk wrote:
| No, there are no indexes in QuestDB in the article. None. Zero. That's a bold mistake in the ClickHouse article. Should be named Yes, QuestDB is Faster.
| [deleted]
| [deleted]
| Dzugaru wrote:
| Yeah, I've read more carefully and it seems they're doing a full scan.
| tomhallett wrote:
| I was curious to hear more details about this statement - "while QuestDB utilizes its full indexing strategy to read just a tiny fraction of the actual data". Did QuestDB create indexes in their QuestDB benchmark but just not mention it? Are there geoindexes which are automatically enabled which do help (but are of less value in the general sense from ClickHouse's perspective)?
| twoodfin wrote:
| I don't know how QuestDB is implemented in any detail, but this statement struck me as confused. My understanding is that for this query, QuestDB is performing a full scan of the relevant columns, and the point of the blog post was how fast their JIT engine for filtering makes this.
| [deleted]
| olluk wrote:
| There were 2 queries in the QuestDB benchmark over the same table. ClickHouse didn't even try to match both of them, choosing one as a victim. I guess that's what happens when you optimise the data storage for one query.
| gauravphoenix wrote:
| I have always felt that DB benchmarks are useless; always use your own dataset.
|
| https://gauravkumar.blog/performance-benchmarks-are-useless....
| nojito wrote:
| This is why the commercial offerings do not allow you to publish benchmarks.
| PeterZaitsev wrote:
| Which is a horrible thing. Even bad benchmarks often create discussions.
| nojito wrote:
| Not true at all. Most people take benchmarks as gospel because they value their time.
| capableweb wrote:
| All benchmarks are always useless, in 90% of the cases. They could maybe give some baseline understanding, but it's important to always do your own benchmarks as your performance can be very different than what the benchmark showed, simply because the data/data structures are slightly different.
|
| Do your own benchmarks people!
| PeterZaitsev wrote:
| This response illustrates an important point - if you're an expert in technology A and compare it to technology B, which you're not an expert in, the comparison is very likely to be unfair.
|
| I very much would like to see vendors at least follow journalistic ethics and reach out to their competition for optimization comments and suggestions before publishing, so others are given a chance to suggest optimizations.
| hodgesrm wrote:
| Agree. Or just load test on your own software, publish how you did it, and let other vendors respond for themselves.
| klysm wrote:
| Yeah this happens a lot. I like it when people maintain a repo that accepts changes for the comparison.
| snikolaev wrote:
| Then you should like https://db-benchmarks.com/
| PeterZaitsev wrote:
| Great idea.
| noxvilleza wrote:
| Is there an existing named adage for something like "if one creates a benchmark in order to rank general performance of some products, some of those products will ultimately sacrifice general performance in order to optimize for that benchmark"?
| nathanwh wrote:
| https://en.wikipedia.org/wiki/Goodhart's_law
|
| You stated it almost directly.
| noxvilleza wrote:
| Oh dear. I did a brief search for 'adage on benchmarking' and only saw Rugg/Feldman benchmarks.
| bombcar wrote:
| It's also why the only true benchmark _is using the thing as it needs to be used_ - but this is hard to compare because often you need code to work with the tool and vice-versa.
| gfody wrote:
| there are the TPC benchmarks which try to cover a wide variety of use cases and scenarios and are designed independently from any one engine:
| https://www.tpc.org/information/benchmarks5.asp
|
| you post the results for your own product, others do the same, customers can compare:
| https://www.singlestore.com/blog/tpc-benchmarking-results/
| qoega wrote:
| It is partially true, but these benchmarks force a schema.
| You can't reorganise data, for example into a wide table, or add indices. So it actually does not show you how to use the system to solve this type of problem in the best way possible, but checks unoptimised results as if you never learn and never utilise the best practices of the DBMS you choose for production.
| [deleted]
| [deleted]
___________________________________________________________________
(page generated 2022-06-16 23:00 UTC)