[HN Gopher] Building a new vector based storage model
___________________________________________________________________

Author : bluestreak
Score  : 51 points
Date   : 2021-05-11 13:51 UTC (9 hours ago)

web link (questdb.slab.com)

bluestreak wrote:
| We launched QuestDB last summer [1, 2]. Our storage model is
| vector-based and append-only. This meant that all incoming data
| had to arrive in the correct time order. This worked well for
| some use cases, but we increasingly saw real-world cases where
| data doesn't always land at the database in chronological order.
| We saw plenty of developers and users come and go specifically
| because of this technical limitation. So it became a priority to
| deal with out-of-order data.
|
| The big decision was which direction to take to tackle the
| problem. LSM trees seemed an obvious choice, but we chose an
| alternative route so we wouldn't lose the performance we spent
| years building. Our latest release supports out-of-order
| ingestion by re-ordering data on the fly. That's what this
| article is about.
|
| Also, we had many people asking about the differences between
| QuestDB and other open-source databases, and why users should
| consider giving it a try instead of other systems. When we
| launched on HN, readers showed a lot of interest in side-by-side
| comparisons with other databases on the market. One suggestion [3]
| that we thought would be great to try out was to benchmark
| ingestion and query speeds using the Time Series Benchmark Suite
| (TSBS) [4] developed by TimescaleDB. We're super excited to share
| the results in the article.
|
| [1] https://news.ycombinator.com/item?id=23975807
|
| [2] https://news.ycombinator.com/item?id=23616878
|
| [3] https://news.ycombinator.com/item?id=23977183
|
| [4] https://github.com/timescale/tsbs

Darkphibre wrote:
| Oh, this is fascinating.
| Seven years ago I architected a true-realtime
| telemetry pipeline with end-to-end sequential
| guarantees (with round-trip times <200ms excluding network
| latencies, and cloud processing times <20ms, leveraging
| BOND/ProtocolBuffer over AMQP over WebSocket). It's still used
| by every 1st-party game for a large publisher.
|
| It allowed for non-windowed event sequence analytics, enabling
| realtime feedback (think achievements that have multiple
| conditions).
|
| And then the requirement was dropped, and (as you've found)
| everyone just uses it like a standard telemetry stream and is
| OK with 5-15 min bins. :P
|
| I still have a passion for the space and will definitely be
| reading up on this. I firmly believe this is the future of
| telemetry analytics. Congratulations on your efforts seeing the
| light of day!!
|
| Disclaimer: I currently work for Microsoft; all words here are
| my own and do not necessarily reflect those of my employer,
| etc. ;)

j1897 wrote:
| Thanks for the kind words and your perspective!

[deleted]

alcio wrote:
| Excited to see this new release. It seems to me this would
| (slightly?) hurt query performance on recent data when a
| query touches data in both the O3 (out-of-order) and persisted
| zones. Is that the case?

bluestreak wrote:
| Query performance would be affected insofar as ingest jobs
| share the same thread pool as query jobs. As I am writing this
| I am also realising that perhaps we should have an option to
| separate these jobs... If we ignore resource usage and commit()
| latency, query performance would remain unaffected. The reader
| remains lockless and largely unchanged code-wise. Maintaining
| the data model as seen by the readers was one of our major
| objectives. I hope I'm making sense here?

hartem_ wrote:
| Congrats on the release! The benchmark results look really
| impressive :).
|
| Curious to learn more about your approach to verifying the
| correctness of the implementation.
| Did you try testing it with
| Jepsen or something similar?

bluestreak wrote:
| Thank you! We are not yet distributed. That's coming right up,
| along with Jepsen-style tests. We are really serious about
| testing!
___________________________________________________________________
(page generated 2021-05-11 23:00 UTC)
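bluestreak describes supporting out-of-order (O3) ingestion in a vector-based, append-only store by re-ordering data on the fly instead of switching to LSM trees. The thread gives no implementation details, so what follows is only a minimal sketch of that general idea under stated assumptions, not QuestDB's code: a single in-memory column of hypothetical `(timestamp, value)` rows and a hypothetical `commit` function. In-order batches take the pure-append path; when a batch contains rows older than the newest stored row, only the tail of the column from the earliest out-of-order timestamp onward is merged, and the older prefix is kept as-is. Returning a new list rather than mutating the old one loosely mirrors the point that readers keep seeing the previously committed data.

```python
import heapq
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical row shape: (timestamp, value). Assumed for illustration only;
# not QuestDB's actual storage layout.
Row = Tuple[int, float]


def commit(column: List[Row], batch: List[Row]) -> List[Row]:
    """Return a new timestamp-sorted column holding `column` plus `batch`.

    The input list is never mutated, so a holder of the old list still
    sees the previously committed rows.
    """
    # Sort the incoming batch by timestamp (stable: equal timestamps keep
    # their arrival order).
    batch = sorted(batch, key=lambda r: r[0])
    if not batch:
        return list(column)

    # Fast path: no incoming row is older than the newest stored row,
    # so the commit is a pure append.
    if not column or batch[0][0] >= column[-1][0]:
        return column + batch

    # Out-of-order path: find the first stored row newer than the oldest
    # incoming row. Everything before that index is already in final order.
    timestamps = [r[0] for r in column]
    split = bisect_right(timestamps, batch[0][0])

    # Merge only the affected tail with the batch. On equal timestamps,
    # heapq.merge yields rows from the first iterable (existing data) first.
    merged_tail = list(heapq.merge(column[split:], batch, key=lambda r: r[0]))
    return column[:split] + merged_tail


if __name__ == "__main__":
    col = [(1, 1.0), (3, 3.0), (5, 5.0)]
    print(commit(col, [(6, 6.0), (7, 7.0)]))
    # -> [(1, 1.0), (3, 3.0), (5, 5.0), (6, 6.0), (7, 7.0)]
    print(commit(col, [(4, 4.0), (2, 2.0)]))
    # -> [(1, 1.0), (2, 2.0), (3, 3.0), (4, 4.0), (5, 5.0)]
```

The sketch only shows the ordering logic: in-order data never triggers a merge, and out-of-order data re-sorts just the rows at or after the point in time where it lands. How a real engine lays this out on disk and isolates readers from the rewrite is not covered by the thread.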