[HN Gopher] CERN swaps out databases to feed its petabyte-a-day ...
___________________________________________________________________
 
  CERN swaps out databases to feed its petabyte-a-day habit
 
  Author : valyala
  Score  : 75 points
  Date   : 2023-09-20 06:46 UTC (1 day ago)
 
  (HTM) web link (www.theregister.com)
  (TXT) w3m dump (www.theregister.com)
 
  | bouvin wrote:
  | One of my fondest memories as a summer student at CERN in 1993
  | (in the Electronics and Computing for Physics department) was the
  | visit to the basement beneath the main computing facility, where
  | a colossal tape robot was in operation. Even at that time, CERN
  | was grappling with exceedingly vast amounts of data.
  | 
  | foota wrote:
  | Weird that the title talks about the petabyte a day, while the
  | article is actually about their monitoring tooling, not the thing
  | ingesting the data from experiments, IIUC.
  | 
  | m3kw9 wrote:
  | Over a 24-hour period that's more than 11 gigabytes/second, or
  | roughly 100 Gbps. Those shards must be pretty crazy.
  | 
  | formerly_proven wrote:
  | The headline is about the data processed on their compute; the
  | amount of data in the monitoring system is considerably smaller
  | (but still not small data):
  | 
  | > But Brij Kishor Jashal, a scientist in the CMS collaboration,
  | told The Register that his team were currently aggregating 30
  | terabytes over a 30-day period to monitor their computing
  | infrastructure performance.
  | 
  | So 1 TB/day, which is about 10 MB/s.
  | 
  | sgt101 wrote:
  | I can do this on my laptop
  | 
  | /tumbleweed...
  | 
  | qwertox wrote:
  | At the end of the article it says
  | 
  | "_InfluxDB said in March this year it had solved the cardinality
  | issue with a new IOx storage engine._"
  | 
  | Does this mean that in the end it wasn't really necessary to
  | switch to VictoriaMetrics' offering?
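The back-of-the-envelope numbers in the two comments above can be checked with a short sketch; the helper function below is mine, not something from the thread or from either database:

```python
def daily_volume_to_throughput(bytes_per_day: float) -> tuple[float, float]:
    """Return (gigabytes/second, gigabits/second) for a daily data volume.

    Uses decimal units (1 GB = 1e9 bytes), matching the article's usage.
    """
    seconds_per_day = 24 * 60 * 60  # 86_400
    bytes_per_second = bytes_per_day / seconds_per_day
    gb_per_second = bytes_per_second / 1e9
    gbps = bytes_per_second * 8 / 1e9
    return gb_per_second, gbps


# 1 PB/day, as in the headline:
gb_s, gbps = daily_volume_to_throughput(1e15)
print(f"{gb_s:.1f} GB/s, {gbps:.1f} Gbps")  # 11.6 GB/s, 92.6 Gbps

# 1 TB/day, the monitoring volume quoted from the article:
mb_s = daily_volume_to_throughput(1e12)[0] * 1000
print(f"{mb_s:.1f} MB/s")  # 11.6 MB/s
```

So "more than 11 GB/s, rounding to 100 Gbps" and "about 10 MB/s" are both fair roundings of the same arithmetic.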
  | esafak wrote:
  | tl;dr:
  | 
  | Speaking to The Register, Roman Khavronenko, co-founder of
  | VictoriaMetrics, said the previous system had experienced
  | problems with high cardinality - that is, a large number of
  | unique label values - and high-churn data, where applications can
  | be redeployed multiple times over new instances.
  | 
  | Implementing VictoriaMetrics as backend storage for Prometheus,
  | the CMS monitoring team progressed to using the solution as
  | front-end storage to replace InfluxDB and Prometheus, helping
  | remove cardinality issues, the company said in a statement.
  | 
  | amelius wrote:
  | This is nothing compared to what dragnet surveillance has to deal
  | with.
  | 
  | local_crmdgeon wrote:
  | And that's all on MSSQL or RDS, right?
  | 
  | ilyt wrote:
  | I really like VictoriaMetrics's architecture.
  | 
  | vmagent takes care of all the pesky edge things like emulating
  | Prometheus config parsing and various scraping bits. It also does
  | buffering in case you lose the network connection for a while,
  | and accepts a vast spread of different protocols.
  | 
  | vminsert/vmselect scale separately from each other, and your
  | queries don't bother your ingest all that much.
  | 
  | vmstorage does just that: storage. The only thing that bothers me
  | (compared to, say, Elasticsearch) is that data can't migrate
  | between nodes, so you can't "just" start a new one and drain an
  | old one - but a tiny bit of ops work in rare cases is IMO a price
  | worth paying for the straightforwardness of the stack.
  | 
  | PromQL compatibility is also great; tools like Grafana "just
  | work" without anyone having to write support for it.
  | 
  | We started migrating from InfluxDB at work, and on my private
  | stuff I already did. So much less memory usage too.
  | 
  | theossuary wrote:
  | What version of Influx were you running? I'm interested in
  | whether v3 will be more competitive than v2.
  | 
  | ilyt wrote:
  | 1.8; the migration path to 2.0 was a no-no.
  | Don't remember the exact reasons back then, but we decided to
  | take a wait-and-see approach and watch how the alternatives grew
  | up, as our data generally grows at a predictable rate.
  | 
  | Also, frankly, Prometheus support is a massive positive. For
  | better or worse, the industry standardized on apps using
  | Prometheus as the ingest for metrics, and most of the related
  | materials will of course give examples in PromQL.
  | 
  | Flux is frankly hieroglyphs for people using it 20 minutes a
  | month, like our developers.
  | 
  | This is the given example of how to raise a value to the power of
  | two in Flux:
  | 
  |     |> map(fn: (r) => ({ r with _value: r._value * r._value }))
  | 
  | This is the same thing in PromQL:
  | 
  |     value ^ 2
  | 
  | This is the example of calculating a percentage in Flux (from
  | their webpage):
  | 
  |     data
  |         |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
  |         |> map(
  |             fn: (r) => ({
  |                 _time: r._time,
  |                 _field: "used_percent",
  |                 _value: float(v: r.used) / float(v: r.total) * 100.0,
  |             }),
  |         )
  | 
  | This is how you do it in PromQL:
  | 
  |     space_used / space_total * 100
  | 
  | Flux is atrocious for "normal users".
  | 
  | iFire wrote:
  | OPEN SOURCE, APACHE 2 LICENSE
  | 
  | https://github.com/VictoriaMetrics/VictoriaMetrics/blob/mast...
  | 
  | [deleted]
  | 
  | [deleted]
  | 
  | Havoc wrote:
  | That's one hell of an endorsement. The marketing team won the
  | jackpot.
  | 
  | keep_reading wrote:
  | I also dropped InfluxDB at work due to its terrible performance.
  | VictoriaMetrics is great.
  | 
  | I was using Promscale (TimescaleDB), but they EOL'd Promscale,
  | which forced us to Victoria. Either way, both of these are much
  | faster than Influx.
  | 
  | Don't get fooled into the latest InfluxDB rewrite. I think the
  | latest is cloud-hosted only too? So stupid.
  | 
  | contravariant wrote:
  | Honestly, the database isn't half as useful as the tool they
  | wrote to grab the metrics. At least I think Telegraf was written
  | by the same people? It seems to have the exact opposite design
  | philosophy.
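The "high cardinality" and "high churn" problems discussed in the thread come down to one fact: in Prometheus-style storage, every distinct label combination for a metric is its own time series. A minimal sketch of the counting (the function and label names are hypothetical, not taken from any of the databases mentioned):

```python
def series_count(label_values: dict[str, list[str]]) -> int:
    """Distinct time series for one metric name: the product of the
    number of observed values per label."""
    count = 1
    for values in label_values.values():
        count *= len(values)
    return count


# A stable fleet: one job, two regions, ten long-lived instances.
stable = {
    "job": ["api"],
    "region": ["eu", "us"],
    "instance": [f"host-{i}" for i in range(10)],
}
print(series_count(stable))  # 20

# High churn: 50 redeploys, each giving every pod a fresh `instance`
# value. The old series stay in the index, so cardinality balloons
# even though only 10 pods are live at any moment.
churned = dict(stable, instance=[f"pod-{i}" for i in range(10 * 50)])
print(series_count(churned))  # 1000
```

This is why redeploy-heavy workloads blow up a time series index: the number of live targets stays flat while the number of indexed series keeps growing.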
  | pphysch wrote:
  | I saw the writing on the wall with InfluxDB v2 (doubling down on
  | a closed platform / SaaS) and advocated exploring
  | VictoriaMetrics, even though we had some Influx v1 running. No
  | regrets.
  | 
  | I also prefer the golang-esque simplicity of the Prometheus
  | ecosystem. Monitoring is the last place I want unnecessary
  | abstraction layers and complicated configuration files.
  | 
  | ComputerGuru wrote:
  | Missing from the title: leaving InfluxDB and Prometheus for
  | VictoriaMetrics.
  | 
  | hintymad wrote:
  | This is puzzling. I'm not sure how VictoriaMetrics solved the
  | cardinality problem. When running an aggregate query that sums up
  | some counters for a single metric over the dimension of
  | instances, in a time window larger than a few hours,
  | VictoriaMetrics would barf with an error about the query having
  | too many time series (or data points? I forget the exact
  | wording). This clearly shows that 1/ VictoriaMetrics does not
  | treat a time series with multiple dimensions as a single time
  | series; 2/ VictoriaMetrics does not perform hierarchical
  | aggregation.
  | 
  | That is, VictoriaMetrics has not really built a true time series
  | DB that handles reasonable cardinalities.
  | 
  | [deleted]
___________________________________________________________________
 (page generated 2023-09-21 23:01 UTC)