[HN Gopher] CERN swaps out databases to feed its petabyte-a-day ...
       ___________________________________________________________________
        
       CERN swaps out databases to feed its petabyte-a-day habit
        
       Author : valyala
       Score  : 75 points
        Date   : 2023-09-20 06:46 UTC (1 day ago)
        
 (HTM) web link (www.theregister.com)
 (TXT) w3m dump (www.theregister.com)
        
       | bouvin wrote:
       | One of my fondest memories as a summer student at CERN in 1993
       | (in the Electronics and Computing for Physics department) was the
       | visit to the basement beneath the main computing facility, where
       | a colossal tape robot was in operation. Even at that time, CERN
       | was grappling with exceedingly vast amounts of data.
        
       | foota wrote:
       | Weird that the title talks about the petabyte a day, while the
       | article is actually about their monitoring tooling, not the thing
       | ingesting the data from experiments, iiuc.
        
       | m3kw9 wrote:
        | Over a 24-hour period that's more than 11 gigabytes per second,
        | or roughly 100 Gbps. Those shards must be pretty crazy
        
         | formerly_proven wrote:
         | The headline is about the data processed on their compute, the
         | amount of data in the monitoring system is considerably smaller
         | (but still not small data):
         | 
         | > But Brij Kishor Jashal, a scientist in the CMS collaboration,
         | told The Register that his team were currently aggregating 30
         | terabytes over a 30-day period to monitor their computing
         | infrastructure performance.
         | 
         | So 1 TB / day, that's about 10 MB/s.
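A quick sanity check on both figures (a minimal sketch; decimal units assumed — the per-day terabyte works out to a bit under 12 MB/s, in line with the round figure above):

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Headline figure: ~1 PB/day through CERN's compute.
pb_per_day = 10**15  # bytes (decimal petabyte)
print(pb_per_day / SECONDS_PER_DAY / 10**9)  # ~11.6 GB/s, i.e. ~93 Gbps

# Monitoring figure: 30 TB over 30 days, i.e. 1 TB/day.
tb_per_day = 10**12  # bytes (decimal terabyte)
print(tb_per_day / SECONDS_PER_DAY / 10**6)  # ~11.6 MB/s
```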
        
       | sgt101 wrote:
       | I can do this on my laptop
       | 
       | /tumbleweed...
        
       | qwertox wrote:
       | At the end of the article it says
       | 
       | " _InfluxDB said in March this year it had solved the cardinality
       | issue with a new IOx storage engine._ "
       | 
       | Does this mean that in the end it wasn't really necessary to
       | switch to VictoriaMetrics' offering?
        
       | esafak wrote:
        | tl;dr:
       | 
       | Speaking to The Register, Roman Khavronenko, co-founder of
       | VictoriaMetrics, said the previous system had experienced
       | problems with high cardinality, which refers to the level of
       | repeated values - and high churn data - where applications can be
       | redeployed multiple times over new instances.
       | 
       | Implementing VictoriaMetrics as backend storage for Prometheus,
       | the CMS monitoring team progressed to using the solution as
       | front-end storage to replace InfluxDB and Prometheus, helping
       | remove cardinality issues, the company said in a statement.
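The cardinality problem mentioned above compounds multiplicatively: every distinct combination of label values is its own time series. A toy illustration (all numbers invented):

```python
# Each unique (metric name, label values) combination is a separate
# time series, so cardinality is a product of label dimensions.
metrics = 100    # distinct metric names
instances = 500  # values of the 'instance' label
endpoints = 20   # values of a 'path' label

active_series = metrics * instances * endpoints
print(active_series)  # 1000000 active series

# "High churn": every redeploy onto fresh instances mints new label
# values, so old series stop and new ones start, inflating the total
# number of series the database has ever indexed.
redeploys = 30
print(active_series * redeploys)  # 30000000 series touched over a month
```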
        
       | amelius wrote:
       | This is nothing compared to what dragnet surveillance has to deal
       | with.
        
         | local_crmdgeon wrote:
         | And that's all on MSSQL or RDS, right?
        
       | ilyt wrote:
       | I really like VictoriaMetrics's architecture
       | 
        | vmagent takes care of all the pesky edge things like emulating
        | Prometheus config parsing and various scraping bits. It also
        | buffers data in case you lose the network connection for a
        | while, and accepts a vast spread of different protocols.
       | 
        | vminsert/vmselect scale separately from each other, and your
        | queries don't bother your ingest all that much.
       | 
        | vmstorage does just that: storage. The only thing that bothers
        | me (compared to, say, Elasticsearch) is that data can't migrate
        | between nodes, so you can't "just" start a new one and drain an
        | old one, but a tiny bit of ops work in rare cases is IMO a
        | price worth paying for the straightforwardness of the stack.
       | 
       | PromQL compatibility is also great, tools like Grafana "just
       | work" without anyone having to write support for it.
       | 
        | We started migrating from InfluxDB at work, and on my private
        | stuff I already have. So much less memory usage, too.
        
         | theossuary wrote:
         | What version of Influx were you running? I'm interested if v3
         | will be more competitive than v2.
        
           | ilyt wrote:
            | 1.8; the migration path to 2.0 was a no-no. I don't
            | remember the exact reasons back then, but we decided on a
            | wait-and-see approach, watching how the alternatives grow
            | up, as our data generally grows at a predictable rate.
           | 
            | Also, frankly, Prometheus support is a massive positive.
            | For better or worse, the industry standardized on apps
            | using Prometheus as the ingest for metrics, and most of the
            | related materials will of course give examples in PromQL.
           | 
            | Flux is frankly hieroglyphs for people who use it 20
            | minutes a month, like our developers.
            | 
            | This is an example of raising a value to the power of two
            | in Flux:
            | 
            |     |> map(fn: (r) => ({ r with _value: r._value * r._value }))
            | 
            | The same in PromQL:
            | 
            |     value ^ 2
            | 
            | This is an example of calculating a percentage in Flux
            | (from their webpage):
            | 
            |     data
            |         |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
            |         |> map(
            |             fn: (r) => ({
            |                 _time: r._time,
            |                 _field: "used_percent",
            |                 _value: float(v: r.used) / float(v: r.total) * 100.0,
            |             }),
            |         )
            | 
            | And the same in PromQL:
            | 
            |     space_used / space_total * 100
            | 
            | Flux is atrocious for "normal users".
        
       | iFire wrote:
       | OPENSOURCE, APACHE2 LICENSE
       | 
       | https://github.com/VictoriaMetrics/VictoriaMetrics/blob/mast...
        
         | [deleted]
        
       | [deleted]
        
       | Havoc wrote:
       | That's one hell of an endorsement. Marketing team won the
       | jackpot.
        
       | keep_reading wrote:
       | I also dropped InfluxDB at work due to its terrible performance.
       | VictoriaMetrics is great
       | 
        | I was using Promscale (TimescaleDB), but they EOL'd Promscale,
        | which forced us onto VictoriaMetrics. Either way, both of
        | these are much faster than Influx.
        | 
        | Don't get fooled by the latest InfluxDB rewrite. I think the
        | latest version is cloud-hosted only, too? So stupid
        
         | contravariant wrote:
         | Honestly the database isn't half as useful as the tool they
         | wrote to grab the metrics. At least I think telegraf was
         | written by the same people? It seems to have the exact opposite
         | design philosophy.
        
         | pphysch wrote:
         | I saw the writing on the wall with InfluxDB v2 (doubling down
         | on closed platform / SaaS) and advocated exploring
         | VictoriaMetrics, even though we had some Influx v1 running. No
         | regrets.
         | 
         | I also prefer the golang-esque simplicity of the Prometheus
         | ecosystem. Monitoring is the last place I want unnecessary
         | abstraction layers and complicated configuration files.
        
       | ComputerGuru wrote:
       | Missing from the title: leaving InfluxDB and Prometheus for
       | VictoriaMetrics.
        
         | hintymad wrote:
          | This is puzzling. I'm not sure how VictoriaMetrics solved
          | the cardinality problem. When running an aggregate query
          | that sums up some counters for a single metric across the
          | instance dimension, over a time window longer than a few
          | hours, VictoriaMetrics would barf with an error about the
          | query touching too many time series (or data points? I
          | forget the exact wording). This clearly shows that 1/
          | VictoriaMetrics does not treat a time series with multiple
          | dimensions as a single time series; 2/ VictoriaMetrics does
          | not perform hierarchical aggregation.
          | 
          | That is, VictoriaMetrics has not really built a true time
          | series DB that handles reasonable cardinalities.
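For reference, an aggregate query of the shape described above (metric and label names hypothetical) would look something like this in PromQL:

```promql
sum by (instance) (rate(requests_total[5m]))
```

Evaluated over a range of more than a few hours, every underlying (metric, label-set) pair still counts as a separate series against the server's per-query limits, which is what can trigger the "too many time series" style of error.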
        
           | [deleted]
        
       ___________________________________________________________________
       (page generated 2023-09-21 23:01 UTC)