[HN Gopher] Grafana Mimir - Horizontally scalable long-term stor...
       ___________________________________________________________________
        
       Grafana Mimir - Horizontally scalable long-term storage for
       Prometheus
        
       Author : devsecopsify
       Score  : 226 points
       Date   : 2022-03-30 13:13 UTC (9 hours ago)
        
 (HTM) web link (grafana.com)
        
       | eatonphil wrote:
       | There isn't a link to the project on the page (that I could find)
       | so it almost looked like it's not open source. But here it is:
       | https://github.com/grafana/mimir.
        
         | notamy wrote:
         | You have to find the "Download" button and click it, it's very
         | non-obvious :< The entire page seems to be designed to funnel
         | you into signing up for their paid service, which makes sense,
         | but still doesn't feel great...
        
           | dewey wrote:
           | The first CTA button on the page "Tutorial" links to a
           | tutorial where the first step is to run the project with
           | Docker. Doesn't really feel like an overly forced funnel to
           | their paid service.
        
           | candiddevmike wrote:
           | Recently switched from their cloud service back to on-
           | premise. The cloud version wasn't being updated and the
           | entire setup experience left a lot to be desired with how you
           | connect their on-premise grafana agent, especially if you
           | aren't using their easy button deployment stuff. Also,
           | billing for metrics is insane, as on any given day my metric
           | load may vary between 5-7k or more. This caused some
           | operational overhead as I was constantly tweaking scrapers to
           | reduce useless metrics.
           | 
           | For $50/mo, you can self host everything easier, cheaper and
           | with more control IMO.
        
             | maccard wrote:
             | > For $50/mo, you can self host everything easier, cheaper
             | and with more control IMO.
             | 
             | Can you give an example as to how you could self host a
             | grafana stack for $50/month? On AWS that buys you 4 cores,
             | 8GB memory and 0 storage, and it's certainly not easier
             | than clicking one button on the grafana website.
        
               | jrwr wrote:
               | Two low end Hetzner/OVH Boxes for redundancy should do
               | the trick
        
               | m1keil wrote:
               | We are running Grafana and Prometheus on a single
               | t3.xlarge instance with 150GB gp3 EBS.
               | 
               | Excluding traffic, it costs ~ $100 USD per month.
               | 
               | We are doing 10 second scrapes and currently have roughly
               | 141k active time series. In Grafana Cloud it would
               | cost...
               | 
               | 15000 metrics for free. 126000/1000 * $8 = $1008
               | 
               | Now here's the real kicker: the prices Grafana puts on
               | its website assume a 60 second scrape interval (1 data
               | point per minute, or DPM). If you are doing 6 DPM, that's
               | $8 * 6 per 1000 time series!
               | 
               | So final bill.. _drum rolls_
               | 
               | 126000/1000 * $8 * 6 = $6048
               | 
               | Yes. That's a 60x.
               | 
               | Now, sure, we don't get the scale, the backups, the SLA..
               | but we can live without them. And when Prometheus starts
               | acting slowly, we will just bump it to t3.2xl, or spend
               | some time filtering out some of the noisy metrics we
               | might have around.
               | 
               | Btw, if you try to find any information about what is a
               | "time series" or a "metric" on the Grafana's pricing
               | page, good luck.
               | 
               | https://grafana.com/docs/grafana-cloud/metrics-control-usage...
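
The arithmetic in the comment above can be reproduced with a few lines of Python. This is only a sketch of the commenter's reading of the pricing (a free allowance, then a per-1000-series price that scales with data points per minute); it is not an official Grafana Cloud formula, and the constant and function names are made up for illustration.

```python
# Figures taken from the comment above. The pricing model itself is the
# commenter's interpretation, not an official formula.
ACTIVE_SERIES = 141_000
FREE_SERIES = 15_000
PRICE_PER_1000_SERIES_USD = 8   # price at 1 data point per minute (DPM)
SCRAPE_INTERVAL_SECONDS = 10
SELF_HOSTED_USD = 100           # approximate monthly cost quoted above


def monthly_bill(active_series: int, scrape_interval_seconds: int) -> float:
    """Estimate the monthly bill under the pricing model described above."""
    billable = max(active_series - FREE_SERIES, 0)
    dpm = 60 / scrape_interval_seconds
    return billable / 1000 * PRICE_PER_1000_SERIES_USD * dpm


at_1_dpm = monthly_bill(ACTIVE_SERIES, 60)
at_6_dpm = monthly_bill(ACTIVE_SERIES, SCRAPE_INTERVAL_SECONDS)
print(f"1 DPM: ${at_1_dpm:,.0f}")
print(f"6 DPM: ${at_6_dpm:,.0f}")
print(f"vs self-hosted: {at_6_dpm / SELF_HOSTED_USD:.0f}x")
```

Changing `SCRAPE_INTERVAL_SECONDS` shows how strongly the estimate depends on the scrape interval rather than on the series count alone.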
        
               | maccard wrote:
               | > Excluding traffic, it costs ~ $100 USD per month
               | 
               | I don't doubt that that's affordable, or cost competitive
               | with AWS, but that's about as cheap as you can do it, _and_
               | that's not including traffic. It's pretty much impossible
               | to halve that bill.
        
               | m1keil wrote:
               | I excluded the traffic because the price is basically 0.
               | This is internal traffic and a bunch of HTTP requests. It
               | doesn't cost us $3000 a month.
        
               | alexjplant wrote:
               | There are Helm charts available for all Grafana products
               | so if you already run a Kubernetes cluster and have spare
               | capacity you can just throw it up there. Loki supports
               | shipping logs to GCS/S3 natively and Prometheus can use
               | Cortex (also available as a Helm chart) to do the same.
               | Once you throw Grafana behind SSO and implement a backup
               | cronjob you're done until you reach scale and have to
               | start deploying/scaling individual components separately.
               | 
               | I implemented most of the above using Terraform on a
               | managed DigitalOcean cluster on a Saturday a few months
               | back; it wasn't super-hard. Alternatively you could rent
               | a few VPSes someplace and use k3s or similar to get an
               | unmanaged cluster.
        
               | westurner wrote:
               | Suggestions for organizing a Helm + Terraform [+
               | k3s/k3d/MicroShift] provisioning _and monitoring_ git
               | repo with CI for job accounting? (without Ansible  & AWX,
               | which I'd create a role with for this too)
               | 
               | - [ ] ENH,BLD: A cookiecutter for this would be cool
        
               | krnlpnc wrote:
               | > $50/month? On AWS that buys you 4 cores, 8GB memory and
               | 0 storage
               | 
               | Self-hosting on AWS is kind of counterproductive. Look
               | into "cloud" metal servers and the money will go much
               | further.
        
         | mdaniel wrote:
         | Still AGPL, which I guess makes sense given the rest of their
         | stack is too:
         | https://github.com/grafana/mimir/blob/mimir-2.0.0/LICENSE
        
       | cfors wrote:
       | More engineering effort going into reinventing things that
       | already exist to upsell people on Grafana cloud.
       | 
       | What about focusing on the core value that Grafana provides,
       | dashboards?
       | 
       | Grafana 8 alerting is still in my opinion at a beta level.
       | Dashboards as code has made no meaningful progress outside of
       | community attempts in the past 3 years. The documentation for
       | Grafana 8 alerts is still subpar.
       | 
       | All of these things as a paid offering are more interesting than
       | migrating my logging system or metrics system. Developers don't
       | want to migrate their observability.
        
         | INTPenis wrote:
         | What issues have you seen with Grafana alerting?
         | 
         | I'm curious because in my view it works so well that we
         | abandoned alertmanager for Grafana alerts only well before v8.
        
           | darkwater wrote:
           | How did you define alarms as code in a practical way before
           | v8? and after?
        
             | ArmandGrillet wrote:
             | Hi, I work on Grafana Alerting. Provisioning of alert rules
             | (and other objects used for alerting) will be possible
             | using a new API in Grafana 8.5 and we will update the
             | Grafana Terraform provider right after to take advantage of
             | this new API.
        
               | darkwater wrote:
               | Great to hear! We are looking into jsonnet based approach
               | but having an explicit and granular API and a Terraform
               | provider would be miles and miles better. Thanks!
        
             | INTPenis wrote:
             | Did not tbh. We have an ops department that does not
             | complain about menial tasks.
             | 
             | But of course IaC is the way we must follow.
        
               | cfors wrote:
               | Building a dashboard by clickety/clacking around is not a
               | menial task. Consistency across dashboards is a core
               | unit of observability: it ensures x-functional teams can
               | discuss issues in a common language/viewpoint, which is
               | only enforceable through a declarative dashboard syntax.
        
               | INTPenis wrote:
               | The question was regarding alerts, not dashboards. We
               | obviously deploy dashboards from json.
               | 
               | But I'm not aware of any way to deploy notification
               | channels, though it can probably be done now via the API.
               | Either way, we need to deploy notification channels with
               | webhooks and tokens, so that part is done manually. And
               | then the alerts are also done manually.
        
           | cfors wrote:
           | Grafana alerts (before version 8) worked great. We use them,
           | but the Grafana 8 alerting features are half-baked at best.
           | 
           | * Grafana 8 alerts removed the Image Preview, which was
           | extremely useful during issues. [0]
           | 
           | * Grafana 8 alerts don't have any way of being stored as
           | code. In fact the API that they provide in their docs [1][2]
           | doesn't work, or isn't up to date.
           | 
           | * The expression languages have zero documentation about
           | them, so aren't exactly useful for things that might get a
           | developer out of bed in the middle of the night.
           | 
           | [0] https://github.com/grafana/grafana/discussions/38030#discuss...
           | 
           | [1] https://editor.swagger.io/?url=https://raw.githubusercontent...
           | 
           | [2] https://community.grafana.com/t/posting-an-alert-using-grafa...
        
           | Bad_CRC wrote:
           | No alerts possible with dashboards and variables.
        
         | jchw wrote:
         | Understandable critique, but I absolutely love a lot of
         | Grafana's redundant offerings. For example, operationally
         | speaking it is _drastically_ simpler to set up a scalable
         | Grafana Tempo instance than Jaeger, in my opinion. Grafana
         | offering competent object storage backends for their software
         | has made them dramatically easier to operate and maintain.
         | 
         | That's also another thing: a decent amount of Grafana software
         | (Mimir, Loki, Tempo...) is OSS, so while they definitely are
         | using that software in their paid offering, it absolutely
         | still benefits OSS users. I'm messing with Tempo for telemetry
         | in my (admittedly embarrassingly weak) home lab endeavors and
         | it's pretty cool.
        
         | CitizenKane wrote:
         | Hey there! I work at Grafana on many of the dashboard
         | components. Beyond dashboards as code and alerts where are you
         | feeling the pain?
         | 
         | I can say that a lot of effort is going into improving
         | dashboards in a number of different dimensions and there are
         | definitely some exciting things on the horizon.
        
         | detaro wrote:
         | Is there any competitor in the "primarily dashboards" space?
         | Plenty of things I know just use Grafana for small amounts of data
         | where all this "5 new datastores!" isn't really useful, but
         | dashboard improvements would be welcome.
        
         | berkes wrote:
         | Seconded. While I like the idea of Grafana, and use it for some
         | projects, it lacks features in the graphing and dashboarding
         | part. I too presumed this is because they are spending more on
         | backends, pipelines and collection..
         | 
         | I don't need more backends, pipelines or collections. I need a
         | frontend to display the data that I have (in backends) already.
         | 
         | I need to:
         | 
         | * Be able to pipe KPIs into a storage. Doesn't need big-data,
         | high-volume, or extreme granularity. OR
         | 
         | * Have grafana grab data from an API/HTTP endpoint. It does
         | this with prometheus just fine.
         | 
         | * Have a way to insert some of my own figures. Currently I wire
         | up some google-sheet to grafana and fill that. I always have
         | some data that I cannot or will not (yet) grab automatically.
         | Like "amount of hours spent working on project" or "MRR" or
         | such.
         | 
         | It's possible with Grafana. But the experience is subpar, the
         | tweaking and wiggling is big and the outcome is an OK-ish, but
         | not too convincing dashboard. I'm convinced an alternative that
         | tackles this better (for niches) will eat into grafana.
        
       | cett wrote:
       | Presumably AGPLv3 is why Grafana would rather develop this than
       | Cortex?
        
         | pracucci wrote:
         | Hi. I'm Marco, I work at Grafana Labs and I'm a Grafana Mimir
         | maintainer. We just published a couple of blog posts about the
         | project, including more details on your question:
         | https://grafana.com/blog/2022/03/30/announcing-grafana-mimir...
         | and
         | https://grafana.com/blog/2022/03/30/qa-with-our-ceo-about-gr...
        
           | cett wrote:
           | Thank you for your answer. That seems like a reasonable
           | strategy.
        
       | MindTooth wrote:
       | How does this compare to https://www.timescale.com/promscale
       | 
       | I'm looking into choosing a backend for my metrics and always
       | open for suggestions.
        
         | vineeth0297 wrote:
         | Hey!
         | 
         | Promscale PM here :)
         | 
         | Promscale is the open source observability backend for metrics
         | and traces powered by SQL.
         | 
         | Whereas Mimir/Cortex is designed only for metrics.
         | 
         | Key differences:
         | 
         | 1. Promscale is light in architecture, as all you need is the
         | Promscale connector + TimescaleDB to store and analyse metrics
         | and traces, whereas Cortex comes with a highly scalable
         | microservices architecture that requires deploying tens of
         | services like ingester, distributor, querier, etc.
         | 
         | 2. Promscale offers storage for metrics, traces and logs (in
         | future): one system for all observability data, whereas
         | Mimir/Cortex is purpose-built for metrics.
         | 
         | 3. Promscale supports querying metrics using PromQL and SQL,
         | and traces using Jaeger query and SQL, whereas in Cortex/Mimir
         | all you can use is PromQL for metrics querying.
         | 
         | 4. The observability data in Cortex/Mimir is stored in an object
         | store like S3 or GCS, whereas in Promscale the data is stored in
         | a relational database, i.e. TimescaleDB. This means that Promscale
         | can support more complex analytics via SQL, but Cortex is better
         | for horizontal scalability at really large scales.
         | 
         | 5. Promscale offers per metric retention, whereas Cortex/Mimir
         | offers a global retention policy across the metrics.
         | 
         | I hope this answers your question!
        
           | pracucci wrote:
           | Hi. I'm a Mimir maintainer. I don't have hands-on/production
           | experience with Promscale, so I can't speak about it. I'm
           | chiming in just to add a note about the Mimir deployment
           | modes.
           | 
           | > Cortex comes with a highly scalable microservices
           | architecture that requires deploying tens of services like
           | ingester, distributor, querier, etc.
           | 
           | Mimir also supports the monolithic deployment mode. It's
           | about deploying the whole Mimir as a single unit (eg. a
           | Kubernetes StatefulSet) which you then scale out adding more
           | replicas.
           | 
           | More details here:
           | https://grafana.com/docs/mimir/latest/operators-guide/archit...
        
           | tarun_anand wrote:
           | Thanks... how do we do reporting/dashboards/alerts with
           | Promscale?
           | 
           | Also, any performance benchmarks?
        
             | vineeth0297 wrote:
             | Promscale supports reporting/ingestion of data using
             | Prometheus remote-write for metrics and OTLP (OpenTelemetry
             | Protocol) for traces.
             | 
             | For dashboards, you can use Promscale as a Prometheus
             | datasource for PromQL-based querying and visualising, as a
             | Jaeger datasource for querying and visualising traces, and as
             | a PostgreSQL datasource to query both metrics and traces
             | using SQL. If you are interested in visualising data using
             | SQL, we recently published a blog on visualising traces
             | using SQL
             | (https://www.timescale.com/blog/learn-opentelemetry-tracing-w...)
             | 
             | Alerts need to be configured on the Prometheus end;
             | Promscale doesn't support alerting at the moment. But
             | expect native alerting from Promscale in upcoming
             | releases.
             | 
             | We have internally tested Promscale at 1 million samples/sec.
             | Here is the resource recommendation guide for Promscale:
             | https://docs.timescale.com/promscale/latest/installation/rec...
             | 
             | If you are interested in evaluating or setting up Promscale,
             | reach out to us in the Timescale community Slack
             | (http://slack.timescale.com/) in the #promscale channel.
        
       | Thaxll wrote:
       | So many solutions to the same problem, how does it compare to
       | Victoria Metrics?
        
         | hagen1778 wrote:
         | VictoriaMetrics co-founder here.
         | 
         | There are many similar features between Mimir and
         | VictoriaMetrics: multi-tenancy, horizontal and vertical
         | scalability, high availability. Features like Graphite and
         | Influx protocols ingestion, Graphite query engine are already
         | supported by VictoriaMetrics. I didn't find references to
         | downsampling in Mimir's docs, but I believe it supports it too.
         | 
         | There are architectural differences. For example, Mimir stores
         | the last 2h of data on the local filesystem (and mmaps it, I
         | assume) and every 2h uploads it to the object storage (long-term
         | storage). VictoriaMetrics doesn't support object storage and
         | prefers to use the local filesystem for the sake of query
         | speed. Both VictoriaMetrics and Mimir can be used as a
         | single binary (Monolithic mode in Mimir's docs) and in cluster
         | mode (Microservices mode in Mimir's docs). The set of cluster
         | components (microservices) is different, though.
         | 
         | It is hard to say something about ingestion and query
         | performance or resource usage so far. While benchmarks from the
         | project owners can't be 100% objective, I hope the community
         | will perform unbiased tests soon.
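
The "keep roughly the last 2h locally, then upload" behaviour described above can be pictured as bucketing samples into fixed 2h windows. The sketch below is only an illustration of that idea: aligning windows to multiples of 2h since the Unix epoch is an assumption borrowed from how Prometheus-style storage usually cuts blocks, and `block_range` and `is_uploadable` are hypothetical helper names, not Mimir APIs.

```python
from datetime import datetime, timezone

BLOCK_MS = 2 * 60 * 60 * 1000  # the 2h window mentioned above, in milliseconds


def block_range(ts_ms: int, block_ms: int = BLOCK_MS) -> tuple[int, int]:
    """Return the [start, end) window, in ms, that contains ts_ms."""
    start = ts_ms - ts_ms % block_ms
    return start, start + block_ms


def is_uploadable(block_end_ms: int, now_ms: int) -> bool:
    """A window is only a candidate for upload once it has fully elapsed."""
    return now_ms >= block_end_ms


# Example: a sample written at 2022-03-30 13:13 UTC.
sample = datetime(2022, 3, 30, 13, 13, tzinfo=timezone.utc)
sample_ms = int(sample.timestamp() * 1000)
start, end = block_range(sample_ms)
print(datetime.fromtimestamp(start / 1000, tz=timezone.utc))
print(datetime.fromtimestamp(end / 1000, tz=timezone.utc))
print(is_uploadable(end, sample_ms))
```

Queries for recent data would be served from the local window, while anything older than the uploaded windows would have to come from object storage, which is the trade-off against a purely local design.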
        
         | outsb wrote:
         | Given Victoria Metrics is the only solution I've seen to make
         | data comparing it to other systems easily accessible as part of
         | official documentation, it's the only one I pay attention to.
         | 
         | I knew from reading the docs what VM excelled at and areas it
         | was weak in, long before I ever ran it (and expectations from
         | running it matched the documentation). I hate aspirational
         | marketing-saturated campaigns for deep tech projects where
         | standards should obviously be higher, it speaks more about
         | intended audience than it does the solution, and that's why in
         | this respect VM is automatically a cut above the rest.
        
           | cip01 wrote:
           | Cortex, Thanos and Mimir all support the "remote-read" protocol
           | (documented in Prometheus:
           | https://prometheus.io/docs/prometheus/latest/storage/#remote...),
           | so external systems (e.g. Prometheus) can read data from them
           | easily.
        
             | valyala wrote:
             | It would be great if you could provide a few practical
             | examples for the "Prometheus remote-read" protocol given its
             | restrictions [1].
             | 
             | [1] https://github.com/prometheus/prometheus/issues/4456
        
               | cip01 wrote:
               | Which restrictions do you have in mind?
               | 
               | A quick look at the issue suggests it wanted to avoid
               | Prometheus using local storage, but that's a
               | Prometheus-specific problem, not a remote-read problem.
               | 
               | Remote-read is a generic protocol
               | (https://github.com/prometheus/prometheus/blob/a1121efc18ba15...):
               | you pass a query (start/end time and matchers), and get
               | back data.
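
To make "start/end time and matchers in, data out" concrete, here is a small in-memory model of a remote-read style query. It is a sketch only: the real protocol exchanges protobuf messages over HTTP, whereas the `Matcher`, `Query` and `read` names below are illustrative Python stand-ins, and treating the end timestamp as inclusive is an assumption.

```python
import re
from dataclasses import dataclass
from enum import Enum


class MatchType(Enum):
    EQ = "="
    NEQ = "!="
    RE = "=~"
    NRE = "!~"


@dataclass(frozen=True)
class Matcher:
    type: MatchType
    name: str
    value: str

    def matches(self, labels: dict[str, str]) -> bool:
        # A missing label is treated as the empty string.
        actual = labels.get(self.name, "")
        if self.type is MatchType.EQ:
            return actual == self.value
        if self.type is MatchType.NEQ:
            return actual != self.value
        # Regex matchers are anchored: the whole value must match.
        hit = re.fullmatch(self.value, actual) is not None
        return hit if self.type is MatchType.RE else not hit


@dataclass(frozen=True)
class Query:
    start_ms: int
    end_ms: int
    matchers: tuple[Matcher, ...]


# One series is (labels, [(timestamp_ms, value), ...]).
Series = tuple[dict[str, str], list[tuple[int, float]]]


def read(query: Query, store: list[Series]) -> list[Series]:
    """Return the series matching every matcher, trimmed to the time range."""
    out = []
    for labels, samples in store:
        if all(m.matches(labels) for m in query.matchers):
            kept = [(t, v) for t, v in samples if query.start_ms <= t <= query.end_ms]
            if kept:
                out.append((labels, kept))
    return out


store = [
    ({"__name__": "up", "job": "api"}, [(1000, 1.0), (2000, 1.0), (3000, 0.0)]),
    ({"__name__": "up", "job": "db"}, [(1000, 1.0)]),
]
query = Query(1500, 3000, (
    Matcher(MatchType.EQ, "__name__", "up"),
    Matcher(MatchType.RE, "job", "a.*"),
))
result = read(query, store)
print(result)
```

The point of the model is that nothing in the request is specific to one backend, which is why several systems can sit behind the same read interface.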
        
       | halfmatthalfcat wrote:
       | How does this stack up with https://github.com/thanos-io/thanos,
       | which I've used to pretty good success.
       | 
       | The only criticism I have of Thanos though was the amount of
       | moving pieces to maintain.
        
         | netingle wrote:
         | (Tom here; I started the Cortex project on which Mimir is based
         | and lead the team behind Mimir)
         | 
         | Thanos is an awesome piece of software, and the Thanos team
         | have done a great job building a vibrant community. I'm a big
         | fan - so much so we used Thanos' storage in Cortex.
         | 
         | Mimir builds on this and makes it even more scalable and
         | performant (with a sharded compactor and query engine). Mimir
         | is multitenant from day 1, whereas this is a relatively new
         | thing in Thanos I believe. Mimir has a slightly different
         | deployment model to Thanos, but honestly even this is
         | converging.
         | 
         | Generally: choosing Thanos is always going to be a good choice,
         | but IMO choosing Mimir is an even better one :-p
        
           | AndyNemmity wrote:
           | Okay, but why? I am using Thanos today. It works, it's
           | complex, when it breaks, it's a bit of a challenge to fix,
           | but it happens. It doesn't break often.
           | 
           | It does the job. With Mimir, or Cortex on which it is based,
           | what benefit am I getting?
           | 
           | I get asked every few months about moving off of Thanos to
           | Cortex, and today now Mimir, and I don't have any substantial
           | reason to do so. It feels like moving for the sake of moving.
           | 
           | I need to see some real reasoning as to how moving everything
           | to Mimir is going to add value.
        
             | netingle wrote:
             | Sounds like Thanos is working well for you, so in your
             | position I wouldn't change anything.
             | 
             | There are a bunch of other reasons why people might choose
             | Mimir; perhaps they have outgrown some of the scalability
             | limits, or perhaps they want faster high cardinality
             | queries, or a different take on multi-tenancy.
             | 
             | Do remember Cortex (on which Mimir is based) predates
             | Thanos as a project; Thanos was started to pursue a
             | different architecture and storage concept. Thanos storage
             | was clearly the way forward, so we adopted it. The
             | architectures are still different: Thanos is "edge"-style
             | IMO, Mimir is more centralised. Some people have a
             | preference for one over the other.
        
               | AndyNemmity wrote:
               | That's fair, thanks for the input. The only reason we
               | implemented Thanos in the first place was a particular
               | feature that we needed at the time of implementation. Now
               | using it in an extremely large environment, I haven't
               | seen any scalability limits. Speed of queries isn't a
               | driver of anything.
               | 
               | Multi Tenancy certainly is, but we have our own custom
               | multi tenancy solution over top of it we built ourselves.
               | I'd like to get rid of that ultimately, but we're not
               | utilizing whatever multi tenant features exist at the
               | moment. Perhaps that will be a driver.
               | 
               | Appreciate your thoughts.
        
           | notacoward wrote:
           | Multi-tenancy is something that shouldn't be underestimated.
           | A lot of people think it's just a checklist item until (a)
           | they need it or (b) they try to implement it in an existing
           | system. Kudos for making it a day-one feature.
        
             | vladvasiliu wrote:
             | While I agree with your point in the general case, would
             | you mind elaborating on the specific case of Prometheus?
             | 
             | My understanding is that the recommended best-practice for
             | Prometheus is to deploy as many of them as necessary, as
             | close to the monitored infrastructure as possible.
             | 
             | What use case would require deploying a single Mimir, so
             | supposedly Prometheus (cluster) in the case of serving
             | multiple tenants? Why not just deploy a dedicated
             | Prometheus / Mimir stack per client?
        
         | pracucci wrote:
         | Mimir has a microservices architecture. However, Mimir supports
         | two deployment modes: monolithic and microservices.
         | 
         | In monolithic mode you deploy Mimir as a single process and all
         | microservices (Mimir components) run inside the same process.
         | Then you scale it out running more replicas. Deployment modes
         | are documented here:
         | https://grafana.com/docs/mimir/latest/operators-guide/archit...
        
         | witcher wrote:
         | (Bartek here: I co-started Thanos and maintain it with other
         | companies)
         | 
         | Thanks for this - it's good feedback. It's funny you
         | mentioned that, because we actively try to reduce the number of
         | running pieces, e.g. while we design our query sharding
         | (parallelization) and pushdown features.
         | 
         | As Cortex/Mimir shows it's hard - if you want to scale out
         | every tiny functionality of your system you end up with twenty
         | different microservices. But it's an interesting challenge to
         | have - eventually it comes to trade-offs we try to make in
         | Thanos between simplicity, reliability and cost vs ultra max
         | performance (Mimir/Cortex).
        
       | mgarciaisaia wrote:
       | The thing I need most right now is a confirmation that it's named
       | after this tweet:
       | https://twitter.com/mmoriqomm/status/1272552214658117638
        
       | nosequel wrote:
       | Grafana Labs needs to make a convincing comparison chart of some
       | kind between Mimir, Thanos, and Cortex. Thanos and Cortex are
       | both mature projects and are both CNCF Incubating projects. Why
       | would anyone switch to a new prometheus long-term storage
       | solution from those?
       | 
       |  _*EDIT*_ : I see from another reply there is a basic comparison
       | to Cortex here:
       | https://grafana.com/blog/2022/03/30/announcing-grafana-mimir...
       | To the Mimir folks, I'd love to see something similar for Mimir
       | vs. Thanos.
        
         | mekster wrote:
         | You're forgetting VictoriaMetrics, which is presumably the best
         | choice for Prometheus long term storage.
         | 
         | Such a solid solution exists and yet another competitor? Not
         | sure why they didn't just buy VictoriaMetrics and possibly
         | rebrand it.
        
         | fishpen0 wrote:
         | > Cortex is used by some of the world's largest cloud providers
         | and ISVs, who are able to offer Cortex at a lower cost because
         | they do not invest the same amount in developing the project.
         | 
         | > ...
         | 
         | > All CNCF projects must be Apache 2.0-licensed. This
         | restriction also prevents us from contributing our improvements
         | back to Cortex.
         | 
         | I read this as "Amazon has destroyed the CNCF by not playing
         | nice"
        
           | CameronNemo wrote:
           | Holy crap I did not know CNCF discriminated against copyleft
           | software.
           | 
           | This really discredits the Linux Foundation as an
           | institution.
        
         | netingle wrote:
         | I agree! Which is why I put one in the blog post ;-)
         | https://grafana.com/blog/2022/03/30/announcing-grafana-mimir...
        
           | krnlpnc wrote:
           | I'm not seeing a comparison to Thanos
        
             | alrlroipsp wrote:
             | Why would you? Parent says it's a comparison of Mimir and
             | Cortex.
        
               | krnlpnc wrote:
               | Re-read the full thread...
               | 
               | >>Grafana Labs needs to make a convincing comparison
               | chart of some kind between Mimir, Thanos, and Cortex.
               | 
               | >I agree! Which is why I put one in the blog post ;-)
        
         | sciurus wrote:
         | It looks like this is a fork of Cortex driven by the
         | maintainers employed by Grafana Labs, done so they can change
         | the license to one that will prevent cloud providers like
         | Amazon from offering it without contributing changes back.
         | 
         | This is interesting, since Amazon offers both hosted Grafana
         | and Cortex today. I was under the impression Amazon and Grafana
         | Labs were successfully collaborating (unlike e.g. AWS and
         | Elastic), but seems like that's not the case.
        
           | WraithM wrote:
           | Does AWS provide managed Cortex? Is that just a part of the
           | AWS managed prometheus thing?
        
             | sciurus wrote:
             | Yes, Amazon's managed Prometheus is based on Cortex. See
             | the first question at
             | https://aws.amazon.com/prometheus/faqs/
        
       | eatonphil wrote:
       | It's hard to tell exactly how this works but judging from the
       | tutorial's docker-compose.yml [0] it looks like this runs as a
       | separate API next to Prometheus and you tell Prometheus to write
       | [1] to Mimir. I'm unclear how reads work from it or maybe there
       | is no read?
       | 
       | Maybe I'm completely misunderstanding.
       | 
       | [0]
       | https://github.com/grafana/mimir/blob/main/docs/sources/tuto...
       | 
       | [1]
       | https://github.com/grafana/mimir/blob/main/docs/sources/tuto...
        
         | pracucci wrote:
         | Mimir exposes both a remote write API and a Prometheus-compatible
         | API. The typical setup is that you configure Prometheus (or
         | Grafana Agent) to remote write to Mimir and then you configure
         | Grafana (or your preferred query tool) to query metrics from
         | Mimir.
         | 
         | You may also be interested in watching a 5-minute introduction
         | video, where I cover the overall architecture too:
         | https://www.youtube.com/watch?v=ej9y3KILV8g
        
           | eatonphil wrote:
           | Cool! Personally I don't like watching videos, preferring to
           | read prose or code or see an arch diagram. But good that it's
           | available.
        
             | pracucci wrote:
             | I'm the author of the video, but personally I also prefer
             | to read prose instead of watching videos!
             | 
             | The architecture is covered here:
             | https://grafana.com/docs/mimir/latest/operators-guide/archit...
             | 
             | There's also a hands-on tutorial here:
             | https://grafana.com/tutorials/play-with-grafana-mimir/
        
         | bboreham wrote:
         | It's a centralised multi-tenant store, supporting the
         | Prometheus query API. So you can point clients directly at
         | Mimir; they send in PromQL and they get data back as JSON.
         | 
         | (Note I work on Mimir)
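As a rough illustration of the point above about sending PromQL and getting JSON back, here is a minimal Python sketch. It assumes a hypothetical Mimir address (`http://mimir.example:8080`), assumes the Prometheus API is served under a `/prometheus` prefix, and assumes the standard Prometheus instant-query response shape. None of these details are stated in the thread, so check them against your own deployment. No network call is made; the sample response body is made up for illustration.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, promql, prefix="/prometheus"):
    """Build an instant-query URL for a Prometheus-compatible HTTP API.

    The "/prometheus" prefix is an assumption about the deployment.
    """
    query_string = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}{prefix}/api/v1/query?{query_string}"


def parse_instant_vector(body):
    """Turn a Prometheus-style instant-query JSON body into (labels, value) pairs."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is {"metric": {labels}, "value": [timestamp, "value-as-string"]}.
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


# Made-up response body, used only to exercise the parser.
SAMPLE_BODY = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "api"}, "value": [1648646000, "1"]},
        ],
    },
})

if __name__ == "__main__":
    print(build_query_url("http://mimir.example:8080", "up"))
    print(parse_instant_vector(SAMPLE_BODY))
```

In a real client the URL would be fetched with an HTTP library and the response body passed to `parse_instant_vector`; a multi-tenant setup would likely also need a tenant header, which is left out here.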
        
           | eatonphil wrote:
           | Is there an example of running mimir without prometheus?
        
             | bboreham wrote:
             | For example sending metrics from an OpenTelemetry pipeline.
             | 
             | Mimir accepts the Prometheus remote-write api, which is
             | protobuf-over-http; can be generated by anything really.
        
           | k8sToGo wrote:
           | But who does the scraping of the prometheus agents? Mimir or
           | still prometheus server?
        
             | Duologic wrote:
             | Last year I wrote a blog post about this exact question:
             | Who watches the watchers?
             | 
             | The general takeaway is that you run a minimal
             | prometheus/alertmanager setup that only scrapes the agents,
             | then use a dead man switch-like system to ensure this
             | pipeline keeps working.
             | 
             | Link: https://grafana.com/blog/2021/04/08/how-we-use-metamonitorin...
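The dead man's switch idea mentioned above can be sketched in a few lines of Python. This is a generic illustration, not the setup from the linked blog post: the function name, the 5-minute threshold, and the notion of an external watcher that records heartbeat timestamps are all assumptions.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_is_stale(last_heartbeat, now, max_silence=timedelta(minutes=5)):
    """Return True when no heartbeat has arrived within max_silence.

    A dead man's switch inverts normal alerting: the monitored pipeline
    sends a heartbeat continuously, and the watcher raises the alarm only
    when the heartbeats stop.
    """
    if last_heartbeat is None:
        # Never heard from the pipeline at all: treat it as broken.
        return True
    return now - last_heartbeat > max_silence


if __name__ == "__main__":
    now = datetime(2022, 3, 30, 13, 0, tzinfo=timezone.utc)
    print(heartbeat_is_stale(now - timedelta(minutes=1), now))
    print(heartbeat_is_stale(now - timedelta(minutes=10), now))
```

The watcher has to run outside the pipeline it watches, otherwise an outage would silence both at once.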
        
             | bboreham wrote:
             | If you have systems exporting metrics in Prometheus style,
             | then you can use Prometheus to scrape them and remote-write
             | to Mimir.
             | 
             | You can alternately use Prometheus Agent, to save storing
             | the data and running a query engine at the leaf.
             | 
             | You can also use the OpenTelemetry suite to perform the
             | same operation, though this is more appealing if you want
             | some other OpenTelemetry features at the same time, e.g. if
             | you prefer the 'pipeline' style.
        
             | inkel wrote:
             | You configure Remote Write [1] to point to the Mimir instance.
             | Then the Prometheus agents will send the metrics to Mimir.
             | 
             | 1: https://prometheus.io/docs/prometheus/latest/configuration/c...
        
       | SuperQue wrote:
       | One interesting question I have is in regard to global
       | availability.
       | 
       | With our current Thanos deployment, we can tie a single geo
       | regional deployment together with a tiered query engine.
       | 
       | Basically like this:
       | 
       | "Global Query Layer" -> "Zone Cluster Query Layer" -> "Prom
       | Sidecar / Thanos Store"
       | 
       | We can duplicate the "Global Query Layer" in multiple geo regions
       | with their own replicated Grafana instances. If a single
       | region/zone has trouble we can still access metrics in other
       | regions/zones. This avoids Thanos having any SPoFs for large
       | multi-user(Dev/SRE) orgs.
        
         | bboreham wrote:
         | The typical way to run Mimir is centralised, with different
         | regions/datacenters feeding metrics in to one place. You can
         | run that central system across multiple AZs.
         | 
         | If you run Mimir with an object store (e.g. S3) that supports
         | replication then you can have copies in multiple geographies
         | and query them, but the copies will not have the most recent
         | data.
         | 
         | (Note I work on Mimir)
        
       | ddon wrote:
       | Looks like an interesting alternative to Clickhouse with s3
       | backend...
        
       | monstrado wrote:
       | Is this the project you guys referenced using Apache Arrow for?
        
         | bboreham wrote:
         | Maybe you're thinking of this - the data structure used by
         | datasources for Grafana dashboards:
         | 
         | https://grafana.com/docs/grafana/latest/developers/plugins/d...
        
         | netingle wrote:
         | I don't think so! I think that's being used in Tempo, but I'm
         | not sure.
        
       | sriv1211 wrote:
       | What's the latency between sending a metric and being able to
       | query it when using object storage (s3) instead of block storage?
       | 
       | How do the transfer/retrieval (GET/PUT) costs factor in as well?
        
         | pracucci wrote:
         | Good question! Grafana Mimir guarantees read-after-write. If a
         | write request succeeds, the metric samples you've written are
         | guaranteed to be visible to any subsequent query.
         | 
         | Mimir employs write deamplification: it doesn't write
         | immediately to the object storage but keeps the most recently
         | written data in memory and/or on local disk.
         | 
         | Mimir also employs several shared caches (supports Memcached)
         | to reduce object storage (S3) access as much as possible.
         | 
         | You can learn more here in the Mimir architecture
         | documentation:
         | https://grafana.com/docs/mimir/latest/operators-guide/archit...
        
       | young_unixer wrote:
       | Coincidentally, "mimir" is a funny, baby-like way of saying
       | "dormir" (to sleep) in Spanish.
        
         | estebarb wrote:
         | Technical meetings are going to be fun with hispanic devs...
         | 
         | "And finally we sent the metrics to Mimir /giggles/"
         | 
         | Sadly they don't support encryption at rest (sorry, I really
         | had to do one more pun)
        
         | vladsanchez wrote:
         | So true!!! LOL I related to "Vamos a mimir!" when I read it!!!
         | ROFL
        
       | bbu wrote:
       | i don't get why there's so much hate here.
       | 
       | cortex is a pain to configure and maintain. would be awesome to
       | have mimir address these issues!
        
       | jhoechtl wrote:
       | What is the relationship to Loki?
        
         | bboreham wrote:
         | Sibling. Much of the architecture is similar; a number of
         | components are shared in https://github.com/grafana/dskit.
        
       | firstSpeaker wrote:
       | How does it work with rules? So far I cannot tell if this can be a
       | replacement for Prometheus, since I cannot see how we can re-use
       | our Prometheus rules with Mimir. Does anyone know anything about
       | that?
        
         | pracucci wrote:
         | Mimir includes a ruler component, which is responsible for
         | evaluating Prometheus recording and alerting rules. It also
         | exposes a set of APIs to configure the rule groups.
         | 
         | For example, you can use this API to upload a rule group:
         | https://grafana.com/docs/mimir/latest/operators-guide/refere...
         | 
         | Mimir is released with a CLI tool called "mimirtool" which,
         | among other things, allows you to configure the rule groups
         | (under the hood, it calls the Mimir API). Mimirtool
         | documentation is here:
         | https://grafana.com/docs/mimir/latest/operators-guide/tools/...
        
       | dikei wrote:
       | Sad news for Cortex. With most of the maintainers moving on to
       | Mimir, I fear it's pretty much dead in the water.
        
         | AndyNemmity wrote:
         | If anything, this makes me less interested in moving from
         | Thanos.
        
         | netingle wrote:
         | We tried to address this question on the Q&A blog post:
         | https://grafana.com/blog/2022/03/30/qa-with-our-ceo-about-gr...
         | 
         | It doesn't have to mean the end for Cortex, but others will
         | have to step up to lead the project. We've tried to put other
         | maintainers in place to kick start this.
        
           | sciurus wrote:
           | I was going to ask what the migration path was from Cortex to
           | Mimir, but I see you've documented that at
           | https://grafana.com/docs/mimir/latest/migration-guide/migrat... .
           | Thanks for the work you've done to make this easy.
        
             | pracucci wrote:
             | This video also shows a live migration from Cortex to Mimir
             | (running in Kubernetes):
             | https://www.youtube.com/watch?v=aaGxTcJmzBw&ab_channel=Grafa...
        
       | misiti3780 wrote:
       | What is the best SaaS-based dashboard solution for Prometheus?
        
         | heinrichhartman wrote:
         | Grafana Cloud
        
           | misiti3780 wrote:
           | thanks
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2022-03-30 23:01 UTC)