[HN Gopher] Amazon Managed Service for Prometheus
___________________________________________________________________
 
Amazon Managed Service for Prometheus
 
Author : pdelgallego
Score  : 115 points
Date   : 2020-12-15 17:47 UTC (5 hours ago)
 
(HTM) web link (aws.amazon.com)
(TXT) w3m dump (aws.amazon.com)
 
| pickledish wrote:
| 14 cents per "query processing minute" sounds like it could add
| up very fast. Prom queries can get somewhat complex, and IME it's
| not rare at all to have a dashboard making several multi-second
| queries per load (whether that falls under "you're using
| Prometheus wrong" is a separate discussion, of course).
|
| Edit: The example from their pricing page:
|
| > We will assume you have 1 end user monitoring a dashboard for
| an average of 2 hours per day, refreshing it every 60 seconds,
| with 20 chart widgets per dashboard (assuming 1 PromQL query per
| widget)... assuming 18ms per query for this example.
|
| That comes out to over $3 per month in query costs. Replace this
| 1 person with a TV showing the dashboard all day, and the cost
| jumps to $36, for just one dashboard and (again, IME) overly
| optimistic query-time estimates... o.O
  | edoceo wrote:
  | Now do six dashboards, 10 widgets each, multiple viewers,
  | 18h/day, and one slowish query on each dashboard. Seems like
  | we get to $100+ pretty quickly.
    | bboreham wrote:
    | Caching means that multiple viewers cost very little extra.
    |
    | (I am a Cortex maintainer)
| gravypod wrote:
| Does it put any limits on the cardinality of metrics? Grafana
| Cloud's offering was absolutely awful for my use cases. They
| charge per series, so if you have metrics with a "pod=..." label,
| your prices go through the roof.
  | valyala wrote:
  | Grafana Cloud sets high prices for high-cardinality metrics
  | because the underlying system - Cortex - isn't well optimized
  | for storing a high number of unique time series. For example,
  | it requires at least 15GB of RAM to process a million active
  | time series [1]. This means high infrastructure costs, which
  | drive up pricing for end users. Other systems such as
  | VictoriaMetrics need up to 15x less RAM for the same
  | cardinality [2].
  |
  | [1] https://github.com/cortexproject/cortex/blob/67648aabae70
  | f19...
  |
  | [2] https://victoriametrics.github.io/#capacity-planning
    | heliodor wrote:
    | Plenty has been written about not using the
    | server/container/pod id as a label, because it leads to high
    | cardinality, which leads to poor performance (cost aside).
    | Time series databases have been purpose-built for certain
    | workloads, and you can consider this their weakness.
      | gravypod wrote:
      | Plenty has also been written about the bugs/issues that
      | have cropped up that are only visible when inspecting
      | which regions/nodes/cgroups an issue is coming from [0].
      | My use case wasn't exactly `pod=...`, but it was very
      | similar: more like `device=...`. Also, for a huge
      | application, it's not uncommon to have 100s or even 1000s
      | of metrics that are important to application
      | health/performance. Constantly asking "do you really need
      | X? It will cost us Y" will lead to an extremely
      | under-monitored application.
      |
      | [0] - https://cloud.google.com/blog/products/management-
      | tools/sre-...
        | heliodor wrote:
        | Plenty of companies run their own servers because cloud
        | is too expensive at their scale. The same goes for
        | metrics. It's a direct result of one-price-fits-all
        | pricing models for software, as well as pricing that is
        | not correctly tied to value.
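 
To put rough numbers on the dashboard-cost discussion above: a
back-of-the-envelope sketch in Python, using the $0.14 per
query-processing-minute rate and the 18ms-per-query figure from
AWS's own pricing example (the 3-viewer count for edoceo's
scenario is an assumption):
 
    # Rough AMP query-cost model based on the pricing-page example.
    # Assumes every query takes AWS's illustrative 18ms; real
    # dashboard queries are often slower, so these are lower bounds.
    RATE_PER_QUERY_MINUTE = 0.14  # USD, from the AMP pricing page
    SECONDS_PER_QUERY = 0.018     # AWS's 18ms example figure

    def monthly_query_cost(viewers, dashboards, widgets,
                           hours_per_day, refresh_seconds=60,
                           days=30):
        refreshes_per_day = hours_per_day * 3600 / refresh_seconds
        queries = (viewers * dashboards * widgets
                   * refreshes_per_day * days)
        return queries * SECONDS_PER_QUERY / 60 * RATE_PER_QUERY_MINUTE

    print(monthly_query_cost(1, 1, 20, 2))   # AWS example: ~$3.02
    print(monthly_query_cost(1, 1, 20, 24))  # TV, 24h/day: ~$36.29
    print(monthly_query_cost(3, 6, 10, 18))  # edoceo-ish: ~$244.94
 
Even at the optimistic 18ms per query, edoceo's scenario lands
well past $100 a month, before counting the "slowish" queries.
 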
| kasey_junk wrote:
| Every managed metrics system will put a limit on cardinality,
| because all mainstream metrics systems cost more to query and
| store as cardinality grows. If they didn't limit it, you could
| assume that you, or some other customer, would eventually use up
| the cluster's resources and cause an outage.
|
| Like most metrics systems, under the covers Prometheus treats
| each unique combination of label values as a new time series.
  | webo wrote:
  | I like Weave Cloud's Prometheus hosting model -- it's per
  | host, which is predictable and forecastable.
| latchkey wrote:
| I just went through the "process" of installing Grafana, Loki,
| Promtail, and Prometheus on an Ubuntu box, and it is almost as
| if the company behind all of this has gone out of its way to
| make it hard. It isn't really _that_ difficult to get set up,
| but it also isn't 'apt install' easy (you really want me to
| create my own startup scripts?), and it required me to write my
| own documentation on how I installed everything.
  | rfratto wrote:
  | One of the Loki maintainers here (though I mostly work on
  | other stuff now). I promise it's not difficult on purpose.
  |
  | We've put so much effort into optimizing the Kubernetes
  | experience that non-containerized installations haven't been
  | getting as much attention. We'd be thrilled to have system
  | packages for Loki that also set it up as a service; it's just
  | not something we've been able to spend time on ourselves yet.
    | latchkey wrote:
    | It isn't just Loki, but the whole stack. Grafana is the only
    | project mentioned that has a Debian installer.
    |
    | The expectation that someone doing greenfield development is
    | going to jump into k8s just to use the software is kind of
    | weird.
    | qz2 wrote:
    | I'm deploying it (prom, alertmanager, pushgateway, grafana)
    | on native hardware via ansible, and it's not difficult. Not
    | Loki (yet). It's all just Go binaries you fire up with
    | systemd, each with a single config file.
    |
    | I find it harder to deploy reliably on Kubernetes with
    | persistent volumes etc.
  | 0xbadcafebee wrote:
  | All of those who have spent their free time contributing to
  | Linux distributions are why 'apt install' is easy. You can
  | contribute too.
    | latchkey wrote:
    | As the co-founder of Apache Java and a 20+ year member of
    | the ASF, creator of and contributor to hundreds of projects
    | over the years, I think I've contributed enough of my time
    | to OSS. I'm more than happy to let the new kids jump in.
    | Thanks for the 'advice'.
  | john_moscow wrote:
  | It's almost like the company behind it wants to see some
  | profit after pouring millions of dollars into developing these
  | tools. Except in 2020 you cannot just sell a closed-source,
  | easy-to-use, documented, and supported product with a license
  | fee. Not in the server market, at least. Everything must be
  | free and open source, and you are expected to make money by
  | offering a hosted service. Except, good luck competing with
  | Big Cloud.
    | RocketSyntax wrote:
    | It's extremely worrisome. The incentive to spend your early
    | mornings, nights, and weekends building something awesome to
    | free yourself from corporate life is fading away. They need
    | to institute some kind of royalty program, or at least
    | dedicate engineers to help maintain the projects they turn
    | into services.
    |
    | You almost have to change gears and get into a scientific
    | field that isn't computer science.
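 
For what it's worth, the native-hardware setup qz2 describes is
small enough to sketch. A minimal systemd unit along these lines
runs a single Prometheus server (paths and user are illustrative
assumptions, not official packaging; --config.file and
--storage.tsdb.path are standard Prometheus flags):
 
    # /etc/systemd/system/prometheus.service -- illustrative sketch
    [Unit]
    Description=Prometheus time series database
    Wants=network-online.target
    After=network-online.target

    [Service]
    User=prometheus
    ExecStart=/usr/local/bin/prometheus \
        --config.file=/etc/prometheus/prometheus.yml \
        --storage.tsdb.path=/var/lib/prometheus
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target
 
After a `systemctl daemon-reload`, `systemctl enable --now
prometheus` starts it on boot; alertmanager and the exporters
follow the same binary-plus-unit pattern.
 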
| WoahNoun wrote:
| Everyone here complaining about the pricing of the managed
| Grafana and Prometheus services has clearly never worked at a
| shop using SumoLogic. Log/metric processing/querying is
| expensive for a reason.
| eminence32 wrote:
| From the pricing section:
|
| > AMP counts each metric sample ingested to the secured
| Prometheus-compatible endpoint. AMP also calculates the stored
| metric samples and metric metadata in gigabytes (GB), where 1GB
| is 230 bytes.
|
| Surely that's a typo, right?
  | biot wrote:
  | Likely a casualty of copy and paste that dropped the
  | superscript formatting. 1GB is 2^30 bytes.
| alexhf wrote:
| I don't see any mention of Pushgateway. They'll need to add
| that, or I won't be able to monitor ephemeral jobs.
  | mchene wrote:
  | Hey... Marc here from AWS. I'm the PM lead for this service.
  | Thank you for the feedback. Pushgateway is important for our
  | customers, and it is a feature we are looking to support as
  | part of our roadmap. For the time being, you can continue to
  | use the Pushgateway as you do today and remote-write the
  | metrics to AMP for long-term storage and querying!
| pram wrote:
| Yeah, I dunno about this or the Grafana service. They're not
| exactly complicated to run on their own. At this pricing you may
| as well be on Datadog.
  | nrmitchi wrote:
  | I've commented fairly heavily in the related Grafana thread.
  |
  | Prometheus is a bit of a different story. It _does_ have some
  | operational overhead once you get to a certain point, and
  | scaling it out is not always trivial.
  |
  | Assuming it works, there is value-add here, and the pricing is
  | more in line with _active use_ (i.e., a cost-plus model, which
  | is more typical of AWS services).
  | [deleted]
  | valyala wrote:
  | Amazon Managed Service for Prometheus is based on Cortex,
  | which is quite expensive in terms of operational and
  | infrastructure costs compared to VictoriaMetrics [1],
  | according to case studies from VictoriaMetrics users [2]. This
  | may explain AMP's quite high prices.
  |
  | [1] https://victoriametrics.github.io/FAQ.html#what-is-the-
  | diffe...
  |
  | [2] https://victoriametrics.github.io/CaseStudies.html
  |
  | Disclaimer: I'm a core developer of VictoriaMetrics, so feel
  | free to ask any questions about it or about our competitors :)
| zander312 wrote:
| Scaling Prometheus across multiple separate Kubernetes clusters
| is a fking nightmare.
  | zytek wrote:
  | Try VictoriaMetrics. Deploy stateless Prometheuses that
  | remote_write to a central VictoriaMetrics instance.
| stevekemp wrote:
| This seems the more interesting of the two; Grafana is pretty
| simple to set up and maintain. The harder part is handling the
| metrics themselves, be it with InfluxDB, Prometheus, or
| something else.
  | markcartertm wrote:
  | Setting up one Prometheus server is easy. Scaling, HA, and
  | metrics retention for more than 3 days, not so much.
    | heliodor wrote:
    | Look at VictoriaMetrics (and the related products vmalert
    | and vmagent) for a much easier and more pleasant experience
    | as a drop-in Prometheus replacement.
| Thaxll wrote:
| Prometheus is not easy to run at scale on the storage side.
  | pram wrote:
  | This is all relative, but I don't personally think so. Not on
  | EC2+EBS, anyway. Certainly not as difficult as running/scaling
  | an ES or Kafka cluster.
    | Thaxll wrote:
    | It's a completely different problem, because by default
    | Prometheus does not shard anything, so you're bound to a
    | single instance, whereas ES and Kafka are cluster-based.
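 
To make the zytek/Thaxll exchange concrete: Prometheus does not
shard its own storage, but its standard remote_write protocol
lets each scraping instance stay stateless and ship samples to a
central store. A minimal per-cluster sketch (the hostname is a
placeholder; /api/v1/write on port 8428 is the usual single-node
VictoriaMetrics endpoint, though deployments vary):
 
    # prometheus.yml on each stateless, short-retention Prometheus
    global:
      scrape_interval: 15s
      external_labels:
        cluster: us-east-1   # tells clusters apart in the store
 
    remote_write:
      - url: http://victoria-metrics.example.internal:8428/api/v1/write
 
The same remote_write mechanism is how metrics reach AMP (see
mchene's comment above); only the endpoint and auth differ.
 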
| 0xbadcafebee wrote:
| You could say the same about any SaaS based on open source, but
| people still find it useful.
| slyall wrote:
| The pricing just for the ingest seems way off. $0.002 per 10,000
| samples might not seem like much, but even a simple
| node_exporter will produce 700 metrics every 15 seconds. That's
| roughly 121 million samples a month, or about $24/month, just to
| ingest the cpu/ram/diskspace data from each server. Plus storage
| and query costs.
|
| At work I have a single r4.xlarge instance handling 1.3 million
| metrics every 15 seconds. Storage is not clustered, but the cost
| is only $500/month. It would cost me $45k/month just for the
| ingest with the new managed service.
  | mchusma wrote:
  | Their pricing for these managed services used to be a "no
  | brainer" (something like the cost of the compute only, or
  | maybe a <30% upcharge). Managed Airflow was similarly very
  | expensive (maybe 3x the cost). Just not worth it. Bummer.
| vishuk wrote:
| Do we know which scalable Prometheus backend they are running?
| Chronosphere? Thanos?
  | bboreham wrote:
  | It's Cortex, though the particular configuration shares a lot
  | of code with Thanos.
  |
  | (I am a Cortex maintainer)
  | bmurphy1976 wrote:
  | The Grafana blog post mentions Cortex, something I'm not
  | familiar with:
  |
  | https://grafana.com/blog/2020/12/15/announcing-amazon-manage...
___________________________________________________________________
(page generated 2020-12-15 23:00 UTC)