[HN Gopher] Grafana releases OnCall open source project
       ___________________________________________________________________
        
       Grafana releases OnCall open source project
        
       Author : netingle
       Score  : 283 points
       Date   : 2022-06-14 15:28 UTC (7 hours ago)
        
 (HTM) web link (grafana.com)
 (TXT) w3m dump (grafana.com)
        
       | pphysch wrote:
       | Seems like a solid replacement for Alertmanager for those already
       | using Grafana OSS. Anyone planning on using both OnCall and
       | Alertmanager?
        
         | dString wrote:
         | Doesn't AlertManager evaluate metrics and fire alerts?
         | 
         | A quick look at OnCall suggests it is more for managing fired
         | alerts than firing alerts.
         | 
         | Their own screenshot has AlertManager as an alert source.
        
           | remram wrote:
           | Grafana used to be so simple, I don't know if I'm a fan of
           | this direction towards many services.
           | 
           | Having to run alertmanager and configure it in addition to
           | Grafana was bad enough, now you need to run and configure
           | another service if you want some extra functionalities for
           | those alerts? Are they going to keep maintaining
           | acknowledgements and scheduled silence in AlertManager now
           | that OnCall exists? Are we going to have "legacy
           | notifications" in AlertManager when not running OnCall, the
           | same way there are "legacy alerts" in Grafana when updating
           | from Grafana 7 (pre-AlertManager)?
        
           | pphysch wrote:
           | AlertManager does not do the evaluations, it does not connect
           | to any metrics database; those are done by Prometheus/etc and
           | forwarded to AlertManager, which handles deduplication and
           | routing among other things.
        
       | juliennakache wrote:
        | Looking forward to trying this out. I've always felt that
        | PagerDuty was absurdly expensive for the feature set they were
        | offering. It costs at least $250 per user for organizations
        | larger than 5 people - even if you're not an engineer who is
        | ever directly on call. At my previous company, IT had to
        | regularly send surveys to employees to assess if they
        | _really_ needed to have a PagerDuty account. Alerts are key
        | information in an organization that runs software in
        | production, and you shouldn't have to pay $250/month just to
        | be able to have some visibility into it. I'm hoping Grafana
        | OnCall is able to fully replace PagerDuty.
        
         | CSMastermind wrote:
         | > I've always felt that PagerDuty was absurdly expensive for
         | the feature set they were offering
         | 
         | For anyone out there in the same spot, I'll say that I switched
         | my last company to Atlassian's OpsGenie and it was a 10x cost
         | savings for the same feature set.
        
           | arccy wrote:
           | the opsgenie api is really bad though if you want to manage
           | it as code/declaratively
        
           | dijit wrote:
           | I really can't find myself to ever recommend atlassian
           | products though.
           | 
            | If cost is the only measure: I understand. But time lost
            | in various areas of the software package (performance
            | alone, before we even get into weird UX paradigms,
            | esoteric query languages, shoddy search systems, etc.)
            | surely has an impact on cost. Having your employees spend
            | a lot of time navigating janky software has a cost too.
        
         | jlg23 wrote:
         | Thanks, I think I finally understand why some friends of mine,
         | who can implement this for any company in half a day, take
         | $2000/day...
        
         | motakuk wrote:
         | Check this ;)
         | https://github.com/grafana/oncall/tree/dev/tools/pagerduty-m...
        
       | ildari wrote:
        | Hey HN, Ildar here, one of the co-founders of Amixr and one of
        | the software engineers behind Grafana OnCall. We finally open-
        | sourced the product and I'm really excited about that. Please
        | try it out and leave your feedback!
        
       | sandstrom wrote:
       | I think it would be great if it was easier to mix and match
       | Grafana SaaS and self-hosted products.
       | 
       | For example, we need to run Loki ourselves, for security /
       | privacy reasons, but wouldn't mind using hosted versions of
       | Tempo, Prometheus and OnCall.
       | 
       | Right now it isn't super-easy to link e.g. self-hosted loki
       | search queries with SaaS-Prometheus.
        
         | netingle wrote:
          | It's very much our aim to make this mix of self-hosted and
          | cloud services as easy as going all-cloud; but I agree we're
          | not quite there yet.
         | 
         | Do you mind if I ask what isn't super-easy about linking self-
         | hosted loki search queries with SaaS-Prometheus? You should be
         | e.g. able to add a Prometheus data source to your local Grafana
         | (or securely expose your Loki to the internet and add a Loki
         | data source to your Cloud Grafana)
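As a concrete sketch of the first option, a cloud Prometheus data source can be added to a local Grafana through its file-based provisioning; the URL, user ID, and API key below are placeholders, not real Grafana Cloud endpoints:

```yaml
# /etc/grafana/provisioning/datasources/cloud-prometheus.yaml
apiVersion: 1
datasources:
  - name: Grafana Cloud Prometheus    # illustrative name
    type: prometheus
    access: proxy
    url: https://prometheus.example.grafana.net/api/prom  # placeholder
    basicAuth: true
    basicAuthUser: "123456"           # your Cloud stack's numeric user ID
    secureJsonData:
      basicAuthPassword: "<api key>"  # Grafana Cloud API key
```

Grafana picks this up on restart; the same data source can also be added by hand under Configuration > Data sources.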
        
       | [deleted]
        
       | this_was_posted wrote:
       | glad to hear this got open sourced!
       | 
       | for someone at grafana; noticed a dead link in the post:
       | https://grafana.com/docs/oncall/main/
        
       | nojito wrote:
        | Unfortunate that it's AGPL. But this looks really great!
        
         | josephcsible wrote:
         | There's nothing unfortunate about the AGPLv3. Everything that
         | it doesn't let you do is stuff that you shouldn't be doing
         | anyway.
        
           | [deleted]
        
         | ucosty wrote:
         | Why is that unfortunate? Unless you're looking to make
         | proprietary changes to Grafana Oncall and host it as a SAAS,
         | it's the same as running any other GPL software.
        
           | nojito wrote:
           | GPL and its variants are a no go where I work.
        
             | ketralnis wrote:
             | To distribute I understand, but even just to use? Almost
             | any desktop OS you run has GPL code somewhere in it
        
               | dividedbyzero wrote:
               | Almost any desktop OS? I may be wrong but I don't think
               | Windows and macOS contain any GPL code.
        
               | warp wrote:
               | Doesn't Windows 10 ship with WSL2 now? (which includes a
               | full Linux kernel).
               | 
               | Apple still ships bash under GPLv2 on current macOS
               | versions. Apple hates GPLv3, which is why they're trying
               | to switch away from bash to zsh, but for the time being
               | they're still shipping bash.
        
             | eeZah7Ux wrote:
             | Then the problem is in the company and not in the license.
        
             | woadwarrior01 wrote:
             | Is Linux verboten at work?
        
               | to11mtm wrote:
               | Probably not.
               | 
               | Linux usually gets a pass, because most times you're just
               | deploying it and not mucking with source code.
               | 
                | But a lot of places (I've worked at more that do than
                | don't) will have rules about GPL/AGPL for
                | libraries/infra as a whole, though often evaluated
                | case-by-case. It's rare that I've seen AGPL stuff get
                | approved for usage.
               | 
               | I think some of it is not wanting to deal with the cost
               | of vigilance; i.e. you can make sure that someone is
               | using %thing% in a way that doesn't run afoul of AGPL
               | right now, but does legal and upper management have
               | confidence in that being true forever and always?
                | Engineers are still human, and corporate management +
                | legal teams tend to hate having licensing folk
                | tromping around.
               | 
               | This results in refusals ranging from "This is internal
               | for now but we will open it up later" (a fair concern) to
               | "Somebody is worried that exposing it over the VPN to
               | contractors would count as making it public" (IDK, I'm
               | not a lawyer.)
        
               | ucosty wrote:
               | > Linux usually gets a pass, because most times you're
               | just deploying it and not mucking with source code.
               | 
               | That would apply for most uses of software, wouldn't it?
               | 
               | > This results in refusals ranging from "This is internal
               | for now but we will open it up later" (a fair concern) to
               | "Somebody is worried that exposing it over the VPN to
               | contractors would count as making it public" (IDK, I'm
               | not a lawyer.)
               | 
               | I've encountered variations of this problem at places I
               | have worked in. Education goes a long way to solving
               | this, and this example of simple usage of (A)GPL software
               | is easy enough to explain with examples.
        
               | nojito wrote:
               | Any Linux deploy is through RedHat but most local
               | development here is using windows.
               | 
               | No idea why Linux gets a pass though.
        
             | ucosty wrote:
             | Must be quite the paranoid business, given even tier 1
             | banks here (in the UK) will happily run GPL software.
        
             | matsemann wrote:
             | Running a service with a GPL license is different than
             | including their code in your projects, though. So while it
             | may be a blanket ban, it may be worth it to clarify the
             | scope of that ban.
        
         | bbkane wrote:
          | LinkedIn built and uses https://iris.claims/ . I don't know
          | how it compares to alternatives, but I find IRIS relatively
          | easy to use.
        
         | acatton wrote:
         | https://drewdevault.com/2020/07/27/Anti-AGPL-propaganda.html
        
       | Equiet wrote:
       | It's surprising how seemingly difficult it is to build a good on-
       | call scheduling system. Everything I tried so far (not naming the
       | companies here) felt like the UX was the last thing on the
       | developers' minds. Which is tolerable during business hours but
       | really annoying at 2am.
       | 
        | Is there some hidden complexity or is it just a consequence of
        | engineers building a product for other engineers? Also, any
        | tips on what worked for you?
        
         | matsemann wrote:
         | Have had lots of bad experiences with that from Pagerduty at
         | least. Want to generate a schedule far in advance, so people
         | know when they will be oncall and can plan/switch.
         | 
         | Of course, in a few months we may have some new people having
         | joined, some quit, or other circumstances. A single misclick
         | when fixing that can invalidate the whole schedule and generate
         | another. Infuriating.
         | 
          | Or the UI itself. It might have become better the last two
          | years, but having to click "next week" tens of times to see
          | when I was scheduled (since I wasn't just interested in my
          | next scheduled time but all of them) was annoying.
        
       | raffraffraff wrote:
       | Production helm chart link on this page leads to 404:
       | https://grafana.com/docs/grafana-cloud/oncall/open-source/#p...
        
       | Deritio wrote:
       | I like what grafana labs does with grafana.
       | 
        | I'm annoyed by their license choice.
       | 
       | But apparently when you are grafana everything looks like a
       | dashboard UI?
       | 
        | Joke aside, I will have a look, but I already didn't like the
        | screenshots. I like the dashboardy thing for dashboards, but
        | otherwise it's not a really good UI system for everything
        | else.
        
       | Maledictus wrote:
       | What I really want is an Android app that keeps alerting until a
       | page is ACKed or escalated.
        
         | machinerychorus wrote:
         | check out pushover, I use it for this exact case
         | 
         | https://pushover.net/
        
       | pphysch wrote:
       | A bit disappointed by the architecture -- it's a Django stack
       | with MySQL, Redis, RabbitMQ, and Celery -- for what is
       | effectively AlertManager (a single golang binary) with a nicer
       | web frontend + Grafana integration + etc.
       | 
       | I'm curious why/if this architecture was chosen. I get that it
       | started as a standalone product (Amixr), but in the current state
       | it is hard to rationalize deploying this next to Grafana in my
       | current containerless setting.
        
         | alex_dev wrote:
         | One of the most frustrating aspects of being a software
         | engineer is dealing with others that love to over-engineer.
          | Unfortunately, they make enough noise insisting that
          | complex solutions are necessary that it scares managers
          | away from taking any easier, simpler solutions.
        
         | skullone wrote:
         | That seems like a perfectly reasonable architecture. If only
         | all of us could work on battle tested components like those
         | during our job!
        
           | contravariant wrote:
           | For something that is supposed to add some more features to
           | the basic email/HTTP message alert like grafana generates, I
           | do wonder what extra features require an additional 2
           | databases, a message queue and a separate task queue.
        
             | skullone wrote:
             | probably keeps history, state, escalation flow, etc?
        
         | goodpoint wrote:
         | That's very bad. 99% of organizations don't have a volume of
         | alerts that justifies any of MySQL, Redis and RabbitMQ.
         | 
         | Complexity comes at a steep price when something critical (e.g.
         | OnCall) breaks and you have to debug it in a hurry.
         | 
         | Shoving everything in a container and closing the lid does not
         | help.
        
         | [deleted]
        
         | motakuk wrote:
         | I agree that multi-component architecture is harder to deploy.
         | We did our best and prepared tooling to make deployment an easy
         | thing.
         | 
         | Helm (https://github.com/grafana/oncall/tree/dev/helm/oncall),
         | docker-composes for hobby and dev environments.
         | 
          | Besides deployment, there are two main priorities for OnCall
          | architecture:
          | 
          | 1) It should be as "default" as possible. No fancy tech, no
          | hacking around.
          | 
          | 2) It should deliver notifications no matter what.
         | 
         | We chose the most "boring" (no offense Django community, that's
         | a great quality for a framework) stack we know well: Django,
         | Rabbit, Celery, MySQL, Redis. It's mature, reliable, and allows
         | us to build a message bus-based pipeline with reliable and
         | predictable migrations.
         | 
          | It's important for such a tool to be based on a message bus
          | because it should have no single point of failure. If a
          | worker dies, another will pick up the task and deliver the
          | alert. If Slack goes down, you won't lose your data: OnCall
          | will continue delivering to other destinations and will
          | deliver to Slack once it's back up.
         | 
          | The architecture you see in the repo has been live for 3+
          | years now. We were able to perform a few hundred data
          | migrations without downtime, and had no major outages or
          | data loss. So I'm pretty happy with this choice.
        
           | Deritio wrote:
            | Your message bus justification sounds like one of the
            | most ridiculous claims I've heard.
            | 
            | Sorry, but why is RabbitMQ really necessary?
        
           | slotrans wrote:
           | You don't need Rabbit, Celery, or Redis. You should be able
           | to replace MySQL with SQLite. Then it would be _radically_
           | easier to deploy.
        
             | sergiomattei wrote:
             | It's curious to see people questioning the stack choices of
             | apps they haven't built yet and problems they haven't faced
             | either.
             | 
             | They chose this stack, it works for them. They've put it
             | through its paces in production.
             | 
             | It's as boring as it gets.
        
             | throwaway892238 wrote:
             | A MySQL database cluster, and a local copy of a SQL
             | database on a single file on a single filesystem, are not
             | close to the same thing. Except they both have "SQL" in the
             | name.
             | 
             | One of them allows a thousand different nodes on different
             | networks to share a single dataset with high availability.
             | The other can't share data with any other application,
             | doesn't have high availability, is constrained by the
             | resources of the executing application node, has obvious
             | performance limits, limited functionality, no commercial
             | support, etc etc.
             | 
             | And we're talking about a product that's intended for
             | dealing with on-call alerts. The entire point is to alert
             | when things are crashing, so you would want it to be highly
             | available. As in, running on more than one node.
             | 
                | I know the HN hipsters are all gung-ho for SQLite, but
                | let's try to rein in the hype train.
        
               | slotrans wrote:
               | I don't need _any_ of that stuff, and nor does anyone who
               | would use this. People who need clustered high-
               | availability stuff are _paying for PagerDuty or
               | VictorOps_.
               | 
               | This is for tiny shops with 4 servers. And tiny shops
               | with 4 servers don't have time to spin up a horrendous
               | stack like this. I was excited to see this announcement
               | until I saw all the moving pieces. No thanks!
        
               | Spivak wrote:
               | And this is the on-prem version of those tools. Just
               | because it isn't the tool you wanted doesn't mean it's
               | not good.
        
               | throwaway892238 wrote:
               | If you only have 4 servers, make a GitHub Action (or,
               | hell, since we're assuming one node with SQLite, a cron
               | job on one of your 4 servers) that _curl_ s your servers
               | every 5 minutes and sends you a text when they're down.
               | You don't need a Lamborghini to get groceries.
        
               | pphysch wrote:
               | This discussion is in the context of a self-contained app
               | called Grafana OnCall, which is built on Django, which
               | does not _particularly_ care which RDBMS you are using.
               | 
               | At the very least, SQLite should be the default database
               | for this product, and users can swap it out with their
               | MySQL database cluster if they really are Google-scale.
        
               | gjulianm wrote:
               | > The entire point is to alert when things are crashing,
               | so you would want it to be highly available. As in,
               | running on more than one node.
               | 
               | An important question to ask is how much availability are
               | you actually gaining from the setup. It wouldn't be the
               | first time I see a system moving from single-node to
               | multinode and being less available than before due to the
               | extra complexity and moving pieces.
        
               | [deleted]
        
           | gen220 wrote:
           | I think your decisions were reasonable, as is the opinion of
           | the person you're responding to.
           | 
           | To be fair, even in its current form, it should be possible
           | to operate this system with sqlite (i.e. no db server) and
           | in-process celery workers (i.e. no rabbit MQ) if configured
           | correctly, assuming they're not using MySQL-specific features
           | in the app.
           | 
           | Using a message bus, a persistent data store behind a SQL
           | interface, and a caching layer are all good design choices. I
           | think the OP's concern is less with your particular
           | implementations, and more with the principle of preventing
           | operators from bringing their own preferred implementation of
           | those interfaces to the table.
           | 
           | They mentioned that it makes sense because you were a
           | standalone product, so stack portability was less of a
           | concern. But as FOSS, you're opening yourself up to different
           | standards on portability.
           | 
            | It requires some work from the maintainer to make the
            | application tolerant to different fulfillments of the
            | same interfaces. But it's good work. It usually results
            | in cleaner
           | separation of concerns between application logic and
           | caching/message bus/persistence logic, for one. It also
           | allows your app to serve a wider audience: for example, those
           | who are locked-in to using Postgres/Kafka/Memcached.
        
           | raffraffraff wrote:
            | Nothing wrong with that. I managed 7+ Sensu "clusters" at
            | a previous job, and its stack was a Ruby server, Redis and
            | RabbitMQ. But I completely ditched RabbitMQ and used Redis
            | for the queue and data. Simpler, more performant and more
            | reliable (even if the feature was marked _experimental_).
            | Our alerts were really spammy, and we had ~8k servers
            | (each running a bunch of containers) per cluster, so these
            | things were busy. Each cluster was 3x small nodes (6GB
            | memory, 2 CPU). Memory usage was minuscule, typically
            | <300MB. Any box could be restarted without any impact
            | because Redis just operated in failover mode and Sensu
            | was horizontally scalable.
           | 
           | I get why you would add a relational DB to the mix.
           | Personally, I'd like a Rabbit-free option.
        
         | minusf wrote:
         | not gonna argue that a single binary is the ultimate deploy
         | solution but running a django app is not that difficult
         | (although i am biased cause i do that for a living).
         | 
         | i love django projects but mysql, celery and rabbitmq -- no
         | thanks.
        
           | pphysch wrote:
            | Don't get me wrong, I love Django and think it's a great
            | framework for writing internal tools like this. Redis gets
            | a pass too since Django has native support for it in 4.0+.
            | It's really the (IMHO unnecessary) combo of
            | MySQL+RabbitMQ+Celery that turns me off.
           | 
           | Redis itself has had solid support for building reliable
           | distributed task streaming for nearly 4 years (Redis
           | ConsumerGroups introduced in 2018).
        
         | lazyant wrote:
          | Curious as to what architecture you would have preferred,
          | or what this pretty standard stack (which can be deployed
          | to k8s) is not giving you.
        
           | pphysch wrote:
           | Any of the following:
           | 
           | Python(Django)+Redis+[SQLite]
           | 
           | Python(Django)+Postgres
           | 
           | [Compiled Go binary]+[SQLite]
           | 
           | SQLite barely even counts as an architectural dependency TBH
           | :)
        
           | theptip wrote:
           | For a simple low-scale app you can often do without Redis and
           | Celery/RMQ if you just push everything into Postgres.
           | 
           | Far less scalable, but it is dramatically simpler to deploy.
           | Often gets you surprisingly far though. Would be interesting
           | to know how many monitored integrations could be supported by
           | that flow.
        
             | picozeta wrote:
             | How does a message queue work via Postgres? Many people
             | (including me) use Redis to run background jobs.
        
               | theptip wrote:
               | Here's the option I'm familiar with (siblings have others
               | too):
               | 
               | https://github.com/malthe/pq
               | 
               | Doesn't have all the plumbing you'd want, there is a
               | wrapper (https://github.com/bretth/django-pq/) that seems
               | to give you an entrypoint command more like `celery
               | worker ...` but I've not investigated it closely.
        
               | minusf wrote:
               | https://github.com/procrastinate-org/procrastinate
               | 
               | https://github.com/gavinwahl/django-postgres-queue
        
               | infogulch wrote:
               | lmgtfy https://www.crunchydata.com/blog/message-queuing-
               | using-nativ...
        
               | slotrans wrote:
               | This is a very confused question. The data store you keep
               | your queued items in is completely orthogonal to what a
               | message queue actually is.
               | 
                | A simple way to use an RDBMS as a message queue, one
                | that has been in use since before most HN readers
                | were born, is roughly:
                | 
                | - enqueue an item by inserting a row into a table
                | with a status of QUEUED
                | 
                | - use a SELECT FOR UPDATE, or UPDATE...LIMIT 1, or
                | similar, to atomically claim and return the first
                | status=QUEUED item, while setting its status to
                | RUNNING (setting a timestamp is also recommended)
                | 
                | - when the work is complete, update the status to
                | DONE
               | 
               | There are more details to it obviously but that's the
               | outline.
               | 
               | The first software company I worked for was using this
               | basic approach to queue outbound emails (and phone and
               | fax... it was 2005!), millions per day, on an Oracle DB
                | that _also_ ran the entire rest of the business. It's
                | not hard.
        
             | gjulianm wrote:
             | I bet quite a lot, probably at least 10-50 per second
             | without doing anything special for performance, i.e.
             | multiple queries per alert, calling different APIs, things
             | like that. I don't know of many places that are dealing
             | with alerts measured in "per second" as a unit.
             | 
             | Not to mention that having multiple components doesn't mean
             | it's "scalable" by default, it could happen that some part
             | of the pipeline doesn't like multiple instances of
             | something.
        
           | chrisandchris wrote:
           | Not OP, but one may interpret your response as "I don't
           | understand why you prefer a single binary over this
           | architecture that requires 6 different services and prefers
           | k8s".
           | 
           | IMHO, OP just stated that one could solve this with less
           | dependencies and have the same (if not a better) result.
        
             | pphysch wrote:
             | Yes, thank you. I would be surprised if this same product
             | couldn't be delivered with just Python(Django) + SQLite +
             | Redis (assuming writing everything in Go is unrealistic).
             | Spinning up a venv and launching a local Redis instance is
             | significantly more reasonable than having to configure
             | MySQL, RabbitMQ, and Celery.
        
             | lazyant wrote:
             | I missed that interpretation :(
             | 
             | IMHO a fat binary written from scratch would have been a
             | way worse choice than to use a standard stack, both in
             | terms of bugs and time, let alone Open Source contributions
             | or any scalability.
             | 
          | In terms of number of services, what do you get rid of that
          | produces a better result? Maybe RMQ, and use a worse queue?
          | Celery, and write your own task manager or use another
          | dependency?
        
           | gjulianm wrote:
           | Installation in a regular system without Kubernetes? Right
           | now I can install Grafana, Prometheus and Alertmanager in a
           | regular Linux system using distribution packages, and just
           | worry about those programs themselves. If I want to install
            | OnCall, I need not just OnCall but four other non-trivial
            | dependencies that will still need configuration, management
           | and troubleshooting. All for something that is going to deal
           | with far less load than any of
           | Grafana/Prometheus/Alertmanager. I honestly do not understand
           | it.
        
             | lazyant wrote:
             | you can install this stack without kubernetes no? I don't
             | see anything k8s-specific
        
               | heavyset_go wrote:
               | Yes, there is nothing Kubernetes specific here, and this
               | can be deployed using whatever container orchestration
               | system you want.
        
               | gjulianm wrote:
               | The problem still stands of adding dependencies, extra
               | complexity and configuration. I'm usually happy about
               | Grafana/Prometheus deployments because the base
               | installation is fairly simple and self-contained, but
               | this looks like a bit of a mess.
        
         | vhold wrote:
         | AlertManager is one component of a more complicated
         | infrastructure.
         | 
         | https://prometheus.io/docs/introduction/overview/#architectu...
         | 
         | https://kubernetes.io/docs/concepts/overview/components/
        
           | pphysch wrote:
           | OnCall also does nothing unless you have something external
           | firing alerts for you. They both fill similar niches in a
           | larger monitoring system; this does not excuse OnCall having
           | a drastically more complex internal architecture.
        
         | mkl95 wrote:
         | > Django stack with MySQL, Redis, RabbitMQ, and Celery
         | 
          | MySQL is a weird, if not slightly disturbing, choice. Other than
         | that it's a boring, battle-tested stack that is relatively easy
         | to scale. I agree that Go is nicer, but I'm biased by several
         | years of dealing with horrific Flask / Django projects.
        
         | heavyset_go wrote:
         | That's a tried and true stack, and a very good one for
         | maintaining sane levels of reliability, consistency, durability
          | etc. Resource-wise, at least with Celery, RabbitMQ and Django,
         | they're also pretty lean.
         | 
         | It even ships in containers along with Docker Compose files and
         | Helm charts, which would suit the deployment use cases of 99%
         | of users. I understand that you're not using containers, but I
         | don't think that's a limitation that many are inflicting upon
         | themselves as of late, and if pressed, installing Docker
         | Compose takes about 5 minutes and you don't have to think about
         | it again.
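          | 
          | As a rough sketch, a Compose file for a stack like this
          | might look as follows. Service names, image tags, ports and
          | environment variables here are illustrative guesses, not the
          | project's actual Compose file:

```yaml
# Hypothetical docker-compose.yml for a Django + Celery stack of the
# kind described above (MySQL, Redis, RabbitMQ) -- all names and
# settings are illustrative only, not OnCall's real configuration.
version: "3"
services:
  db:
    image: mysql:8
    environment:
      MYSQL_ROOT_PASSWORD: changeme   # placeholder credential
  redis:
    image: redis:7
  rabbitmq:
    image: rabbitmq:3-management
  web:
    image: grafana/oncall             # actual image name may differ
    depends_on: [db, redis, rabbitmq]
    ports:
      - "8080:8080"                   # port is an assumption
  celery:
    image: grafana/oncall             # same image, run as a worker
    command: celery worker            # exact worker invocation may differ
    depends_on: [db, redis, rabbitmq]
```

With a file like this in place, `docker compose up -d` brings up all
five services together, which is roughly the "5 minutes and you don't
have to think about it again" experience described above.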
        
         | MarquesMa wrote:
         | This. I find open source projects written in Go or Rust are
         | usually more pleasant to work with than Java, Django or Rails,
          | etc. They have fewer clunky dependencies, are less resource-
          | hungry, and can ship as single executables, which makes
          | people's lives much easier.
         | 
         | Just think about Gitea vs GitLab.
        
           | matsemann wrote:
           | Not sure why you include java in that, as you mostly get a
           | standalone file. No such thing as a jre in modern java
           | deployment.
           | 
           | As for python, at least getting a dockerfile helps a lot.
           | Otherwise it's a huge mess to get running, yes.
           | 
           | Python is still a hassle anyways, since the lack of true
           | multithreading means that you often need multiple
           | deployments, which the Celery usage here for instance shows.
        
             | Volundr wrote:
             | > Not sure why you include java in that, as you mostly get
             | a standalone file. No such thing as a jre in modern java
             | deployment.
             | 
             | Maybe I'm behind the times, but I can't figure out what you
             | mean here. As far as I know 'java -jar' or servlets are
             | still the most common ways of running a Java app. Are you
             | talking graal and native image?
        
               | matsemann wrote:
               | For deploying your own stuff, most people do as before,
               | yes. But even then, it's at least still only a single jar
               | file, containing all dependencies. Not like a typical
               | python project where they ask you to run some command to
               | fetch dependencies and you have to pray it will work on
               | your system.
               | 
                | But using jlink for Java, one can package everything
                | into a smaller runtime distributed together with the
                | application. Then it's not much different from a Go
                | executable.
               | 
               | > _The generated JRE with your sample application does
               | not have any other dependencies..._
               | 
               | > _You can distribute your application bundled with the
               | custom runtime in custom-runtime. It includes your
               | application._
               | 
               | From the guide here
               | https://access.redhat.com/documentation/en-
               | us/openjdk/11/htm...
        
             | FridgeSeal wrote:
             | Python application deployments are all fun and games until
             | suddenly the documentation starts unironically suggesting
             | that you should "write your configuration as a Python
             | script" that should get mounted to some random specific
             | directory within the app as if that could ever be a sane
             | and rational idea.
        
           | eeZah7Ux wrote:
            | Hell no, I want stuff like OnCall packaged into Linux
            | distributions. I need something stable and reliable that
            | receives security fixes.
            | 
            | Maintaining tens of binaries pulled from random GitHub
            | projects over the years is a nightmare.
           | 
           | (Not to mention all the issues around supply chain
           | management, licensing issues, homecalling and so on)
        
             | morelisp wrote:
             | At this point I trust the Go modules supply chain
             | considerably more than any free distro's packaging, which
             | is ultimately pulling from GitHub anyway.
        
               | dijit wrote:
               | > At this point I trust the Go modules supply chain
               | considerably more than any free distro's packaging
               | 
               | What has happened in the package ecosystem to make you
               | believe this? Is it velocity of updates or actual trust?
               | 
               | I haven't heard of any malicious package maintainers.
        
               | eeZah7Ux wrote:
                | This is plain false. Most production-grade
                | distributions do extensive vetting of their packages,
                | both in terms of code and legal review.
               | 
               | Additionally, distribution packages are tested by a
               | significant number of users before the release.
               | 
                | Nothing of this sort happens with any language-specific
                | package manager. You just get whatever happens to be
                | sitting on the various software forges.
               | 
                | Unsurprisingly, there have been many serious supply
                | chain attacks in the last 5 years, none of which
                | affected the usual big distros.
        
               | morelisp wrote:
               | No, Go modules implement a global TOFU checksum database.
               | Obviously a compromised upstream at initial pull would
               | not be affected, but distros (other than the well-scoped
               | commercial ones) don't do anything close to security
                | audits of every module they package either. Real-world
                | untargeted supply chain attacks come from compromised
                | upstreams, not long-term bad-faith actors. Go modules
                | protect against
               | that (as well as other forms of upstream incompetence
               | that break immutable artifacts / deterministic builds).
               | 
               | MVS also prevents unexpected upgrades just because
               | someone deleted a lockfile.
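                | 
                | For reference, the TOFU pinning works through go.sum:
                | each module version is recorded with one hash for its
                | source tree and one for its go.mod, and later fetches
                | are cross-checked against the public checksum
                | database. Illustrative entries (hashes made up) look
                | like:

```text
example.com/some/module v1.2.3 h1:WFkYCl0s8abc...=
example.com/some/module v1.2.3/go.mod h1:pQx9Tz4c1def...=
```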
        
       | goodpoint wrote:
       | It's very nice to see Python and AGPL used for this.
        
       | ucosty wrote:
       | Looks very cool, will have to give this a shot.
        
       | motakuk wrote:
       | Hello HN!
       | 
        | Matvey Kukuy, ex-CEO of Amixr and head of the OnCall project
       | here. We've been working hard for a few months to make this OSS
       | release happen. I believe it should make incident response
       | features (on-call rotations, escalations, multi-channel
       | notifications) and best practices more accessible to the wider
       | audience of SRE and DevOps engineers.
       | 
        | I hope some of you will finally be able to sleep well at
        | night, knowing that OnCall will handle escalations and alert
        | the right person :)
       | 
        | Please join our community on GitHub! The whole Grafana OnCall
        | team is here to help you and to make this thing better.
        
         | knicholes wrote:
         | Being on-call has never made me sleep better at night!
        
           | krab wrote:
           | If I know someone else is on call and he's competent, I can
           | sleep better.
        
         | the_duke wrote:
         | The docs link [1] is 404.
         | 
         | Seems like the /main is the culprit.
         | 
         | [1] https://grafana.com/docs/oncall/main/.
        
           | motakuk wrote:
           | Fixed: https://grafana.com/docs/grafana-cloud/oncall/
        
       | pachico wrote:
        | I love Grafana, don't get me wrong, but I have the feeling
        | they are now in that position where companies that got a
        | massive capital injection, and therefore a massive increase in
        | workforce, release too much, too soon.
       | 
       | It doesn't have anything to do, of course, with the fact that
       | this morning we suddenly found that all our dashboards stopped
        | working because we were upgraded to Grafana v9, for which
        | there is no stable release nor documentation of the breaking
        | changes.
       | 
       | Luckily they rolled back our account.
        
         | danlimerick wrote:
         | I apologize for the disruption we caused you when rolling out
         | Grafana 9. We are working on improving our releases to Grafana
         | Cloud and also on making sure that errors due to breaking
         | changes in a major release won't affect customers in the
         | future. As a Grafana Cloud customer, you shouldn't need to read
         | docs about breaking changes when we upgrade your instance.
        
           | pachico wrote:
           | Dude, I hope you also read when I say that I love what you do
           | and your reply just confirms I'm putting my money in the
           | right hands.
           | 
            | I just wouldn't mind being the last to upgrade to a newer
            | version :)
        
       | greatgib wrote:
       | I would give a huge marketing bullshit award for the following
       | sentence:
       | 
       | <<We offered Grafana OnCall to users as a SaaS tool first for a
       | few reasons. It's a commonly shared belief that the more
       | independent your on-call management system is, the better it will
       | be for your entire operation. If something goes wrong, there will
       | be a "designated survivor" outside of your infrastructure to help
       | identify any issues. >>
       | 
        | They tried to ensure that you use their SaaS offering because
        | they care more about your own good than you do yourself. So
        | humanist...
        
         | ezrast wrote:
         | The point isn't that their infrastructure is more reliable than
         | yours, but that it's decoupled from yours. If you run your
         | monitoring on the same infra as production, it's liable to go
         | down when production does, i.e. just when you need it most.
         | This is a real reason to outsource monitoring to a SaaS, just
         | like there are real reasons to self-host.
         | 
         | I mean, obviously they chose to address the segment of the
         | market they could get more money out of first; I'm not
         | contesting that. But the bit you quoted is low-grade bullshit
         | at best. Hardly award-winning.
        
       | martypitt wrote:
       | Congrats - this looks great, and definitely something I was
       | wishing for during an incident earlier this week.
       | 
       | A minor note, if anyone from Grafana is around - a bunch of the
       | links on the bottom of the announcement go to a 404.
        
         | motakuk wrote:
         | We're fixing that, thank you ;)
        
       | googletron wrote:
       | Very cool. I love what the Grafana team is up to.
        
       | anyfactor wrote:
       | Here is the repo: https://github.com/grafana/oncall
       | 
       | AGPL 3.0
        
       ___________________________________________________________________
       (page generated 2022-06-14 23:00 UTC)