[HN Gopher] Grafana Incident: Smart incident management for your...
       ___________________________________________________________________
        
       Grafana Incident: Smart incident management for your teams
        
       Author : matryer
       Score  : 167 points
       Date   : 2022-02-02 18:10 UTC (4 hours ago)
        
 (HTM) web link (grafana.com)
 (TXT) w3m dump (grafana.com)
        
       | palijer wrote:
       | >Automatically create the online meeting spaces for collaboration
       | 
       | >Manage TODO items so nothing falls through the cracks
       | 
       | I work in incident response, and I feel a huge misunderstanding
       | in incident response products is the failure to recognize that
       | companies already have established tools for collaboration and
       | meetings and for capturing planned work.
       | 
       | I find adding these things is seen as nice and inclusive and it
       | is easier to sell a product that does a lot, but it turns into
       | complete bloat and makes adoption harder, and makes it harder to
       | support a larger product.
        
         | quartz wrote:
         | This was a big learning for us when we were first building out
         | Kintaba[1].
         | 
         | Re: task management specifically-- having previously been at
         | FAANG companies that built all their own tools I had not
         | realized just how prevalent Jira is. It. is. EVERYWHERE. and IT
         | orgs at companies from 3 to 300,000 people are absolutely
         | married to their carefully customized version of it as a system
         | of record for everything that happens or will happen.
         | 
         | We see many on-premise implementations as well despite the
         | announced sunsetting of that product.
         | 
         | I'm sure there's a #2 and #3 out there but honestly I almost
         | never see it (we do see clubhouse/shortcut from time to time...
         | but even those folks tend to move to Jira within 6 months).
         | 
         | OT but it really makes me doubly impressed that Slack was able
         | to move into organizations so successfully from all corners
         | such that it was able to dodge what would traditionally be a
         | pretty big Atlassian-owned barrier.
         | 
         | [1] shameless plug for our incident management tool @
         | https://kintaba.com
        
         | dharmab wrote:
         | I've used products in this space that would integrate with your
         | existing video, chat and ticketing tools.
        
         | buscoquadnary wrote:
         | I think the problem is trying to present an abstraction layer
         | to management. We already have those same features (todo
         | lists, recording information) in Jira, ServiceNow and a dozen
         | other tools whose purpose is to coordinate and track work.
         | They are often unpopular with developers because they end up
         | trying to give the execs an abstraction layer to replace their
         | management by spreadsheets, but as anyone who has worked in
         | software for long enough can tell you, abstractions are leaky.
         | 
         | Hence the dissatisfaction with a lot of these tools.
        
         | aantix wrote:
         | Interesting take..
         | 
         | What do you think is the solution - when an enterprise already
         | has Jira, Github and Confluence, how do you think a product
         | like Grafana Incident should integrate with these somewhat
         | overlapping products?
        
           | ethbr0 wrote:
           | This feels like a central question of post-cloud / post-SaaS
           | outsourcing.
           | 
           | In the end, it boils down to two options: offer deep APIs
           | into your product, or don't.
           | 
           | IMHO, what needs to happen to support the former is for every
           | SaaS purchase to include full technical due diligence on
           | external integration capabilities.
           | 
           | Integration needs to start being a headline feature in
           | purchasing. And less an afterthought when a horrified
           | engineer looks at some new enterprise product that's already
           | being adopted.
        
       | encryptluks2 wrote:
       | So does Grafana actually believe in open source or not?
        
       | sjwhitworth wrote:
        
         | capableweb wrote:
         | You'd do your job as a CEO better if you didn't spam
         | competitors' HN threads with your own product, unless you have
         | something relevant to bring to the table. This comment just
         | looks like a shameless plug because you're in the same sector.
         | 
         | One way you could approach it is to highlight what you think is
         | good about Grafana's implementation, and what could be better,
         | and then contrast that with your own offering, without sounding
         | like a salesman.
        
         | burkaman wrote:
         | This is just incredibly rude. Please don't do it again.
        
       | JshWright wrote:
       | This is timely... I just started building out an internal
       | "chatops" solution that leans heavily on OnCall. Looks like I may
       | be able to set that aside.
       | 
       | If this is implemented as cleanly as OnCall, I have high hopes.
       | It isn't without bugs, but it's already miles ahead of solutions
       | like PagerDuty (in my opinion).
        
         | btables wrote:
         | I'd check out FireHydrant, but I'm biased ;)
        
           | JshWright wrote:
           | Yeah, there are definitely already products in this space,
           | but we're already invested in Grafana, so it makes sense to
           | lean in that direction, even if it meant a little custom work
           | on our end (though it looks like that may not be necessary
           | now)
        
         | bloodyplonker22 wrote:
         | PagerDuty is a product that has not evolved much at all in the
         | last 10 years, unfortunately.
        
       | jtlisi wrote:
       | This looks really sharp! Love the opinionated approach to how to
       | handle incidents with assigned roles!
        
         | amelius wrote:
         | It seems like this is a special case of project management
         | software. If the existing products can't handle incidents then
         | that software should be improved, not new software written.
         | It's the best way to ensure that everybody on the team knows
         | how to use the software when it's most urgently needed.
         | 
         | E.g. would you change your favorite editor to a different one,
         | in case of an incident? Probably not. So why change project
         | management systems?
        
           | [deleted]
        
           | bastardoperator wrote:
           | Did we watch a different presentation? ChatOps isn't new.
           | What you're describing is what I would consider an antiquated
           | practice. Nobody wants to go sniffing around a PM tool at
           | 3 AM.
        
           | hughrr wrote:
           | Zero here!
        
             | matryer wrote:
             | You must have solid tech. :)
        
           | JshWright wrote:
           | While you certainly could cobble together incident response
           | workflows in something like Jira, I think it makes more sense
           | to extend the monitoring and paging tooling (in large part
           | due to the reason you mention-- familiarity with the tools
           | that you're using as part of that response).
        
       | dijit wrote:
       | It's funny what process can do.
       | 
       | 13 years ago I was working on a SaaS eCommerce platform and it
       | feels like this tool is a relatively minor improvement over what
       | we had built on top of IRC.
       | 
       | That said, it's pretty cool and I'm definitely going to evaluate
       | it, as our current PagerDuty integration is not nearly as clean
       | as this.
        
       | cfors wrote:
       | I wish Grafana would stop trying to make offerings that already
       | exist and focus on making their dashboards and alerts as code
       | usable.
       | 
       | I would even pay money for an actual offering that worked.
        
         | lukeqsee wrote:
         | Grafana Cloud is the best ROI money my startup spends every
         | month.
        
         | wernerb wrote:
         | Alert templating. Grafana is fussy about configuring alerts on
         | dashboards that have variables. This means that if you have 30
         | clusters and want to use a single dashboard with a drop-down
         | variable selecting your cluster, you cannot define alerts on
         | it. It will refuse to do it.
         | 
         | Alerts are also tightly integrated into dashboards, which
         | forces alerts to be saved/backed up/imported as a single JSON
         | blob. We want separate management of alerts so they can be
         | defined as code and not in the dashboard blob of JSON!
         | 
         | What chagrins me is that, because of the above issues, we have
         | to use Prometheus Alertmanager instead, while our colleagues
         | absolutely LOVE Grafana itself! We can't duplicate alerts tens
         | upon tens of times. We don't want that management overhead, nor
         | do we want to teach our colleagues jsonnet/ksonnet to generate
         | it. We also don't want permission problems.
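         | 
         | A minimal sketch of the "alerts as code" idea described above:
         | one rule template expanded once per cluster, so nothing is
         | duplicated by hand. Everything here is an assumption for
         | illustration (the cluster names, the PromQL expression, the
         | alert name and the helper functions); the output only follows
         | the general shape of a Prometheus rule group and is not
         | Grafana's own alert format.

```python
import json
from string import Template

# Hypothetical cluster names; the comment above mentions 30 clusters.
CLUSTERS = [f"cluster-{i:02d}" for i in range(1, 31)]

# A single rule template. "$cluster" is replaced for each cluster.
# The metric name, threshold and labels are illustrative assumptions.
RULE_TEMPLATE = {
    "alert": "HighErrorRate",
    "expr": 'sum(rate(http_requests_total{cluster="$cluster",code=~"5.."}[5m])) > 10',
    "for": "10m",
    "labels": {"severity": "page", "cluster": "$cluster"},
}


def render(value, cluster):
    """Recursively substitute $cluster in every string of the template."""
    if isinstance(value, str):
        return Template(value).safe_substitute(cluster=cluster)
    if isinstance(value, dict):
        return {key: render(item, cluster) for key, item in value.items()}
    if isinstance(value, list):
        return [render(item, cluster) for item in value]
    return value


def build_rule_group(clusters):
    """Return one rule group holding one rendered rule per cluster."""
    return {
        "name": "per-cluster-alerts",
        "rules": [render(RULE_TEMPLATE, cluster) for cluster in clusters],
    }


if __name__ == "__main__":
    # Print the rules for the first two clusters only, to keep output short.
    print(json.dumps(build_rule_group(CLUSTERS[:2]), indent=2))
```

         | The generated file can live in git and go through code review,
         | which is the property the comment asks for; the template
         | itself stays a single, small definition.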
        
           | mikewave wrote:
           | The new Grafana alerts do absolutely nothing to help with
           | this.
           | 
           | I'm at the point where I would pay 5 figures a year for
           | something purely to do better alerting inside or alongside
           | Grafana. Clicking alerts together is a nightmare when I have
           | a ton of identical systems I need to configure. Same for
           | dashboards - the limitations of the current mechanism are too
           | severe.
           | 
           | I'd build my own templating mechanism for it, but I still
           | want the alerts visible in Grafana itself. Zabbix has the
           | power to do all this but with a UX that is not ideal....
        
         | gotjosh- wrote:
         | Hey there! I work with alerting in general at Grafana - what
         | are the pain points of dashboards and alerts as code you're
         | currently experiencing? Would love to deliver / capitalise on
         | the feedback.
        
           | wernerb wrote:
           | Alert templating. Grafana is fussy about configuring alerts
           | on dashboards that have variables. This means that if you
           | have 30 clusters and want to use a single dashboard with a
           | drop-down variable selecting your cluster, you cannot define
           | alerts on it. It will refuse to do it.
           | 
           | Alerts are also tightly integrated into dashboards, which
           | forces alerts to be saved/backed up/imported as a single
           | JSON blob. We want separate management of alerts so they can
           | be defined as code and not in the dashboard blob of JSON!
           | 
           | What chagrins me is that, because of the above issues, we
           | have to use Prometheus Alertmanager instead, while our
           | colleagues absolutely LOVE Grafana itself! We can't duplicate
           | alerts tens upon tens of times. We don't want that management
           | overhead, nor do we want to teach our colleagues
           | jsonnet/ksonnet to generate it. We also don't want permission
           | problems.
        
             | wernerb wrote:
             | I can't edit my above comment anymore but I see that at
             | least alerting is now a separate system in grafana 8!
             | Great, we will take a look again!
        
           | cfors wrote:
           | For one, I'm not convinced that the Grafana 8 Alerting API
           | Swagger docs are up to date or ready for the public [0].
           | 
           | I've literally copied an alert's json format, and then tried
           | to post it back and never got it to work.
           | 
           | Here's an example from my bash history:
           | 
           | > curl -X POST -H "Authorization: Bearer $GRAFANA_API_KEY" -H
           | "accept: application/json" -d @rule.json
           | some_endpoint/api/ruler/grafana/api/v1/rules/test1
           | 
           | I spent a solid day trying to play around with this to get it
           | to work. Because of this the alerts are impossible to code
           | review or store in a git source. Which stinks because
           | Grafana's datasource APIs would be amazing to use for
           | alerting. But they're either unusable because anybody can
           | change them or the administrator could bork them at any given
           | point (which has happened before), or just undocumented to
           | the point where they are useless.
           | 
           | That's not even to begin on dealing with the "big blob of
           | json" problem [1] that was clearly important enough to be
           | given an entire spot at GrafanaCon, but even Grafonnet is not
           | supported with Grafana 8. There is apparently some CUE way of
           | doing this, but I can't seem to find any official
           | documentation on that.
           | 
           | Anyways, I've moved back to alertmanager for the time being.
           | 
           | edit: is all of grafana labs downvoting the GP? this is very
           | honest and candid feedback here.
           | 
           | [0]: https://editor.swagger.io/?url=https://raw.githubusercontent...
           | 
           | [1]: https://grafana.com/go/grafanaconline/2021/dashboards-as-cod...
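           | 
           | The curl call quoted above can be sketched in Python as
           | follows. This only builds the request and does not send it.
           | The ruler path and the namespace "test1" come from the
           | comment; the base URL, the helper name and the rule group
           | body are hypothetical. One thing worth noting: curl's -d
           | flag defaults to a form-encoded Content-Type when none is
           | given, so the sketch sets application/json explicitly. That
           | this explains the failure described above is an assumption,
           | not something the thread confirms.

```python
import json
import urllib.parse
import urllib.request


def build_rule_request(base_url, namespace, api_key, rule_group):
    """Build (but do not send) a POST to the ruler path quoted in the comment."""
    path = "/api/ruler/grafana/api/v1/rules/" + urllib.parse.quote(namespace, safe="")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(rule_group).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
            # Assumption: the original call sent no explicit Content-Type.
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical host and placeholder rule group, for illustration only.
    request = build_rule_request(
        "https://grafana.example.com",
        "test1",
        "REPLACE_WITH_API_KEY",
        {"name": "example-group", "rules": []},
    )
    print(request.get_method(), request.full_url)
```

           | Keeping rule.json in git and posting it from a script like
           | this is what would make the alerts reviewable, provided the
           | endpoint accepts the payload.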
        
           | wtfishackernews wrote:
           | It's currently impossible to write alert rules for Prometheus
           | vectors. https://github.com/grafana/grafana/issues/35663
           | 
           | Missing basic functionality like that is a dealbreaker.
        
       | antod wrote:
       | Will it always be a Grafana Cloud only offering?
        
         | netingle wrote:
         | For now, yes. Long term we're trying to offer everything we do
         | both on premise and in the cloud. It's a bit tricky, so we
         | can't say when....
        
           | zbhoy wrote:
           | Have you heard of Replicated.com before? They might be able
           | to get y'all both on premise and in the cloud at the same
           | time more easily.
        
           | chosenken wrote:
           | Would it be possible to have a split offering, with both on
           | prem and cloud? I would prefer to keep things like
           | Prometheus, logs, and metrics stored on prem, mainly due to
           | the volume of logs and metrics we create, and then use
           | Grafana Cloud for Grafana dashboards, Loki logs, and incident
           | management that pull directly from my on-prem data stores. I
           | bring this up because it may be cost prohibitive for us to
           | store our metrics in the cloud (we make so many metrics and
           | logs!), but I would love to offload hosting the front end.
           | Grafana Cloud takes care of managing and maintaining the
           | Grafana dashboards and backend database, authentication,
           | updates, etc. I'm fine hosting Prometheus and Loki locally,
           | have been for a long time! I just get annoyed having to host
           | Grafana and set it up: bringing the database up, configuring
           | auth, etc.
        
             | bboreham wrote:
             | I'm pretty sure that is doable today: Hosted Grafana with
             | data sources pointing at your on-prem Prometheus and Loki.
             | 
             | https://grafana.com/docs/grafana-cloud/fundamentals/gs-visua...
             | 
             | (I work for Grafana Labs, but not on this part)
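             | 
             | A rough sketch of what "data sources pointing at your
             | on-prem Prometheus and Loki" could look like as request
             | bodies for Grafana's data source HTTP API (POST
             | /api/datasources). The hostnames, data source names and
             | helper function are hypothetical. With "access" set to
             | "proxy" the Grafana server makes the queries itself, so
             | this assumes the on-prem endpoints are reachable from the
             | hosted instance.

```python
import json


def datasource_payload(name, ds_type, url):
    """Return a JSON-serializable body describing one data source."""
    return {
        "name": name,
        "type": ds_type,    # e.g. "prometheus" or "loki"
        "url": url,         # hypothetical on-prem endpoint
        "access": "proxy",  # queries are issued by the Grafana backend
    }


# Hypothetical internal hostnames, for illustration only.
PAYLOADS = [
    datasource_payload("onprem-prometheus", "prometheus",
                       "https://prometheus.internal.example.com"),
    datasource_payload("onprem-loki", "loki",
                       "https://loki.internal.example.com"),
]

if __name__ == "__main__":
    for payload in PAYLOADS:
        print(json.dumps(payload))
```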
        
           | mikewave wrote:
           | Is there any hope of a Grafana Cloud data access proxy that
           | runs on prem and enables us to give the Cloud access to
           | databases we cannot expose?
        
             | netingle wrote:
             | Yes! It's something we've been mulling for a while, and I was
             | just talking to one of the PMs about it this morning. This
             | year for sure I hope.
        
         | matryer wrote:
         | Yeah, building for Grafana Cloud has big dev benefits too. We
         | can iterate quickly, run live experiments, and build a more
         | complicated stack (e.g. for ML tasks). We're going to be
         | integrating more and more with the rest of Grafana too. All of
         | this is much easier to do in one place.
        
           | encryptluks2 wrote:
           | It also has drawbacks like being locked into SaaS products
           | that you don't have a lot of insight into.
        
         | shamiln wrote:
         | Seems like the industry is headed in that direction.
        
           | [deleted]
        
       ___________________________________________________________________
       (page generated 2022-02-02 23:00 UTC)