[HN Gopher] Grafana Incident: Smart incident management for your...
___________________________________________________________________
Grafana Incident: Smart incident management for your teams

Author : matryer
Score  : 167 points
Date   : 2022-02-02 18:10 UTC (4 hours ago)

(HTM) web link (grafana.com)
(TXT) w3m dump (grafana.com)

| palijer wrote:
| >Automatically create the online meeting spaces for collaboration
|
| >Manage TODO items so nothing falls through the cracks
|
| I work in incident response, and I feel a lot of incident
| response products fail to understand that companies already have
| established tools for collaboration and meetings and for
| capturing planned work.
|
| I find adding these things is seen as nice and inclusive and it
| is easier to sell a product that does a lot, but it turns into
| complete bloat and makes adoption harder, and makes it harder to
| support a larger product.
|
| quartz wrote:
| This was a big learning for us when we were first building out
| Kintaba[1].
|
| Re: task management specifically-- having previously been at
| FAANG companies that built all their own tools I had not
| realized just how prevalent Jira is. It. is. EVERYWHERE. and IT
| orgs at companies from 3 to 300,000 people are absolutely
| married to their carefully customized version of it as a system
| of record for everything that happens or will happen.
|
| We see many on-premise implementations as well despite the
| announced sunsetting of that product.
|
| I'm sure there's a #2 and #3 out there but honestly I almost
| never see them (we do see Clubhouse/Shortcut from time to time...
| but even those folks tend to move to Jira within 6 months).
|
| OT but it really makes me doubly impressed that Slack was able
| to move into organizations so successfully from all corners such
| that it was able to dodge what would traditionally be a pretty
| big Atlassian-owned barrier.
|
| [1] shameless plug for our incident management tool @
| https://kintaba.com
|
| dharmab wrote:
| I've used products in this space that would integrate with your
| existing video, chat and ticketing tools.
|
| buscoquadnary wrote:
| I think the problem is trying to present an abstraction layer
| to management, because we have those same features of todo
| lists, and recording information, in Jira and ServiceNow and
| like a dozen other tools whose purpose is to coordinate and
| track work, and often they are unpopular with developers
| because they end up trying to provide an abstraction layer to
| the Execs to replace their management by spreadsheets, but
| unfortunately as anyone who has worked in software for long
| enough can tell you, abstractions are leaky.
|
| Hence the dissatisfaction with a lot of these tools.
|
| aantix wrote:
| Interesting take.
|
| What do you think is the solution - when an enterprise already
| has Jira, GitHub and Confluence, how do you think a product
| like Grafana Incident should integrate with these somewhat
| overlapping products?
|
| ethbr0 wrote:
| This feels like a central question of post-cloud / post-SaaS
| outsourcing.
|
| In the end, it boils down to two options: offer deep APIs
| into your product, or don't.
|
| IMHO, what needs to happen to support the former is for every
| SaaS purchase to include full technical due diligence on
| external integration capabilities.
|
| Integration needs to start being a headline feature in
| purchasing. And less an afterthought when a horrified
| engineer looks at some new enterprise product that's already
| being adopted.
|
| encryptluks2 wrote:
| So does Grafana actually believe in open source or not?
|
| sjwhitworth wrote:
|
| capableweb wrote:
| You'd do your job as a CEO better if you didn't spam
| competitors' HN threads with your own product, unless you have
| something relevant to bring to the table. This comment just
| looks like a shameless plug because you're in the same sector.
|
| One way you could approach it is to highlight what you think is
| good with Grafana's implementation, and what could be better,
| and then contrast that with your own offering, without sounding
| like a salesman.
|
| burkaman wrote:
| This is just incredibly rude. Please don't do it again.
|
| JshWright wrote:
| This is timely... I just started building out an internal
| "chatops" solution that leans heavily on OnCall. Looks like I may
| be able to set that aside.
|
| If this is implemented as cleanly as OnCall, I have high hopes.
| It isn't without bugs, but it's already miles ahead of solutions
| like PagerDuty (in my opinion).
|
| btables wrote:
| I'd check out FireHydrant, but I'm biased ;)
|
| JshWright wrote:
| Yeah, there are definitely already products in this space,
| but we're already invested in Grafana, so it makes sense to
| lean in that direction, even if it meant a little custom work
| on our end (though it looks like that may not be necessary
| now).
|
| bloodyplonker22 wrote:
| PagerDuty is a product that has not evolved much at all in the
| last 10 years, unfortunately.
|
| jtlisi wrote:
| This looks really sharp! Love the opinionated approach to how to
| handle incidents with assigned roles!
|
| amelius wrote:
| It seems like this is a special case of project management
| software. If the existing products can't handle incidents then
| that software should be improved, not new software written.
| It's the best way to ensure that everybody on the team knows
| how to use the software when it's most urgently needed.
|
| E.g. would you change your favorite editor to a different one,
| in case of an incident? Probably not. So why change project
| management systems?
|
| [deleted]
|
| bastardoperator wrote:
| Did we watch a different presentation? ChatOps isn't new.
| What you're describing is what I would consider an antiquated
| practice. Nobody wants to go sniffing around a PM tool at 3 AM.
|
| hughrr wrote:
| Zero here!
| matryer wrote:
| You must have solid tech. :)
|
| JshWright wrote:
| While you certainly could cobble together incident response
| workflows in something like Jira, I think it makes more sense
| to extend the monitoring and paging tooling (in large part
| due to the reason you mention-- familiarity with the tools
| that you're using as part of that response).
|
| dijit wrote:
| It's funny what process can do.
|
| 13 years ago I was working on a SaaS eCommerce platform and it
| feels like this tool is a relatively minor improvement over what
| we had built on top of IRC.
|
| That said, it's pretty cool and I'm definitely going to evaluate
| it, as our current PagerDuty integration is not nearly as clean
| as this.
|
| cfors wrote:
| I wish Grafana would stop trying to make offerings that already
| exist and focus on making their dashboards and alerts as code
| usable.
|
| I would even pay money for an actual offering that worked.
|
| lukeqsee wrote:
| Grafana Cloud is the best ROI money my startup spends every
| month.
|
| wernerb wrote:
| Alert templating. Grafana is fussy about configuring alerts on
| dashboards that have variables. What this means is if you have
| 30 clusters and want to use a single dashboard with a drop-down
| variable selecting your cluster you cannot define alerts on it.
| It will refuse to do it.
|
| Alerts are also integrated tightly in dashboards. That forces
| alerts to be saved/backed up/imported as a single JSON blob. We
| want separate management of alerts so they can be defined as
| code and not in the dashboard blob of JSON!
|
| What chagrins me is that, because of the above issues, we have
| to use Prometheus Alertmanager instead while our colleagues
| absolutely LOVE Grafana itself! We can't duplicate alerts tens
| upon tens of times. We don't want that management nor do we want
| to teach our colleagues jsonnet/ksonnet to generate it. We also
| don't want permission problems.
| mikewave wrote:
| The new Grafana alerts do absolutely nothing to help with
| this.
|
| I'm at the point where I would pay 5 figures a year for
| something purely to do better alerting inside or alongside
| Grafana. Clicking alerts together is a nightmare when I have
| a ton of identical systems I need to configure. Same for
| dashboards - the limitations of the current mechanism are too
| severe.
|
| I'd build my own templating mechanism for it, but I still
| want the alerts visible in Grafana itself. Zabbix has the
| power to do all this but with a UX that is not ideal....
|
| gotjosh- wrote:
| Hey there! I work with alerting in general at Grafana - what
| are the pain points of dashboards and alerts as code you're
| currently experiencing? Would love to deliver / capitalise on
| the feedback.
|
| wernerb wrote:
| Alert templating. Grafana is fussy about configuring alerts
| on dashboards that have variables. What this means is if you
| have 30 clusters and want to use a single dashboard with a
| drop-down variable selecting your cluster you cannot define
| alerts on it. It will refuse to do it.
|
| Alerts are also integrated tightly in dashboards. That forces
| alerts to be saved/backed up/imported as a single JSON blob. We
| want separate management of alerts so they can be defined as
| code and not in the dashboard blob of JSON!
|
| What chagrins me is that, because of the above issues, we
| have to use Prometheus Alertmanager instead while our
| colleagues absolutely LOVE Grafana itself! We can't duplicate
| alerts tens upon tens of times. We don't want that management
| nor do we want to teach our colleagues jsonnet/ksonnet to
| generate it. We also don't want permission problems.
|
| wernerb wrote:
| I can't edit my above comment anymore but I see that at
| least alerting is now a separate system in Grafana 8!
| Great, we will take a look again!
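
Editor's aside on the "one template, many clusters" complaint in the comments above: a minimal sketch of what generating alerts as code can look like, assuming Prometheus-style rule groups (`groups` / `rules` / `alert` / `expr` / `for`). The function name `build_cluster_rules`, the `cluster` label, the `up == 0` expression and the `severity` value are illustrative assumptions, not anything the commenters or Grafana describe.

```python
import json


def build_cluster_rules(clusters, for_minutes=5):
    """Return one Prometheus-style rule group with one alert per cluster.

    Hypothetical helper: the `cluster` label and the `up == 0` check are
    placeholders for whatever a real setup would alert on.
    """
    rules = []
    for cluster in clusters:
        rules.append({
            "alert": "TargetDown",
            # Doubled braces are literal braces inside an f-string.
            "expr": f'up{{cluster="{cluster}"}} == 0',
            "for": f"{for_minutes}m",
            "labels": {"severity": "page", "cluster": cluster},
            "annotations": {
                "summary": f"A target in cluster {cluster} is down",
            },
        })
    return {"groups": [{"name": "per-cluster-availability", "rules": rules}]}


if __name__ == "__main__":
    # Thirty clusters, one template: the case described in the thread.
    clusters = [f"cluster-{i:02d}" for i in range(1, 31)]
    print(json.dumps(build_cluster_rules(clusters), indent=2))
```

The point of the sketch is only that the template lives in one reviewable place and the per-cluster copies are derived from it; serializing the structure to a rule file and loading it into a particular alerting system is left out.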
| cfors wrote:
| For one, I'm not convinced that the Grafana 8 Alerting API
| Swagger docs are up to date or ready for the public [0].
|
| I've literally copied an alert's JSON format, and then tried
| to post it back and never got it to work.
|
| Here's an example from my bash history:
|
| > curl -X POST -H "Authorization: Bearer $GRAFANA_API_KEY" -H
| "accept: application/json" -d @rule.json
| some_endpoint/api/ruler/grafana/api/v1/rules/test1
|
| I spent a solid day trying to play around with this to get it
| to work. Because of this the alerts are impossible to code
| review or store in a git source. Which stinks because
| Grafana's datasource APIs would be amazing to use for
| alerting. But they're either unusable because anybody can
| change them or the administrator could bork them at any given
| point (which has happened before), or just undocumented to
| the point where they are useless.
|
| That's not even to begin on dealing with the "big blob of
| JSON" problem [1] that was clearly important enough to be
| given an entire spot at GrafanaCon, but even Grafonnet is not
| supported with Grafana 8. There is apparently some CUE way of
| doing this, but I can't seem to find any official
| documentation on that.
|
| Anyways, I've moved back to Alertmanager for the time being.
|
| edit: is all of Grafana Labs downvoting the GP? this is very
| honest and candid feedback here.
|
| [0]: https://editor.swagger.io/?url=https://raw.githubusercontent...
|
| [1]: https://grafana.com/go/grafanaconline/2021/dashboards-as-cod...
|
| wtfishackernews wrote:
| It's currently impossible to write alert rules for Prometheus
| vectors. https://github.com/grafana/grafana/issues/35663
|
| Missing basic functionality like that is a dealbreaker.
|
| antod wrote:
| Will it always be a Grafana Cloud only offering?
|
| netingle wrote:
| For now, yes. Long term we're trying to offer everything we do
| both on premise and in the cloud. It's a bit tricky, so we
| can't say when....
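
Editor's aside on the curl call quoted in cfors's comment above: a minimal Python sketch that builds the same POST request without sending it, so the headers and URL can be inspected. The path is copied from the quoted command; reading its last segment (`test1`) as a namespace, the helper name `build_rule_request`, and the host in the usage note are assumptions. One thing worth checking in the original command is that curl's `-d` sends `Content-Type: application/x-www-form-urlencoded` unless a header overrides it; whether that explains the failure described is not something the thread confirms.

```python
import json
import urllib.request


def build_rule_request(base_url, api_key, namespace, rule_group):
    """Build (but do not send) a POST mirroring the quoted curl call.

    Hypothetical helper: `namespace` is an assumed reading of the final
    path segment in the original command.
    """
    url = f"{base_url.rstrip('/')}/api/ruler/grafana/api/v1/rules/{namespace}"
    body = json.dumps(rule_group).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
            # Assumption: the endpoint expects a JSON body, so say so
            # explicitly instead of relying on a client default.
            "Content-Type": "application/json",
        },
    )
```

For example, `build_rule_request("https://grafana.example.com", key, "test1", rule)` returns a `Request` whose `full_url`, `data` and headers can be logged or diffed before anything is passed to `urllib.request.urlopen`.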
| zbhoy wrote:
| Have you heard of Replicated.com before? They might be able
| to get y'all to both on premise and in the cloud at the same
| time more easily.
|
| chosenken wrote:
| Would it be possible to have a split offering, with both on
| prem and cloud? In my mind I would prefer to have things like
| Prometheus, Logs, and Metrics stored on prem mainly due to
| the volume of logs and metrics we create. Then use Grafana
| Cloud for Grafana Dashboards, Loki logs, and incident
| management that pull directly from my on prem data stores. I
| bring this up as it may be cost prohibitive for us to store
| our metrics in the cloud (we make so many metrics and logs!)
| but I would love to offload hosting the front end. Grafana
| Cloud takes care of managing and maintaining Grafana
| Dashboard and backend database, authentication, updates, etc.
| I'm fine hosting Prometheus and Loki locally, have been for a
| long time! I just get annoyed having to host Grafana and
| setting it up, setting the database up, configuring auth, etc.
|
| bboreham wrote:
| I'm pretty sure that is doable today: Hosted Grafana with
| data sources pointing at your on-prem Prometheus and Loki.
|
| https://grafana.com/docs/grafana-cloud/fundamentals/gs-visua...
|
| (I work for Grafana Labs, but not on this part)
|
| mikewave wrote:
| Is there any hope of a Grafana Cloud data access proxy that
| runs on prem and enables us to give the Cloud access to
| databases we cannot expose?
|
| netingle wrote:
| Yes! It's something we've been mulling for a while, and I was
| just talking to one of the PMs about it this morning. This
| year for sure I hope.
|
| matryer wrote:
| Yeah, building for Grafana Cloud has big dev benefits too. We
| can iterate quickly, run live experiments, and build a more
| complicated stack (e.g. for ML tasks). We're going to be
| integrating more and more with the rest of Grafana too. All of
| this is much easier to do in one place.
| encryptluks2 wrote:
| It also has drawbacks like being locked into SaaS products
| that you don't have a lot of insight into.
|
| shamiln wrote:
| Seems like the industry is headed in that direction.
|
| [deleted]
___________________________________________________________________
(page generated 2022-02-02 23:00 UTC)