[HN Gopher] Car alarms and smoke alarms: the tradeoff between se... ___________________________________________________________________ Car alarms and smoke alarms: the tradeoff between sensitivity and specificity Author : lngarner Score : 52 points Date : 2023-04-11 19:37 UTC (3 hours ago) (HTM) web link (blog.danslimmon.com) (TXT) w3m dump (blog.danslimmon.com) | rtkwe wrote: | It's a constant pain of mine to try to get people to stop having | business as usual or successfully completed $PROCESS emails come | out of our batch processes on our teams at work. They absolutely | drown my inbox so I'm forced to filter them then the actual | failures get buried in the unchecked "batch spam" folders. | hevans66 wrote: | My pet peeve is these $PROCESS notifications that go to slack | channels. I worked at a company that had an #engineering_humans | slack channel because we got chased out of #engineering by | bots. | justin_oaks wrote: | I'm fine if they go to THEIR OWN slack channel. Then I can | mute or leave that channel. | | Of course, it's a different problem if those notifications | have a mix of actionable and non-actionable messages (e.g. | both success and error messages). Then it's a signal/noise | problem. | WrtCdEvrydy wrote: | The one that pushes buttons is the alarms that have no docs | attached so when they blow off at 2AM, they just get muted | until someone comes in and complains at 6AM. | justin_oaks wrote: | I had a boss who had an inbox with literally hundreds of | thousands of unread emails. A good chunk of those emails were | "success" messages from batch processes. | | It's quite correct to send a "success" message when a batch | process is completed successfully, but it's quite wrong to send | that message to a human. It should be sent to a machine that | should translate a missing success message into an error | message/alert for humans to respond to. | | For example, I have a set of nightly backup jobs. 
The last step | of each backup process is to send a success message to my | monitoring system. I only get a "Missing Backup" alert when the | monitoring system detects that it didn't receive the success | message it expected for a particular backup. | | My old boss didn't seem to understand the concept that people | don't generally notice missing messages. Or he was too | lazy/incompetent to use a monitoring system that could | translate gaps in successes into errors. | sammalloy wrote: | The entire car alarm industry is a scam, promoted by Republican | congressman Darrell Issa. It has seriously disrupted our lives in | every way imaginable and has drowned out the beauty of nature. I | can't think of a single car that has been protected by a car | alarm since they were invented. They are useless and should be | banned for the health and safety of mitigating noise pollution. | GuB-42 wrote: | > I can't think of a single car that has been protected by a | car alarm since they were invented. | | Many insurance companies offer lower premiums if you install a | car alarm. So I guess they work at least a little, otherwise | they wouldn't lower their premiums. | | It may not actually stop a thief, but it may get a thief to | choose a car that doesn't have an alarm, or maybe it is just a | correlation, but there is at least something. | | Still, I think they should be made illegal, they are a | nuisance, there are already laws against making excessive noise | and car alarms should be included. And if they create an arms | race, by getting thieves to prefer cars without alarms, that's | even more reason to ban them. | BalinKing wrote: | > It has seriously disrupted our lives in every way imaginable | | I assume this is one of those things that changes dramatically | based on where you live--for me (western US), this statement | seems almost comically exaggerated. | dahfizz wrote: | Yeah, I can't remember the last time I heard a car alarm.
| birdyrooster wrote: | Visit a city | Eji1700 wrote: | I live in one with more than 2 million people and it's | something that I have to think hard about to remember the | last time I heard one go off. | | Longer if I have to think of one that went off and wasn't | some form of 'oh shit oh shit oh shit, wrong button' | reaction from the person who accidentally turned | it on. | tayo42 wrote: | I am jealous, I hear at least one every day. I live in an | apartment complex in a suburb. I get woken up by them | sometimes too | jimbob45 wrote: | For poor people whose ability to live depends on having a car, | car alarms must be at least _sort_ of useful to know if your | car is being stolen at night. I'm sure they're just a noisy | inconvenience to the wealthy though. | izacus wrote: | Does the alarm ever prevent theft? | thrashh wrote: | If I'm living in an area where they don't go off often | (like right now) and a car alarm woke me up, I would | definitely check. | | And I imagine if I triggered a car alarm, I would back off. | yafbum wrote: | I'd like to know more about the chip designer who, perhaps | unwittingly, created the alarm-filled soundscape of most American | cities https://youtu.be/tmCnleSBAIg. Would love to know more | about the composition process that went into it. | tra3 wrote: | I need to sit down and go through the math again, I got lost in | the middle somewhere. All I know is our alerts are way too noisy | now to the point where they are useless. | nh23423fefe wrote: | I dunno, article doesn't seem to want me to understand. It's | just another, "here's a random stats calculation you can't | perform in your head, isn't the english a bad way to describe | this calculation?!?!? your intuition sucks when i don't explain | myself....." | sparrish wrote: | Alert fatigue... it's common when alerts are non-actionable and | it causes a lot of downtime.
| bluGill wrote: | We write up stories to fix them, and upper management tracks | progress on completion so they are not buried in the backlog. | hevans66 wrote: | Yes! This. This has happened to me at at least two previous | companies I have worked at. Everybody sets up thresholds on | every possible Datadog metric and alerts become useless. That's | part of the ethos of monitoring at my current company. We only | set up alerts through https://heiioncall.com/ that we are | convinced you absolutely need to look at right now. Anything | that is not that gets shoved to a slack channel (that I have | long since muted). | Syonyk wrote: | Now, if you're annoyed by the false positive rate on your _actual | smoke alarms,_ go replace the one nearest your kitchen with a | photoelectric type, not the standard ionization type that's | cheaper, the default style installed, and ought to be illegal in | homes (IMO). | | There's been quite a bit of research done, generally easy to find | if you look, that talks about the difference and tests them, but | the short summary: | | - Ionization type sensors detect the products of fast flaming | combustion and "things cooking in the kitchen." Your oven, if a | bit dirty, will reliably trip an ionization type. They are quick | on the draw for this. The downside is that they're very, very | poor at detecting the sort of slow, smoking, smoldering | combustion that is associated with house fires that kill people | in the middle of the night. | | - The photoelectric type is very good at detecting smoke in the | air - but it isn't nearly as prone to false triggers on ovens, a | burner burning some spills off, etc. | | They've been A/B tested in a wide variety of conditions, and in | some cases, the ionization type is a bit quicker.
In other cases, | the ionization type is slower, by time ranges north of _half an | hour_ - I've seen some test reports where there was a 45 minute | gap, while the photoelectric type was going off, before the | ionization type fired! | | In general, "rapid fires during the day" are somewhat destructive | to property, but rarely kill people. If your kitchen catches on | fire while you're cooking, it may burn the house down, but | generally people are able to get out. | | The fires that kill people are "slow starting fires during the | night" - the sort that smolder for potentially hours, often | slowly filling the house with toxic smoke, before actually | bursting into open flames. On this sort of fire, the | photoelectric type will fire long, long before the ionization | type - in some cases, the ionization type gets around to alarming quite literally | "after the occupants are dead from the smoke." | | Using smoke alarms as a way to talk about monitoring systems is | nice, but in terms of actual smoke detectors, get at least a few | photoelectric sorts in the main areas of your home. | | Do _not_ get the "combined sensor" sort, since these tend to be | and-gated and the worst of both worlds. | | Edited to add some resources: | | A presentation on the matter from a while back by one of the | experts in this field: | https://wahigroup.com/Resources/Documents/Ion%20vs%20Photo%2... | | Another paper: https://www.semanticscholar.org/paper/Detection- | of-Smoke-%3A... | | > _Full-scale fire tests are carried out to study the | effectiveness of the various types of smoke detectors to provide | an early warning of a fire. Both optical smoke detectors and | ionization smoke detectors have been used. Alarm times are | related to human tenability limits for toxic effects, visibility | loss and heat stress. During smouldering fires it is only the | optical detectors that provide satisfactory safety. With flaming | fires the ionization detectors react before the optical ones.
If | a fire were started by a glowing cigarette, optical detectors are | generally recommended. If not, the response time with these two | types of detectors are so close that it is only in extreme cases | that this difference between optical and ionization detectors | would be critical in saving lives._ | bluGill wrote: | The law requires you have both types, for good reason. Either | alone will detect less than half of all house fires. | | Dual sensors are not AND-gated. While nobody will admit what | algorithm they use, they detect most fires unlike the single | sensor type. | riceart wrote: | Lol "the law" .. what law? Maybe in some dumb ass | jurisdiction - but you're a bit full of yourself if you think | where you happen to live is "the law". | bluGill wrote: | US fire code, though inspectors often don't check. | Syonyk wrote: | As far as I can tell, it's state by state, so... you want | to cite some sources? | Syonyk wrote: | Where does the law require both types? I'm not aware of any | housing codes specifically requiring photoelectric types, and | any house I've looked at, including mine, came with purely | ionization types. Though it's been a few years, and it may | have changed recently - this is less of a niche concern | lately. | | As for dual sensors and gating... do you actually trust your | life to "nobody will admit what algorithm they use"? | | My house has all the smoke detectors wired together (they're | on an AC circuit, with battery backup, with a signal line | running between them all), so I have some photoelectric and | some ionization, depending on where in the house they are. | compumike wrote: | I do like how the author presents the case for how damaging | false-positives can be in SRE monitoring. But, FYI, it can get | worse if these monitors are hooked to self-actuating feedback | loops!
I recently wrote about a production incident on the Heii | On-Call blog, in the context of witnessing how Kubernetes | liveness probes and CPU limits worked together to create a self- | reinforcing CrashLoopBackOff. [1] Partially because the liveness | probe thresholds (timeoutSeconds and failureThreshold fields) | were too aggressive. | | We have a similar message about setting monitoring thresholds in | our documentation [2] because users have to explicitly specify a | downtime timeout before they're alerted about their website / API | endpoint / cron job being down. The timeout / "grace period" is | necessary because in many cases a failure is some transient | network glitch which will fix itself before a human is alerted. | | If you make the timeout too short, you'll get lots of false | positive alerts, and as the article says, your on-call engineers | will be overwhelmed or just start ignoring the alerts. | | If you make the timeout too long, it just takes that many minutes | of downtime longer before you find out about it. | | It may sound counterintuitive, but the latter is usually | preferable. :) | | [1] https://heiioncall.com/blog/kubernetes-liveness-probes- | and-c... | | [2] https://heiioncall.com/docs | raldi wrote: | When the oncall gets paged, an SLO should be in jeopardy in a way | that requires immediate measures to be taken by a well-trained | human as described in actionable terms in a linked playbook. | | No SLO in jeopardy, or no immediate measure that needs to be | taken? Don't page the oncall; send a low-priority ticket for the | service owner to investigate the next business day. | | Steps need to be taken, but they're mechanical in nature or | otherwise don't give the SRE an opportunity to exercise their | brain in an interesting fashion? Replace the alert with an | automated handler that only pages the oncall if it encounters an | exception. 
| | No playbook, or the playbook consists of useless non-actionable | items like, "This alert means the service is running out of | frobs"? Write a playbook that explains what the oncall is | expected to _do_ when the service needs frobs. | | Edit: A dead reply asks if I've ever experienced a novel | incident. Of course. Say, for instance, a "This should never | happen" error-level log is suddenly happening like crazy, for the | first time ever. In that case, you page the oncall, they do their | best to debug it, see if they can reach the SWE service owners, | read through the code to see if it could be an indicator that | SLOs are being violated (e.g., user data corruption) or might be | violated soon, and then write a stub playbook to be fleshed out | the next business day, probably alongside a code change to handle | this situation without spamming the logs so much. | matthew9219 wrote: | [dead] | fatnoah wrote: | In a previous life as a full-stack Engineer at a startup, this | was my white whale. The state of logging, monitoring, and | alerting was such that signal quality was low, and only | indirect observations of the system were possible since the | logging was borderline useless. The result was multiple pages | per night, with each one resulting in a scavenger hunt because | signal was so low that it was nigh impossible to even identify | what playbook to run. | | For example, the web application crashing was logged as a DEBUG | statement, but starting was logged at an ERROR level. This was | clearly done at some point because DEBUG generated far too much | log info w/millions of active users, but some Engineer wanted | to know that the app started. Gross. | | I solved for this by doing a couple things. The first was to | define standards for log levels, ability to correlate log | statements with each other for a given request, and to define | the level of context a "proper" log level should provide. 
| | For example, FATAL = there's no way anything can work properly. | These are pretty rare, but incorrect configuration values were | a common culprit. ERROR indicates something, possibly transient | going wrong. Every now and then, not a big deal that can wait | until later, but a rapid accumulation could mean something more | serious is going on. INFO contained information about the state | of the system, such as general measures of activity and other | signals to indicate the system is working as expected. Most of | our metrics capture was instrumented based off these | statements. | | In terms of the messages, we rapidly evolved the quality of the | messages. For something like the aforementioned configuration | error, the system initially just spat out an "Unexpected error" | and a module name. The first improvement then stated something | like "invalid configuration value" and finally we ended up on a | message that stated the value was incorrect, identified which | configuration value was wrong, and had a code that referenced | documentation and escalation owner. | | When all was said and done, we'd reduced our downtime from | hours per year to less than 5 minutes, eliminated over 95% of | our pages, and reduced escalations to Engineering from several | days per week to a level where it was hard to remember the last | one. | | As the head of Engineering, I had to fight an uphill battle | against the product & sales team for almost a year to make all | of this happen, but I was fully vindicated when we were | acquired and our operational maturity was lauded during the due | diligence process. | peteradio wrote: | You know all that work was worth it when you get a good | lauding. | yamtaddle wrote: | > When presented with this tradeoff, the path of least resistance | is to say "Let's just keep the threshold lower. We'd rather get | woken up when there's nothing broken than sleep through a real | problem." And I can sympathize with that attitude. 
Undetected | outages are embarrassing and harmful to your reputation. Surely | it's preferable to deal with a few late-night fire drills. | | > It's a trap. | | > In the long run, false positives can -- and will often -- hurt | you more than false negatives. Let's learn about the base rate | fallacy. | | Not sure about anyone else, but speaking of alarms, this style of | writing trips my "self-promoting snake-oil Internet bullshitter" | alarm. It's like nails on a damn chalkboard, and if you're | writing like this, you've already lost me; however, maybe I ought | not be pointing that out, since signals are nice to have. | | Incidentally, I wasn't sure which way the author was gonna go | with the core analogy. My smoke alarms have false-alarmed | probably 10x as much as my car alarm, even counting times one of | us has hit the alarm button on the fob by accident. I've | certainly never been so annoyed by my car alarm that I've ripped | it out and stuck it in a freezer, as I have with a smoke alarm. | | (If I were writing like the author I suppose that last part would | have read: | | "I've certainly never been so annoyed by my car alarm that I've | ripped it out and stuck it in a chest freezer. | | I have, with a smoke alarm." | | Except also I'd have found a way to use "we" and "you" a bunch.) | raldi wrote: | What do you mean by "this style of writing"? What aspects of | the quote do you object to? | jacquesm wrote: | At a guess the bit 'let's learn about the base rate fallacy'. | yamtaddle wrote: | Short, choppy sentences, lots of second-person, dropping a | "punch-line" sentence to its own paragraph like they're a | fucking magician revealing the card you pulled earlier. | It's some kind of cross between transparent rapport- | building sales-psychology crap and setting off a fireworks | display to celebrate your successfully assembling a PB&J. | | Like listening to a used car salesman tell a mundane story | about their morning commute. 
| | But full of unearned and over-the-top dramatic pauses. | burnished wrote: | I'm not sure what you are responding to in the quoted text but | after reading the article I think I can assure you that the | author isn't selling you anything more salacious than you would | find in a more interesting introduction to probability and | statistics lecture. | quickthrower2 wrote: | I see a lot of this style of writing in articles submitted on | HN. I think they are just trying to make the writing more | lively, not trying to BS. | | A trope of this style is "{interesting half story} but more on | that later". | | I don't think it is a big deal and I don't see much self | promotion here other than vanilla blogging, i.e. sounds like | this person is knowledgeable let's check their bio. | gmuslera wrote: | Some complementary reading could be My Philosophy on Alerting ( | https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa... ) | and https://how.complexsystems.fail/ | | In any case, not all signals are the same. Most systems have a | lot of components interacting and what turns out to be dangerous is | usually a combination of factors, but in the end, what defines | whether it was dangerous is whether the system is doing what it should. | You can set some guessed thresholds, but you must check them against | whether the system actually works. | | And they should be actionable too, at least for alerts instead of | slow day notifications, or metrics giving context to perceived | problems that could take the guessing out of the thresholds. | mertd wrote: | The post is somewhat incomplete without also discussing the cost | of the wrong decision. | | You obey the smoke alarm because the cost of ignoring the alarm | when it is a true positive is potentially infinite (you die). You | ignore the car alarm because (1) most likely it is a false | positive but also (2) most likely it is somebody else's car.
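The cost argument above can be made concrete with a rough sketch. Everything here is an assumption for illustration: the function names are hypothetical, and the base rates, sensitivities, specificities, and costs are made-up numbers, not measurements from the article or the thread. The sketch first applies Bayes' rule to get the chance that an alarm is real, then compares the expected loss of ignoring it with the cost of checking.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """P(real problem | alarm fired), by Bayes' rule."""
    true_alarms = base_rate * sensitivity
    false_alarms = (1.0 - base_rate) * (1.0 - specificity)
    return true_alarms / (true_alarms + false_alarms)


def should_respond(ppv, cost_of_responding, cost_of_missing):
    """Respond when the expected loss of ignoring exceeds the cost of checking."""
    return ppv * cost_of_missing > cost_of_responding


# Hypothetical numbers: both alarms are usually wrong, but the stakes differ.
smoke_ppv = positive_predictive_value(base_rate=0.001, sensitivity=0.99, specificity=0.99)
car_ppv = positive_predictive_value(base_rate=0.001, sensitivity=0.99, specificity=0.90)

# A missed house fire is catastrophic; a missed theft of someone else's car is not.
respond_to_smoke = should_respond(smoke_ppv, cost_of_responding=1, cost_of_missing=1_000_000)
respond_to_car = should_respond(car_ppv, cost_of_responding=1, cost_of_missing=10)

print(f"smoke alarm: PPV={smoke_ppv:.3f}, respond={respond_to_smoke}")
print(f"car alarm:   PPV={car_ppv:.3f}, respond={respond_to_car}")
```

Under these assumed numbers, even a smoke alarm that is wrong about nine times in ten is worth getting up for, while the car alarm is not, which is the asymmetry the comment describes.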
| [deleted] | dfox wrote: | Smoke(/fire in general) alarms are not a good example of a thing | with high specificity. You perceive it that way, but what you see | is the result of somebody getting paged about it and then | checking (preferably physically, but also through e.g. CCTV) | whether there really is an emergency situation and canceling the | alarm before its escalation timeout. Apparently, for a typical | commercial building false fire alarms are more or less a weekly | occurrence. | | Edit: in large scale fire alarm systems there also are rules | about combinations of triggered sensors that cause immediate | escalation (if there is smoke and elevated temperature in two | adjacent zones, it probably is not a false alarm and such things, | often it even takes into account the failure modes of the | physical alarm loop wiring). This is an interesting idea for IT | monitoring: page someone only when multiple metrics indicate an | issue. | tobyjsullivan wrote: | It was an interesting example and maybe deserved a few more | caveats to actually serve the point. After all, we've all heard | a fire alarm of some sort in the past year (if not the past | month) but how many were actual fires? (Technically the author | said smoke which helps but not really.) | | Where I was expecting the author to go: | | - Clearly was talking about residential smoke detectors, not | commercial. That could have been explicit. | | - Smoke detectors do have a high false-positive rate but almost | always at the _right time_. A home smoke alarm going off while | I'm cooking is quite different to a smoke alarm going off when | I'm sleeping. To the author's point, there are very few false | positives while I'm sleeping so when they happen, I'm getting | up. | | Speaking of the commercial context, I wonder what sort of | businesses would get a lot of false alarms and how that varies | across industries. | cbarrick wrote: | I think this article is missing the forest for the trees.
| | The article is about finding the appropriate sensitivity of | alerts on some signal in order to maximize the predictive value. | | But you should care more about the quality of the signals you are | monitoring than about the sensitivity of your thresholds. | | The article mentions load-average as an example signal, but to | me, that's a poor signal to monitor. Instead, if your SLO is | defined for error rate, alert on error rate. | | Alerts on your SLO will have a high predictive value for | predicting violations of your SLO, by definition. The tunable | parameter here is the time window, not the threshold. E.g. if | your error budget is defined for a 30d window, you may want | alerts at the SLO threshold for 24h and 1h windows. | | Alert on causes, not symptoms. | jacquesm wrote: | > But you should care more about the quality of the signals you | are monitoring than about the sensitivity of your thresholds. | | This is so true. Case in point: Growatt inverters have - like | every other inverter - a maximum voltage on the grid connection | at which they will shut down. They're pretty trigger happy | about this and fail to take into account the resistance of the | feed wire of the inverter to the (much lower impedance) grid | hookup. As a result even on cabling sized properly for the | interconnect they tend to falsely trigger well before the point | where they should. The only way to avoid this problem is to | either hack into the inverter somehow (which I've so far failed | to do) or to use oversized cables (which isn't always an | option). | | The sensitivity is fantastic, the quality of the signal is | hopeless. Obviously they err on the side of caution but the | margin is so ridiculously large that you end up losing a lot of | usable power for no reason at all. 
At least it should allow for | either a resistance for the interconnect to be specified so | that it can take into account the voltage drop across that | wire, which at 10A is appreciable for even short runs of fairly | beefy cable. ___________________________________________________________________ (page generated 2023-04-11 23:00 UTC)