[HN Gopher] Car alarms and smoke alarms: the tradeoff between se...
       ___________________________________________________________________
        
       Car alarms and smoke alarms: the tradeoff between sensitivity and
       specificity
        
       Author : lngarner
       Score  : 52 points
       Date   : 2023-04-11 19:37 UTC (3 hours ago)
        
 (HTM) web link (blog.danslimmon.com)
 (TXT) w3m dump (blog.danslimmon.com)
        
       | rtkwe wrote:
       | It's a constant pain of mine to try to get people to stop having
       | business as usual or successfully completed $PROCESS emails come
       | out of our batch processes on our teams at work. They absolutely
       | drown my inbox, so I'm forced to filter them, and then the actual
       | failures get buried in the unchecked "batch spam" folders.
        
         | hevans66 wrote:
         | My pet peeve is these $PROCESS notifications that go to slack
         | channels. I worked at a company that had an #engineering_humans
         | slack channel because we got chased out of #engineering by
         | bots.
        
           | justin_oaks wrote:
           | I'm fine if they go to THEIR OWN slack channel. Then I can
           | mute or leave that channel.
           | 
           | Of course, it's a different problem if those notifications
           | have a mix of actionable and non-actionable messages (e.g.
           | both success and error messages). Then it's a signal/noise
           | problem.
        
         | WrtCdEvrydy wrote:
         | The ones that push my buttons are the alarms that have no docs
         | attached, so when they go off at 2AM, they just get muted
         | until someone comes in and complains at 6AM.
        
         | justin_oaks wrote:
         | I had a boss who had an inbox with literally hundreds of
         | thousands of unread emails. A good chunk of those emails were
         | "success" messages from batch processes.
         | 
         | It's quite correct to send a "success" message when a batch
         | process is completed successfully, but it's quite wrong to send
         | that message to a human. It should be sent to a machine that
         | should translate a missing success message into an error
         | message/alert for humans to respond to.
         | 
         | For example, I have a set of nightly backup jobs. The last step
         | of each backup process is to send a success message to my
         | monitoring system. I only get a "Missing Backup" alert when the
         | monitoring system detects that it didn't receive the success
         | message it expected for a particular backup.
         | 
         | My old boss didn't seem to understand the concept that people
         | don't generally notice missing messages. Or he was too
         | lazy/incompetent to use a monitoring system that could
         | translate gaps in successes into errors.
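         | 
         | A minimal sketch of the "missing success becomes an alert" idea
         | described above. The job names, the 26-hour limit, and the
         | function name are all made up for illustration; a real
         | monitoring system would persist the timestamps and do the
         | notifying itself.

```python
from datetime import datetime, timedelta


def missing_heartbeats(last_success, max_age, now):
    """Return the names of jobs whose last success report is too old.

    last_success maps job name -> datetime of the last success message,
    or None if no success message has ever been received.
    """
    overdue = []
    for job, seen_at in last_success.items():
        if seen_at is None or now - seen_at > max_age:
            overdue.append(job)
    return sorted(overdue)


# Hypothetical state: two nightly backups, one of which has gone quiet.
now = datetime(2023, 4, 11, 9, 0)
last_success = {
    "db-backup": datetime(2023, 4, 11, 2, 5),     # reported last night
    "files-backup": datetime(2023, 4, 9, 2, 10),  # silent for two nights
}
alerts = missing_heartbeats(last_success, timedelta(hours=26), now)
print(alerts)  # ['files-backup']
```

         | The humans only ever see the contents of that list; the success
         | messages themselves never reach an inbox.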
        
       | sammalloy wrote:
       | The entire car alarm industry is a scam, promoted by Republican
       | congressman Darrell Issa. It has seriously disrupted our lives in
       | every way imaginable and has drowned out the beauty of nature. I
       | can't think of a single car that has been protected by a car
       | alarm since they were invented. They are useless and should be
       | banned for the health and safety of mitigating noise pollution.
        
         | GuB-42 wrote:
         | > I can't think of a single car that has been protected by a
         | car alarm since they were invented.
         | 
         | Many insurance companies offer lower premiums if you install a
         | car alarm. So I guess they work at least a little, otherwise
         | they wouldn't lower their premiums.
         | 
         | It may not actually stop a thief, but it may get a thief to
         | choose a car that doesn't have an alarm, or maybe it is just a
         | correlation, but there is at least something.
         | 
         | Still, I think they should be made illegal, they are a
         | nuisance, there are already laws against making excessive noise
         | and car alarms should be included. And if they create an arms
         | race, by getting thieves to prefer cars without alarms, that's
         | even more reason to ban them.
        
         | BalinKing wrote:
         | > It has seriously disrupted our lives in every way imaginable
         | 
         | I assume this is one of those things that changes dramatically
         | based on where you live--for me (western US), this statement
         | seems almost comically exaggerated.
        
           | dahfizz wrote:
           | Yeah, I can't remember the last time I heard a car alarm.
        
             | birdyrooster wrote:
             | Visit a city
        
               | Eji1700 wrote:
               | I live in one with more than 2 million people and it's
               | something that I have to think hard about to remember the
               | last time I heard one go off.
               | 
               | Longer if I have to think of one that went off and wasn't
               | some form of 'oh shit oh shit oh shit, wrong button'
               | reaction from the person who accidentally turned it on.
        
             | tayo42 wrote:
             | I am jealous, I hear at least one every day. I live in an
             | apartment complex in a suburb. I get woken up by them
             | sometimes too
        
         | jimbob45 wrote:
         | For poor people whose ability to live depends on having a car,
         | car alarms must be at least _sort_ of useful to know if your
         | car is being stolen at night. I'm sure they're just a noisy
         | inconvenience to the wealthy though.
        
           | izacus wrote:
           | Does the alarm ever prevent theft?
        
             | thrashh wrote:
             | If I'm living in an area where they don't go off often
             | (like right now) and a car alarm woke me up, I would
             | definitely check.
             | 
             | And I imagine if I triggered a car alarm, I would back off.
        
       | yafbum wrote:
       | I'd like to know more about the chip designer who, perhaps
       | unwittingly, created the alarm-filled soundscape of most American
       | cities https://youtu.be/tmCnleSBAIg. Would love to know more
       | about the composition process that went into it.
        
       | tra3 wrote:
       | I need to sit down and go through the math again, I got lost in
       | the middle somewhere. All I know is our alerts are way too noisy
       | now to the point where they are useless.
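       | 
       | For anyone else who got lost, here is a rough sketch of the
       | base-rate calculation using Bayes' theorem. The numbers are
       | illustrative assumptions, not the article's, and the function
       | name is made up.

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """P(real problem | alert fired), by Bayes' theorem."""
    true_alerts = sensitivity * base_rate
    false_alerts = (1.0 - specificity) * (1.0 - base_rate)
    return true_alerts / (true_alerts + false_alerts)


# Illustrative numbers only (not taken from the article):
# the check catches 99% of real problems, stays quiet 99% of the time
# when things are fine, and things are actually broken 0.1% of the time.
ppv = positive_predictive_value(0.99, 0.99, 0.001)
print(f"{ppv:.1%} of alerts are real")
```

       | With those assumed inputs, roughly one alert in eleven points at
       | a real problem, even though the check itself looks very accurate.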
        
         | nh23423fefe wrote:
         | I dunno, article doesn't seem to want me to understand. It's
         | just another, "here's a random stats calculation you can't
         | perform in your head, isn't English a bad way to describe
         | this calculation?!?!? your intuition sucks when I don't explain
         | myself....."
        
         | sparrish wrote:
         | Alert fatigue... it's common when alerts are non-actionable and
         | it causes a lot of downtime.
        
           | bluGill wrote:
           | We write up stories to fix them, and upper management tracks
           | progress on completion so they are not buried in the backlog.
        
         | hevans66 wrote:
         | Yes! This. This has happened to me at at least two previous
         | companies I have worked at. Everybody sets up thresholds on
         | every possible Datadog metric and alerts become useless. That's
         | part of the ethos of monitoring at my current company. We only
         | set up alerts through https://heiioncall.com/ that we are
         | convinced you absolutely need to look at right now. Anything
         | that is not that gets shoved to a slack channel (that I have
         | long since muted).
        
       | Syonyk wrote:
       | Now, if you're annoyed by the false positive rate on your _actual
       | smoke alarms,_ go replace the one nearest your kitchen with a
       | photoelectric type, not the standard ionization type that's
       | cheaper, the default style installed, and ought to be illegal in
       | homes (IMO).
       | 
       | There's been quite a bit of research done, generally easy to find
       | if you look, that talks about the difference and tests them, but
       | the short summary:
       | 
       | - Ionization type sensors detect the products of fast flaming
       | combustion and "things cooking in the kitchen." Your oven, if a
       | bit dirty, will reliably trip an ionization type. They are quick
       | on the draw for this. The downside is that they're very, very
       | poor at detecting the sort of slow, smoking, smoldering
       | combustion that is associated with house fires that kill people
       | in the middle of the night.
       | 
       | - The photoelectric type is very good at detecting smoke in the
       | air - but it isn't nearly as prone to false triggers on ovens, a
       | burner burning some spills off, etc.
       | 
       | They've been A/B tested in a wide variety of conditions, and in
       | some cases, the ionization type is a bit quicker. In other cases,
       | the ionization type is slower, by time ranges north of _half an
       | hour_ - I've seen some test reports where there was a 45 minute
       | gap, while the photoelectric type was going off, before the
       | ionization type fired!
       | 
       | In general, "rapid fires during the day" are somewhat destructive
       | to property, but rarely kill people. If your kitchen catches on
       | fire while you're cooking, it may burn the house down, but
       | generally people are able to get out.
       | 
       | The fires that kill people are "slow starting fires during the
       | night" - the sort that smolder for potentially hours, often
       | slowly filling the house with toxic smoke, before actually
       | bursting into open flames. On this sort of fire, the
       | photoelectric type will fire long, long before the ionization
       | type - in some cases, they get around to alarming quite literally
       | "after the occupants are dead from the smoke."
       | 
       | Using smoke alarms as a way to talk about monitoring systems is
       | nice, but in terms of actual smoke detectors, get at least a few
       | photoelectric sorts in the main areas of your home.
       | 
       | Do _not_ get the "combined sensor" sort, since these tend to be
       | and-gated and the worst of both worlds.
       | 
       | Edited to add some resources:
       | 
       | A presentation on the matter from a while back by one of the
       | experts in this field:
       | https://wahigroup.com/Resources/Documents/Ion%20vs%20Photo%2...
       | 
       | Another paper: https://www.semanticscholar.org/paper/Detection-
       | of-Smoke-%3A...
       | 
       | > _Full-scale fire tests are carried out to study the
       | effectiveness of the various types of smoke detectors to provide
       | an early warning of a fire. Both optical smoke detectors and
       | ionization smoke detectors have been used. Alarm times are
       | related to human tenability limits for toxic effects, visibility
       | loss and heat stress. During smouldering fires it is only the
       | optical detectors that provide satisfactory safety. With flaming
       | fires the ionization detectors react before the optical ones. If
       | a fire were started by a glowing cigarette, optical detectors are
       | generally recommended. If not, the response time with these two
       | types of detectors are so close that it is only in extreme cases
       | that this difference between optical and ionization detectors
       | would be critical in saving lives._
        
         | bluGill wrote:
         | The law requires you have both types for good reason. Either
         | alone will detect less than half of all house fires.
         | 
         | Dual sensors are not and-gated. While nobody will admit what
         | algorithm they use, they detect most fires unlike the single
         | sensor type.
        
           | riceart wrote:
           | Lol "the law" .. what law? Maybe in some dumb ass
           | jurisdiction - but you're a bit full of yourself if you think
           | where you happen to live is "the law".
        
             | bluGill wrote:
             | US fire code, though inspectors often don't check.
        
               | Syonyk wrote:
               | As far as I can tell, it's state by state, so... you want
               | to cite some sources?
        
           | Syonyk wrote:
           | Where does the law require both types? I'm not aware of any
           | housing codes specifically requiring photoelectric types, and
           | any house I've looked at, including mine, came with purely
           | ionization types. Though it's been a few years, and it may
           | have changed recently - this is less of a niche concern
           | lately.
           | 
           | As for dual sensors and gating... do you actually trust your
           | life to "nobody will admit what algorithm they use"?
           | 
           | My house has all the smoke detectors wired together (they're
           | on an AC circuit, with battery backup, with a signal line
           | running between them all), so I have some photoelectric and
           | some ionization, depending on where in the house they are.
        
       | compumike wrote:
       | I do like how the author presents the case for how damaging
       | false-positives can be in SRE monitoring. But, FYI, it can get
       | worse if these monitors are hooked to self-actuating feedback
       | loops! I recently wrote about a production incident on the Heii
       | On-Call blog, in the context of witnessing how Kubernetes
       | liveness probes and CPU limits worked together to create a self-
       | reinforcing CrashLoopBackOff. [1] Partially because the liveness
       | probe thresholds (timeoutSeconds and failureThreshold fields)
       | were too aggressive.
       | 
       | We have a similar message about setting monitoring thresholds in
       | our documentation [2] because users have to explicitly specify a
       | downtime timeout before they're alerted about their website / API
       | endpoint / cron job being down. The timeout / "grace period" is
       | necessary because in many cases a failure is some transient
       | network glitch which will fix itself before a human is alerted.
       | 
       | If you make the timeout too short, you'll get lots of false
       | positive alerts, and as the article says, your on-call engineers
       | will be overwhelmed or just start ignoring the alerts.
       | 
       | If you make the timeout too long, it just takes that many minutes
       | of downtime longer before you find out about it.
       | 
       | It may sound counterintuitive, but the latter is usually
       | preferable. :)
       | 
       | [1] https://heiioncall.com/blog/kubernetes-liveness-probes-
       | and-c...
       | 
       | [2] https://heiioncall.com/docs
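       | 
       | As a generic sketch of the grace-period idea (not how any
       | particular product implements it): alert only once the most
       | recent run of failed checks has lasted at least the timeout. The
       | function name, the 60-second check interval, and the 180-second
       | grace period are assumptions for illustration.

```python
def should_alert(check_results, grace_period_s, interval_s):
    """Alert only if the most recent failures span the grace period.

    check_results is a list of booleans (True = check passed), oldest
    first, taken every interval_s seconds.
    """
    consecutive_failures = 0
    for ok in reversed(check_results):
        if ok:
            break
        consecutive_failures += 1
    return consecutive_failures * interval_s >= grace_period_s


# A transient blip (one failed check) versus a sustained outage.
blip = [True, True, False, True, False]
outage = [True, False, False, False, False]
print(should_alert(blip, grace_period_s=180, interval_s=60))    # False
print(should_alert(outage, grace_period_s=180, interval_s=60))  # True
```

       | Raising grace_period_s trades slower detection for fewer
       | false-positive pages, which is the tradeoff described above.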
        
       | raldi wrote:
       | When the oncall gets paged, an SLO should be in jeopardy in a way
       | that requires immediate measures to be taken by a well-trained
       | human as described in actionable terms in a linked playbook.
       | 
       | No SLO in jeopardy, or no immediate measure that needs to be
       | taken? Don't page the oncall; send a low-priority ticket for the
       | service owner to investigate the next business day.
       | 
       | Steps need to be taken, but they're mechanical in nature or
       | otherwise don't give the SRE an opportunity to exercise their
       | brain in an interesting fashion? Replace the alert with an
       | automated handler that only pages the oncall if it encounters an
       | exception.
       | 
       | No playbook, or the playbook consists of useless non-actionable
       | items like, "This alert means the service is running out of
       | frobs"? Write a playbook that explains what the oncall is
       | expected to _do_ when the service needs frobs.
       | 
       | Edit: A dead reply asks if I've ever experienced a novel
       | incident. Of course. Say, for instance, a "This should never
       | happen" error-level log is suddenly happening like crazy, for the
       | first time ever. In that case, you page the oncall, they do their
       | best to debug it, see if they can reach the SWE service owners,
       | read through the code to see if it could be an indicator that
       | SLOs are being violated (e.g., user data corruption) or might be
       | violated soon, and then write a stub playbook to be fleshed out
       | the next business day, probably alongside a code change to handle
       | this situation without spamming the logs so much.
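       | 
       | The triage rules above can be sketched as a small routing
       | function. The field names and the returned labels are made up
       | for illustration; this is one reading of the rules, not a
       | standard.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    slo_in_jeopardy: bool
    needs_immediate_action: bool
    fix_is_mechanical: bool
    has_actionable_playbook: bool


def route(alert: Alert) -> str:
    """Decide what to do with an alert before it reaches a human."""
    # No SLO at risk, or nothing to do right now: next-business-day ticket.
    if not (alert.slo_in_jeopardy and alert.needs_immediate_action):
        return "ticket"
    # Mechanical steps belong in an automated handler, not a pager.
    if alert.fix_is_mechanical:
        return "automate"
    # Novel or undocumented: page, then write the playbook afterwards.
    if not alert.has_actionable_playbook:
        return "page-and-write-playbook"
    return "page"


print(route(Alert(True, True, False, True)))    # page
print(route(Alert(False, True, False, True)))   # ticket
print(route(Alert(True, True, True, True)))     # automate
print(route(Alert(True, True, False, False)))   # page-and-write-playbook
```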
        
         | matthew9219 wrote:
         | [dead]
        
         | fatnoah wrote:
         | In a previous life as a full-stack Engineer at a startup, this
         | was my white whale. The state of logging, monitoring, and
         | alerting was such that signal quality was low, and only
         | indirect observations of the system were possible since the
         | logging was borderline useless. The result was multiple pages
         | per night, with each one resulting in a scavenger hunt because
         | signal was so low that it was nigh impossible to even identify
         | what playbook to run.
         | 
         | For example, the web application crashing was logged as a DEBUG
         | statement, but starting was logged at an ERROR level. This was
         | clearly done at some point because DEBUG generated far too much
         | log info w/millions of active users, but some Engineer wanted
         | to know that the app started. Gross.
         | 
         | I solved for this by doing a couple things. The first was to
         | define standards for log levels, ability to correlate log
         | statements with each other for a given request, and to define
         | the level of context a "proper" log level should provide.
         | 
         | For example, FATAL = there's no way anything can work properly.
         | These are pretty rare, but incorrect configuration values were
         | a common culprit. ERROR indicates something, possibly
         | transient, going wrong. Every now and then, that is not a big
         | deal and can wait until later, but a rapid accumulation could
         | mean something more serious is going on. INFO contained
         | information about the state of the system, such as general
         | measures of activity and other signals to indicate the system
         | is working as expected. Most of our metrics capture was
         | instrumented based off these statements.
         | 
         | In terms of the messages, we rapidly evolved the quality of the
         | messages. For something like the aforementioned configuration
         | error, the system initially just spat out an "Unexpected error"
         | and a module name. The first improvement then stated something
         | like "invalid configuration value" and finally we ended up on a
         | message that stated the value was incorrect, identified which
         | configuration value was wrong, and had a code that referenced
         | documentation and escalation owner.
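         | 
         | A rough sketch of what that final message shape might look like
         | with Python's standard logging module. The setting name, the
         | CFG-0042 code, and the owner are hypothetical; Python's
         | CRITICAL level stands in for FATAL here.

```python
import logging

logging.basicConfig(format="%(levelname)s %(message)s")
log = logging.getLogger("app.config")


def check_port(config: dict) -> bool:
    """Validate one setting and log an actionable message if it is bad."""
    value = config.get("listen_port")
    if isinstance(value, int) and 1 <= value <= 65535:
        return True
    # Say what is wrong, which key, what was expected, and who owns it.
    log.critical(
        "invalid configuration value: key=%s value=%r expected=%s "
        "code=%s owner=%s",
        "listen_port", value, "integer 1-65535",
        "CFG-0042",          # hypothetical code that indexes the docs
        "platform-oncall",   # hypothetical escalation owner
    )
    return False


check_port({"listen_port": "eighty"})
```

         | Compare that with a bare "Unexpected error" plus a module name:
         | the reader no longer has to go on a scavenger hunt to find the
         | playbook.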
         | 
         | When all was said and done, we'd reduced our downtime from
         | hours per year to less than 5 minutes, eliminated over 95% of
         | our pages, and reduced escalations to Engineering from several
         | days per week to a level where it was hard to remember the last
         | one.
         | 
         | As the head of Engineering, I had to fight an uphill battle
         | against the product & sales team for almost a year to make all
         | of this happen, but I was fully vindicated when we were
         | acquired and our operational maturity was lauded during the due
         | diligence process.
        
           | peteradio wrote:
           | You know all that work was worth it when you get a good
           | lauding.
        
       | yamtaddle wrote:
       | > When presented with this tradeoff, the path of least resistance
       | is to say "Let's just keep the threshold lower. We'd rather get
       | woken up when there's nothing broken than sleep through a real
       | problem." And I can sympathize with that attitude. Undetected
       | outages are embarrassing and harmful to your reputation. Surely
       | it's preferable to deal with a few late-night fire drills.
       | 
       | > It's a trap.
       | 
       | > In the long run, false positives can -- and will often -- hurt
       | you more than false negatives. Let's learn about the base rate
       | fallacy.
       | 
       | Not sure about anyone else, but speaking of alarms, this style of
       | writing trips my "self-promoting snake-oil Internet bullshitter"
       | alarm. It's like nails on a damn chalkboard, and if you're
       | writing like this, you've already lost me; however, maybe I ought
       | not be pointing that out, since signals are nice to have.
       | 
       | Incidentally, I wasn't sure which way the author was gonna go
       | with the core analogy. My smoke alarms have false-alarmed
       | probably 10x as much as my car alarm, even counting times one of
       | us has hit the alarm button on the fob by accident. I've
       | certainly never been so annoyed by my car alarm that I've ripped
       | it out and stuck it in a freezer, as I have with a smoke alarm.
       | 
       | (If I were writing like the author I suppose that last part would
       | have read:
       | 
       | "I've certainly never been so annoyed by my car alarm that I've
       | ripped it out and stuck it in a chest freezer.
       | 
       | I have, with a smoke alarm."
       | 
       | Except also I'd have found a way to use "we" and "you" a bunch.)
        
         | raldi wrote:
         | What do you mean by "this style of writing"? What aspects of
         | the quote do you object to?
        
           | jacquesm wrote:
           | At a guess the bit 'let's learn about the base rate fallacy'.
        
             | yamtaddle wrote:
             | Short, choppy sentences, lots of second-person, dropping a
             | "punch-line" sentence to its own paragraph like they're a
             | fucking magician revealing the card you pulled earlier.
             | It's some kind of cross between transparent rapport-
             | building sales-psychology crap and setting off a fireworks
             | display to celebrate your successfully assembling a PB&J.
             | 
             | Like listening to a used car salesman tell a mundane story
             | about their morning commute.
             | 
             | But full of unearned and over-the-top dramatic pauses.
        
         | burnished wrote:
         | I'm not sure what you are responding to in the quoted text but
         | after reading the article I think I can assure you that the
         | author isn't selling you anything more salacious than you would
         | find in a more interesting introduction to probability and
         | statistics lecture.
        
         | quickthrower2 wrote:
         | I see a lot of this style of writing in articles submitted on
         | HN. I think they are just trying to make the writing more
         | lively, not trying to BS.
         | 
         | A trope of this style is "{interesting half story} but more on
         | that later".
         | 
         | I don't think it is a big deal and I don't see much self
         | promotion here other than vanilla blogging, i.e. sounds like
         | this person is knowledgeable let's check their bio.
        
       | gmuslera wrote:
       | Some complementary reading could be My Philosophy on Alerting (
       | https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa... )
       | and https://how.complexsystems.fail/
       | 
       | In any case, not all signals are the same. Most systems have a
       | lot of components interacting, and what turns out to be dangerous
       | is usually a combination of factors. In the end, though, what
       | determines whether something was a problem is whether the system
       | is doing what it should. You can set thresholds by guesswork, but
       | you must check them against whether the system actually works.
       | 
       | Alerts should be actionable too, unlike slow-day notifications or
       | metrics that give context to perceived problems and could take
       | the guesswork out of the thresholds.
        
       | mertd wrote:
       | The post is somewhat incomplete without also discussing the cost
       | of the wrong decision.
       | 
       | You obey the smoke alarm because the cost of ignoring the alarm
       | when it is a true positive is potentially infinite (you die). You
       | ignore the car alarm because (1) most likely it is a false
       | positive but also (2) most likely it is somebody else's car.
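       | 
       | A back-of-the-envelope sketch of that cost argument: respond
       | when the expected loss from ignoring the alarm exceeds the cost
       | of checking. All numbers are made up and in arbitrary units.

```python
def should_respond(p_real, cost_of_ignoring_real, cost_of_responding):
    """Respond when expected loss from ignoring exceeds the cost of acting."""
    return p_real * cost_of_ignoring_real > cost_of_responding


# Same assumed 1% chance that the alarm is real in both cases; only the
# cost of ignoring a true positive differs (made-up numbers).
smoke = should_respond(p_real=0.01, cost_of_ignoring_real=1_000_000,
                       cost_of_responding=5)
car = should_respond(p_real=0.01, cost_of_ignoring_real=100,
                     cost_of_responding=5)
print(smoke, car)  # True False
```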
        
       | [deleted]
        
       | dfox wrote:
       | Smoke(/fire in general) alarms are not a good example of a thing
       | with high specificity. You perceive it that way, but what you see
       | is the result of somebody getting paged about it and then
       | checking (preferably physically, but also through e.g. CCTV)
       | whether there really is an emergency situation and canceling the
       | alarm before its escalation timeout. Apparently, for a typical
       | commercial building, false fire alarms are more or less a weekly
       | occurrence.
       | 
       | Edit: in large scale fire alarm systems there also are rules
       | about combinations of triggered sensors that cause immediate
       | escalation (if there is smoke and elevated temperature in two
       | adjacent zones, it probably is not a false alarm, and so on;
       | often it even takes into account the failure modes of the
       | physical alarm loop wiring). This is an interesting idea for IT
       | monitoring: page someone only when multiple metrics indicate an
       | issue.
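       | 
       | A minimal sketch of what that could look like for IT monitoring,
       | assuming each metric has already been reduced to a firing/not
       | firing flag. The signal names and the two-signal minimum are
       | invented for illustration.

```python
def should_page(signals, min_agreeing=2):
    """Page only if at least min_agreeing independent signals are firing.

    signals maps a signal name to True (firing) or False (quiet).
    Returns (page?, sorted list of firing signal names).
    """
    firing = sorted(name for name, is_firing in signals.items() if is_firing)
    return len(firing) >= min_agreeing, firing


lone = should_page({
    "error_rate_high": True,
    "latency_p99_high": False,
    "queue_depth_high": False,
})
corroborated = should_page({
    "error_rate_high": True,
    "latency_p99_high": True,
    "queue_depth_high": False,
})
print(lone)          # (False, ['error_rate_high'])
print(corroborated)  # (True, ['error_rate_high', 'latency_p99_high'])
```

       | A single firing signal could still go to a ticket or a channel;
       | it just doesn't wake anyone up on its own.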
        
         | tobyjsullivan wrote:
         | It was an interesting example and maybe deserved a few more
         | caveats to actually serve the point. After all, we've all heard
         | a fire alarm of some sort in the past year (if not the past
         | month) but how many were actual fires? (Technically the author
         | said smoke which helps but not really.)
         | 
         | Where I was expecting the author to go:
         | 
         | - Clearly was talking about residential smoke detectors, not
         | commercial. That could have been explicit.
         | 
         | - Smoke detectors do have a high false-positive rate but almost
         | always at the _right time_. A home smoke alarm going off while
         | I'm cooking is quite different to a smoke alarm going off when
         | I'm sleeping. To the author's point, there are very few false
         | positives while I'm sleeping so when they happen, I'm getting
         | up.
         | 
         | Speaking of the commercial context, I wonder what sort of
         | businesses would get a lot of false alarms and how that varies
         | across industries.
        
       | cbarrick wrote:
       | I think this article is missing the forest for the trees.
       | 
       | The article is about finding the appropriate sensitivity of
       | alerts on some signal in order to maximize the predictive value.
       | 
       | But you should care more about the quality of the signals you are
       | monitoring than about the sensitivity of your thresholds.
       | 
       | The article mentions load-average as an example signal, but to
       | me, that's a poor signal to monitor. Instead, if your SLO is
       | defined for error rate, alert on error rate.
       | 
       | Alerts on your SLO will have a high predictive value for
       | predicting violations of your SLO, by definition. The tunable
       | parameter here is the time window, not the threshold. E.g. if
       | your error budget is defined for a 30d window, you may want
       | alerts at the SLO threshold for 24h and 1h windows.
       | 
       | Alert on symptoms, not causes.
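       | 
       | A simplified sketch of alerting at the SLO threshold over
       | several windows. The 1% error-rate SLO, the one-request-per-
       | minute traffic, and the function names are assumptions for
       | illustration; real setups usually compute this in the metrics
       | system rather than over raw events.

```python
def error_rate(events, window_s, now_s):
    """Fraction of failed requests among events in the last window_s seconds.

    events is a list of (timestamp_s, ok) tuples.
    """
    recent = [ok for ts, ok in events if now_s - window_s < ts <= now_s]
    if not recent:
        return 0.0
    return recent.count(False) / len(recent)


def slo_alerts(events, now_s, slo_error_rate, windows_s=(3600, 86400)):
    """Return the windows (in seconds) whose error rate exceeds the SLO."""
    return [w for w in windows_s
            if error_rate(events, w, now_s) > slo_error_rate]


# Synthetic traffic: one request per minute for 24h, with the last six
# requests failing.
now_s = 86400
events = [(ts, ts <= now_s - 360) for ts in range(60, now_s + 1, 60)]
firing = slo_alerts(events, now_s, slo_error_rate=0.01)
print(firing)  # [3600]
```

       | Here the short window fires quickly on a fresh burst of errors,
       | while the 24h window stays quiet until the problem has eaten a
       | meaningful share of the budget.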
        
         | jacquesm wrote:
         | > But you should care more about the quality of the signals you
         | are monitoring than about the sensitivity of your thresholds.
         | 
         | This is so true. Case in point: Growatt inverters have - like
         | every other inverter - a maximum voltage on the grid connection
         | at which they will shut down. They're pretty trigger happy
         | about this and fail to take into account the resistance of the
         | feed wire of the inverter to the (much lower impedance) grid
         | hookup. As a result even on cabling sized properly for the
         | interconnect they tend to falsely trigger well before the point
         | where they should. The only way to avoid this problem is to
         | either hack into the inverter somehow (which I've so far failed
         | to do) or to use oversized cables (which isn't always an
         | option).
         | 
         | The sensitivity is fantastic, the quality of the signal is
         | hopeless. Obviously they err on the side of caution but the
         | margin is so ridiculously large that you end up losing a lot of
         | usable power for no reason at all. At the very least it should
         | allow a resistance for the interconnect to be specified so
         | that it can take into account the voltage drop across that
         | wire, which at 10A is appreciable for even short runs of fairly
         | beefy cable.
        
       ___________________________________________________________________
       (page generated 2023-04-11 23:00 UTC)