[HN Gopher] What Went Wrong?
       What Went Wrong?
       Author : headalgorithm
       Score  : 154 points
       Date   : 2021-07-17 11:34 UTC (11 hours ago)
 (HTM) web link (queue.acm.org)
 (TXT) w3m dump (queue.acm.org)
       | nixpulvis wrote:
       | I would gladly work for a prolific IRB.
       | MrStonedOne wrote:
        | In Washington State, the state superior court ruled the police
        | department was not liable for the impound fee paid by somebody
        | who had their car impounded for 90* days for driving on what the
        | computer reported was a suspended license, because the department
        | is exempt from liability for mistakes made by trusting its own
        | computer system. This was the second time the department had
        | wrongfully impounded his car, and it made no attempt to fix the
        | mistake from the first time; this didn't affect the ruling.
        | It's gonna get much worse before it gets any better.
         | spaetzleesser wrote:
         | It will get much worse I think. More and more companies are
         | hiding behind algorithms and other computer systems while
         | cutting support staff. If you are wronged you have nobody to
          | talk to, and they make no effort to correct the situation. The
          | only recourse is a lawsuit, which is way too expensive for most
         | people. And even when they are caught the fines are usually
         | only nominal.
         | I think we are building up the ultimate faceless bureaucracies.
       | verytrivial wrote:
        | I agree with nearly everything in this article, but the following
        | question stumped me: when exactly would a software disaster
        | investigation board be employed?
        | Plane goes down, train goes off the rails or passes a signal at
        | danger, easy. But at what exact point did the UK postmaster system
        | "fail" enough for an investigation?
         | andersource wrote:
          | I would say at the latest when people convicted because of it
          | had their names cleared -
         | https://www.bbc.com/news/business-56859357
       | ashton314 wrote:
       | > Personal information is the helium of IT systems--it leaks out
       | of every crack or imperfection faster than seems possible.
       | Might as well call it the _hydrogen_ of IT systems--get too much
       | of it concentrated in one place, and all it takes is one little
       | spark for it all to go up in flames. Boom!
       | dgb23 wrote:
       | Large amounts of money spent on government systems that never
       | ship is a tragedy, but software projects like these tend to have
       | a lot of open questions.
        | We often understand software development as a discovery process
        | (evolving requirements), especially for large or disruptive
        | projects. So one critical output of any such project has to be
        | knowledge that can be built upon, in the form of open, clearly
        | specified and written papers. This should be done regardless of
        | whether the project failed.
       | openthc wrote:
        | In Washington State we have a system to track cannabis, and
        | enforcement officers are supposed to be able to get reports from
        | this system. The system is super buggy and also doesn't have
        | meaningful reports, so there is a secondary system that lets
        | officers export to Excel documents. In one of the trainings they
        | were instructed to look for anomalies -- not real analysis, not
        | even a pivot table. One thing they find is "negative quantities"
       | -- but how can that be? (hint: it's bugs in the tracking
       | software). Then enforcement shows up at the cannabis business to
       | audit these negative numbers (or demand the business try to
       | correct the data (which they cannot due to bugs)).
       | So, crappy software gets law enforcement officers to basically
       | review data "anomalies" created by bugs by visiting a business.
        | It's the second most expensive method of data sanitization I can
       | imagine. It's a poor use of their time and disruptive to the
       | business.
       | The system in WA is so buggy that the agency has opted to freeze
       | the software rather than try to fix the issues. The future of
       | government software is bleak -- so long as they keep using closed
       | source packages from low-cost bidders.
         | laurent92 wrote:
         | Why isn't all software created for the government required to
         | be open-source? Would that really drive the costs up, if the
         | providers don't have the choice?
           | openthc wrote:
           | The vendor claimed that if the code was out it would be a
           | security risk. The agency claims the vendor needs to protect
           | their intellectual property rights. We have (some) visibility
            | into other things our taxes pay for -- the software should
            | absolutely be one of them, especially the regulatory compliance
            | systems that drive enforcement action.
            | Edit: also, they were breached anyway shortly after launch
            | (2018), and then an email went around offering to sell the
            | code and data from their entire system.
       | foobiekr wrote:
       | Part of my job is to help the executives that I report to
       | understand why things went wrong from the security perspective in
       | our business unit. These are purely internal discussions, not
        | even investigations. There are no penalties, really, even for
        | things as egregious as hard-coded passwords. As will become clear
       | in a moment, the fact that my executives care is quite unusual.
       | Culturally the result is coverups and lies.
       | Engineers lie, managers lie, test people lie, directors lie,
        | senior directors lie, vice presidents lie, external interesting
        | teams are negotiated into minimizing certain critical failures,
        | and so on. Managers don't want to hear it so that they can't be
        | accused of lying, vice presidents don't wanna know, SVPs just
       | want green squares on the cross-BU PowerPoint.
        | This is internal discussion of revenue-impacting incidents. Do
        | you know what executives do care about? Revenue. Lost deals. If
        | the people who care about money, including the account teams,
        | don't care about security and severe quality issues enough to be
        | honest about them so things can improve, how could an external board
       | accomplish anything for those very few incidents that actually
       | become publicly visible?
        | This isn't like the NTSB; I spent my life reading NTSB accident
        | reports. The NTSB has actual authority, and there are potential
        | consequences for someone beyond merely being caught
        | distorting things.
         | slyall wrote:
         | I think you are overestimating the importance of "revenue
         | impacting incidents" to company employees.
          | If the company makes a couple of million extra or less this
          | year, it doesn't affect the majority of workers. Their bonus
         | isn't going up or down etc. And remember this incident has
         | already happened.
         | By contrast if a report comes out blaming the loss on a worker,
         | department or division then that could have major consequences.
         | No matter how "blameless" it is, come next round of bonuses,
         | promotions or layoffs everybody knows it'll be factored into
         | the decisions.
         | So people don't have an incentive to make themselves look bad
          | and unlike with the NTSB, there are no legal powers or fear of
          | causing deaths behind the investigation.
           | foobiekr wrote:
           | I didn't say "employees" so much as "executives"; and the
           | executives I'm referring to go beyond owning P&L. They
           | actually do care about revenue, which is why everyone lies to
           | them.
           | laurent92 wrote:
            | I understand, but when we do that, it sounds like we are
            | digging ourselves into the same hole as USSR workers who were
            | not incentivized to deliver working products. It's a
           | civilizational peril. How do we solve cooperation at large
           | scale? Is the only way to watch large companies accumulate
           | bored employees and constantly recreate "the small guy", the
           | startup, which will finally make things right, until they
           | become too big to be incentivized?
         | izacus wrote:
         | I also wonder if "blameless postmortem" culture perhaps
          | actively works against preventing these kinds of incidents. It
          | doesn't seem that anyone in IT is ever responsible for damage
          | they cause.
          | But yes, lying, "not seeing" and covering up documentation is
         | pretty much standard corporate behaviour I've seen around
         | plenty of companies as well.
           | nanis wrote:
            | In my negative experiences, "blameless" turned into "nobody
           | did anything wrong" which, of course, undermines the whole
           | point of finding out what actually happened so we can see if
           | there is a thing we can do to reduce the likelihood of it
           | happening again.
           | Sometimes, the root cause is indeed someone with the
           | privilege but not the good sense ignoring warning signs. If
           | we can't identify that problem, then we can't improve our
           | odds for the next time.
           | foobiekr wrote:
           | I no longer believe in blameless post mortem as a general
           | rule. I have, through experience, come to believe that the
           | contexts where blameless post mortems work are the contexts
           | where literally anything works because they are organizations
           | that have high hiring bars and high expectations. My current
           | employer is not one of them; we are a mountain of mediocrity
           | and all blameless post mortems do is act as an excuse to
           | avoid raising the bar.
             | _jal wrote:
             | > and all blameless post mortems do is act as an excuse to
             | avoid raising the bar
             | "Well, there's your problem, right there."
             | The entire point of doing blameless post-mortems is to
             | correctly identify problems for resolution. If management
             | doesn't drive changes in response (process, training,
             | communication, whatever), you have a different problem to
             | solve before they'll do any good.
             | jolux wrote:
             | The principle of blameless postmortems is not supposed to
             | absolve anyone of the responsibility to change anything,
             | it's supposed to foreground that serious failures are
             | organizational failures first and foremost, because it's
             | the organization that has an obligation not to fail, not
             | individuals, who fail all the time as a rule.
             | torgard wrote:
             | A post-mortem should not necessarily blame the individual,
             | but blame the circumstances the individual finds themselves
             | in.
             | Yes, a hard-coded password is bad practice. But does the
             | company have a bad culture of keeping configs in repos?
              | Maybe management thinks it's easier to commit configs with
              | sensitive data than to set up proper deployment shit. And
              | after all, the repos are private, so it should be fine,
              | yeah?
             | Bad code ending up in production is something you'll see
             | often. Does the company have nice test suites for
             | everything? Continuous integration pipelines? E2E tests? Or
             | is upper management pushing everyone to their limits,
             | because "fuck it ship it"?
       | Scoundreller wrote:
       | > In 2017 the motor of an airplane exploded over the southern
       | part of the Greenland icecap. Part of the engine landed on the
       | ice while the plane continued to the first suitable airport way
       | up north in Canada.
        | Eh, Happy Valley-Goose Bay isn't that far north as far as Canada
        | goes: 53 degrees north.
        | The parts actually fell in Greenland at around 61 degrees N.
        | Nuuk would have been ~60% closer, but there's not a chance it
        | could handle an A380.
       | ithkuil wrote:
       | Well written article
       | ldarby wrote:
        | It's known what went wrong; Computerphile has a video with some
        | details: https://www.youtube.com/watch?v=hBJm9ZYqL10 but it
        | doesn't address any of the judicial and cultural failures, and
        | that's what needs to be fixed. Software bugs are a fact of life,
        | and people know this, except, apparently, the judges in this case.
         | HarryHirsch wrote:
          | Bugs are a fact of life because of sloppy practices. The
          | experience from SQLite is instructive: after a test suite had
          | been written, matters improved immensely.
          | Why was the test suite written? Because it was in the list of
          | requirements from the client: aerospace standards demand that
          | every possible branch is covered by a test.
         | We choose to write bad software.
         | II2II wrote:
          | One could argue that faults in engineering and construction
          | are also a fact of life, yet that doesn't mean we excuse them,
          | and it doesn't mean we assume that a failure is due to those
          | faults. Investigations are performed in order to ascertain the
          | truth.
          | I think the author's comparison to the historical development of
         | trains is appropriate. Investigating IT failures wasn't as
         | important 50 years ago because IT infrastructure was not as
         | critical. Investigating IT failures today is critical because
         | the functioning of society depends upon it.
       | ChrisMarshallNY wrote:
       | I really enjoyed this.
       | Like most things, it's a matter of scale. If a train derails, we
       | call in the NTSB, but they don't investigate car crashes.
        | The issue that I see is that the software industry seems to be
       | absolutely _obsessed_ with scale. Small applications are actively
       | sneered at. Go big, or go home.
       | So that means that _every_ accident is a train wreck.
       | hamilyon2 wrote:
        | The industry fails to listen to lessons written in "The Mythical
        | Man-Month" almost 50 years ago, nearly half a century. Of course
        | a few reports on why systems are being designed and coded poorly
        | won't change anything. We know why; we just ignored the knowledge
        | to the point of absurdity.
         | torgard wrote:
         | Companies could be held liable for gross misconduct. Although
         | GDPR is not exactly a shining example of IT regulation, I think
         | it's a good example of liability.
         | Companies get fined for breaking GDPR.
         | Governmental projects should have similar requirements in
         | place, and companies and people should be held accountable for
         | breaking them.
       | Scoundreller wrote:
       | Would also like to point out the fantastic videos created by the
       | US Chemical Safety Board: https://www.youtube.com/user/USCSB
       (page generated 2021-07-17 23:00 UTC)