[HN Gopher] What Went Wrong? ___________________________________________________________________ What Went Wrong? Author : headalgorithm Score : 154 points Date : 2021-07-17 11:34 UTC (11 hours ago) (HTM) web link (queue.acm.org) (TXT) w3m dump (queue.acm.org) | nixpulvis wrote: | I would gladly work for a prolific IRB. | MrStonedOne wrote: | In Washington State, the state superior court ruled the police | department was not liable for the impound fee paid by somebody | who had their car impounded for 90* days for driving on what the | computer reported was a suspended license, because they are | exempt from liability for mistakes that come from trusting their own computer system. | | This was the second time the department had wrongfully impounded | his car, and they had made no attempt to fix the mistake from the | first time; this didn't affect the ruling. | | It's gonna get much worse before it gets any better. | spaetzleesser wrote: | It will get much worse, I think. More and more companies are | hiding behind algorithms and other computer systems while | cutting support staff. If you are wronged, you have nobody to | talk to and they make no effort to correct the situation. The | only recourse is a lawsuit, which is way too expensive for most | people. And even when they are caught, the fines are usually | only nominal. | | I think we are building up the ultimate faceless bureaucracies. | [deleted] | verytrivial wrote: | I agree with nearly everything in this article, but the following | question stumped me: when exactly would a software disaster | investigation board be employed? | | Plane goes down, train goes off the rails or passes a signal at danger, | easy. But at what exact point did the UK postmaster system | "fail" enough for an investigation?
| andersource wrote: | I would say at the latest when people convicted because of it had | their names cleared - | https://www.bbc.com/news/business-56859357 | ashton314 wrote: | > Personal information is the helium of IT systems--it leaks out | of every crack or imperfection faster than seems possible. | | Might as well call it the _hydrogen_ of IT systems--get too much | of it concentrated in one place, and all it takes is one little | spark for it all to go up in flames. Boom! | dgb23 wrote: | Spending large amounts of money on government systems that never | ship is a tragedy, but software projects like these tend to have | a lot of open questions. | | We often understand software development as a discovery process | (evolving requirements), especially when projects are large or | disruptive. So one critical output of any such project has to be | knowledge that can be built upon, in the form of open, clearly specified | and written papers. This should be done regardless of whether the project | failed or not. | openthc wrote: | In Washington State we have a system to track cannabis; the | enforcement officers are supposed to be able to get reports from | this system. The system is super buggy and also doesn't have | meaningful reports. So there is a secondary system that lets officers | export the data to Excel documents. In one of the trainings they've | been instructed to look for anomalies -- not real analysis, not | even a pivot table. One thing they find is "negative quantities" | -- but how can that be? (Hint: it's bugs in the tracking | software.) Then enforcement shows up at the cannabis business to | audit these negative numbers (or to demand that the business correct | the data, which it cannot do because of the bugs). | | So, crappy software gets law enforcement officers to basically | review data "anomalies" created by bugs by visiting a business. | It's the second most expensive method of data sanitization I can | imagine. It's a poor use of their time and disruptive to the | business.
| | The system in WA is so buggy that the agency has opted to freeze | the software rather than try to fix the issues. The future of | government software is bleak -- so long as they keep using closed-source | packages from low-cost bidders. | laurent92 wrote: | Why isn't all software created for the government required to | be open-source? Would that really drive the costs up, if the | providers don't have the choice? | openthc wrote: | The vendor claimed that if the code was out it would be a | security risk. The agency claims the vendor needs to protect | its intellectual property rights. We have (some) visibility | into other things our taxes pay for -- the software should | absolutely be one of them -- especially the regulatory compliance | systems that drive enforcement action. | | Edit: also, they were breached anyway shortly after launch | (2018), and then an email went around offering to sell the | code and data from their entire system. | foobiekr wrote: | Part of my job is to help the executives that I report to | understand why things went wrong from the security perspective in | our business unit. These are purely internal discussions, not | even investigations. There are no penalties, really, even for | things as egregious as hard-coded passwords. As will become clear | in a moment, the fact that my executives care is quite unusual. | | Culturally, the result is cover-ups and lies. | | Engineers lie, managers lie, test people lie, directors lie, | senior directors lie, vice presidents lie, external interested | teams are negotiated into minimizing certain critical failures, | and so on. Managers don't want to hear it so that they can't be | accused of lying, vice presidents don't wanna know, SVPs just | want green squares on the cross-BU PowerPoint. | | This is internal discussion of revenue-impacting incidents. Do | you know what executives do care about? Revenue. Lost deals.
If | the people who care about money, including the account teams, | don't care enough about security and severe quality issues to be | honest enough to drive improvement, how could an external board | accomplish anything for the very few incidents that actually | become publicly visible? | | This isn't like the NTSB; I've spent my life reading NTSB accident | reports. They have actual authority, and the consequences for an | individual can go well beyond being caught | distorting things. | slyall wrote: | I think you are overestimating the importance of "revenue | impacting incidents" to company employees. | | If the company makes a couple of million extra or less this | year, it doesn't affect the majority of workers. Their bonus | isn't going up or down, etc. And remember, this incident has | already happened. | | By contrast, if a report comes out blaming the loss on a worker, | department or division, then that could have major consequences. | No matter how "blameless" it is, come the next round of bonuses, | promotions or layoffs, everybody knows it'll be factored into | the decisions. | | So people don't have an incentive to make themselves look bad, | and unlike with the NTSB, there are no legal powers or fear of | causing deaths behind the investigation. | foobiekr wrote: | I didn't say "employees" so much as "executives"; and the | executives I'm referring to go beyond owning P&L. They | actually do care about revenue, which is why everyone lies to | them. | laurent92 wrote: | I understand, but when we do that, it sounds like we are digging ourselves | into the same hole as USSR workers who were not incentivized | to deliver working products. It's a | civilizational peril. How do we solve cooperation at large | scale? Is the only way forward to watch large companies accumulate | bored employees and constantly recreate "the small guy", the | startup, which will finally make things right, until it | becomes too big to be incentivized?
| izacus wrote: | I also wonder whether "blameless postmortem" culture | actively works against preventing these kinds of incidents. It | doesn't seem that anyone in IT is ever held responsible for the damage | they cause. | | But yes, lying, "not seeing" and covering up documentation is | pretty much standard corporate behaviour I've seen around | plenty of companies as well. | nanis wrote: | In my negative experiences, "blameless" turned into "nobody | did anything wrong", which, of course, undermines the whole | point of finding out what actually happened so we can see if | there is anything we can do to reduce the likelihood of it | happening again. | | Sometimes, the root cause is indeed someone with the | privilege but not the good sense ignoring warning signs. If | we can't identify that problem, then we can't improve our | odds for the next time. | foobiekr wrote: | I no longer believe in blameless post mortems as a general | rule. I have, through experience, come to believe that the | contexts where blameless post mortems work are the contexts | where literally anything works, because they are organizations | that have high hiring bars and high expectations. My current | employer is not one of them; we are a mountain of mediocrity | and all blameless post mortems do is act as an excuse to | avoid raising the bar. | _jal wrote: | > and all blameless post mortems do is act as an excuse to | avoid raising the bar | | "Well, there's your problem, right there." | | The entire point of doing blameless post-mortems is to | correctly identify problems for resolution. If management | doesn't drive changes in response (process, training, | communication, whatever), you have a different problem to | solve before they'll do any good.
| jolux wrote: | The principle of blameless postmortems is not supposed to | absolve anyone of the responsibility to change anything; | it's supposed to foreground that serious failures are | organizational failures first and foremost, because it's | the organization that has an obligation not to fail, not | individuals, who fail all the time as a rule. | torgard wrote: | A post-mortem should not necessarily blame the individual, | but rather the circumstances the individual finds themselves | in. | | Yes, a hard-coded password is bad practice. But does the | company have a bad culture of keeping configs in repos? | Maybe management thinks it's easier to commit configs with | sensitive data than to set up proper deployment shit. And | after all, the repos are private, so it should be fine, | yeah? | | Bad code ending up in production is something you'll see | often. Does the company have nice test suites for | everything? Continuous integration pipelines? E2E tests? Or | is upper management pushing everyone to their limits, | because "fuck it, ship it"? | Scoundreller wrote: | > In 2017 the motor of an airplane exploded over the southern | part of the Greenland icecap. Part of the engine landed on the | ice while the plane continued to the first suitable airport way | up north in Canada. | | Eh, Happy Valley-Goose Bay isn't that far north as far as Canada | goes. 53 degrees north. | | The actual droppings in Greenland were around 61 degrees N. | | Nuuk would have been ~60% closer, but not a chance it could | handle an A380. | ithkuil wrote: | Well-written article | ldarby wrote: | It's known what went wrong; Computerphile has a video with some | details: https://www.youtube.com/watch?v=hBJm9ZYqL10 but it | doesn't address any of the judicial and cultural failures, and that's | what needs to be fixed. Software bugs are a fact of life; people | know this, except, apparently, the judges in this case. | HarryHirsch wrote: | Bugs are a fact of life because of sloppy practices.
The | experience of SQLite is instructive: after a test suite had | been written, matters improved immensely. | | Why was the test suite written? Because it was in the client's list of | requirements: aerospace standards demand that | every possible branch be covered by a test. | | We choose to write bad software. | II2II wrote: | One could argue that faults in engineering and construction | are also a fact of life, yet that doesn't mean we excuse them, | and it doesn't mean that we assume a failure is due to those | faults. Investigations are performed in order to ascertain the | truth. | | I think the author's comparison to the historical development of | trains is appropriate. Investigating IT failures wasn't as | important 50 years ago because IT infrastructure was not as | critical. Investigating IT failures today is critical because | the functioning of society depends upon it. | ChrisMarshallNY wrote: | I really enjoyed this. | | Like most things, it's a matter of scale. If a train derails, we | call in the NTSB, but they don't investigate car crashes. | | The issue that I see is that the software industry seems to be | absolutely _obsessed_ with scale. Small applications are actively | sneered at. Go big, or go home. | | So that means that _every_ accident is a train wreck. | hamilyon2 wrote: | The industry fails to heed the lessons of "The Mythical Man-Month", | written some 50 years ago. Half a century. Of course, a few | reports on why systems are designed and coded poorly won't | change anything. We know why; we have just ignored the knowledge to | the point of absurdity. | torgard wrote: | Companies could be held liable for gross misconduct. Although | GDPR is not exactly a shining example of IT regulation, I think | it's a good example of liability. | | Companies get fined for breaking GDPR. | | Governmental projects should have similar requirements in | place, and companies and people should be held accountable for | breaking them.
| Scoundreller wrote: | Would also like to point out the fantastic videos created by the | US Chemical Safety Board: https://www.youtube.com/user/USCSB ___________________________________________________________________ (page generated 2021-07-17 23:00 UTC)