[HN Gopher] Incident: Airbus A330 at Taipei, primary computers f...
       ___________________________________________________________________
        
       Incident: Airbus A330 at Taipei, primary computers failed on
       touchdown (2020)
        
       Author : akamaka
       Score  : 235 points
       Date   : 2021-09-05 14:12 UTC (8 hours ago)
        
 (HTM) web link (avherald.com)
 (TXT) w3m dump (avherald.com)
        
       | sinuhe69 wrote:
       | Reminds me of the Ariane 5 maiden flight: redundant hardware does
       | not protect against unsanitized inputs. I don't know much about
       | the design of the Airbuses, but I think it's sensible to specify
       | that the last primary computer runs in a different mode than the
       | first two, simply because a double hardware failure is very, very
       | unlikely. So it boils down to a software bug and/or
       | unsanitized inputs. Assuming that, the last computer should
       | provide some basic but safe functions.
        
         | strictfp wrote:
         | I've seen bad user input, one very unusual char, bring down a
         | whole self-healing geographically redundant cluster of machines
         | running an expensive commercial cluster software. Any connected
         | devices can bring each other down.
        
           | salawat wrote:
           | Story, story, story, story!
        
           | andrewnicolalde wrote:
           | That sounds like a great war story if you're open to telling
           | it :)
        
         | avianlyric wrote:
         | Well that's kind of what happened. The primary flight computers
         | all failed, so the plane fell back to "direct law", which means
         | pilot input is directly transmitted to control systems without
         | a computer attempting to interpret intent or protect against
         | pilot error.
         | 
         | As a consequence a lot of the more advanced functions became
         | disabled because they rely on the primary flight computer to
         | indicate when it's safe for them to operate. Presumably because
         | accidental operation of those features in flight is extremely
         | dangerous.
        
       | the_duke wrote:
       | Odd that there is no information from Airbus there.
       | 
       | FYI: the comment section contains some interesting background.
        
         | zymhan wrote:
         | Well there is a single bullet point from/about Airbus
         | 
         | > 7. Following the occurrence, Airbus reviewed its in-service
         | experience, and confirmed that no other triple PRIM fault at
         | touchdown event had been reported on A330/A340 aircraft family
         | since entry into service. The A330/A340 fleet fitted with
         | electrical rudder has accumulated 8.7 millions of Flight Cycles
         | and 44.3 millions of Flight Hours (in-service data from April
         | 2020).
        
           | gswdh wrote:
           | AKA, it's only happened once so we don't care...
        
           | thomasjudge wrote:
           | So, does this mean that the outcome was, Airbus reviewed the
           | situation and the data, and made no changes or corrections to
           | the system?
        
             | thefurrysquid wrote:
             | There will be a software update.
             | 
             | Read page 9 of the exec summary linked in the article.
        
       | ceejayoz wrote:
       | > The crew applied maximum manual braking and managed to stop the
       | aircraft 10 meters/33 feet ahead of the runway end (runway length
       | 2600 meters/8530 feet).
       | 
       | Hope the crew had their brown pants on.
        
         | encoderer wrote:
         | From the cockpit, 33' must have looked right under their nose.
        
       | bryan0 wrote:
       | The 2020 should be removed from the title. The article is mostly
       | new information from the final report:
       | 
       | > On Sep 3rd 2021 Taiwan's ASC released their final report in
       | Chinese and their English Executive Summary concluding the
       | probable causes of the incident were: ...
        
       | 5faulker wrote:
       | Good thing that this was on a smaller airplane, otherwise it
       | could have been a larger disaster.
        
       | FatalLogic wrote:
       | The flight data recorder chart on page 114 of the Chinese-
       | language report[0] seems to record that the pilots hit the brakes
       | very hard in the last few seconds, maybe when they realized they
       | were close to overrunning the runway.
       | 
       | That caused a sudden peak of -0.47g deceleration, so it looks as
       | if they could have used the brakes to slow the plane earlier on
       | the runway, but maybe they didn't because they were expecting
       | they could sort out the problem with their spoilers and reverse
       | thrust and stop it in the normal way, without giving the
       | passengers a shock.
       | 
       | [0]https://www.ttsb.gov.tw/media/4912/ci-202-final-
       | report_chine...
       | 
       | edit: that chart's x-axis is the aircraft's position on the
       | 8500ft-runway, and most of the recording traces seem to begin at
       | the point when it had touched down finally with all wheels
        
         | ysangkok wrote:
         | > hit the brakes very hard in the last few seconds
         | 
         | In direct law, is it possible to brake too hard and have the
         | tires skid on the runway? Is it possible that the pilots were
         | afraid of this?
         | 
         | I ask because people are characterizing direct law as "less
         | smart". So how smart is it? Smart enough to include ABS?
        
           | FatalLogic wrote:
           | I don't know, but the computer had failed, the runway was
           | wet, and they were confused, so, yes, it's not unlikely they
           | were also afraid of skidding
        
           | oakmad wrote:
           | Oh 100%! Anti skid is only available in alternate and normal
           | laws. Section 6.5 braking system of this document.
           | 
           | https://www.smartcockpit.com/docs/A330_Flight_Deck_and_Syste.
           | ..
        
           | mertd wrote:
           | How hard is it to make an airliner lose grip? There is a lot of
           | weight above the wheels and a lot of downforce from the
           | wings.
        
             | DC-3 wrote:
             | > a lot of downforce from the wings
             | 
             | Not unless the plane is upside down, surely...
        
               | raviolo wrote:
               | Don't call me Shirley!
        
               | thesh4d0w wrote:
               | If the spoilers are up, yeah.
        
           | noduerme wrote:
           | I was a passenger on a JetBlue flight in the winter of
           | 2000/1, where air traffic was being diverted from JFK due to
           | a snowstorm, but for whatever reason (fuel?) we had to land
           | there anyway. The runway was basically unplowed at that
           | point. As soon as we touched down there was a sensation we
           | were heading into a skid. I wouldn't really call it a braking
           | skid... the plane just started turning away from its angle
           | of motion. The pilot somehow managed to slow down before we
           | went off the side of the runway. Ended up stuck in deep snow,
           | partially in a ditch. After what seemed to be some attempts
           | at getting us out of the snow, we were evacuated off the
           | slide. A couple people movers were brought out which we got
           | into, and which themselves became stuck in the snow on the
           | runway for an hour ... fun times.
        
         | oakmad wrote:
         | I wonder how much was just being _used_ to auto braking
         | assistance and what felt right.
        
       | mplanchard wrote:
       | This could probably use a 2020 in the title
        
         | [deleted]
        
         | akamaka wrote:
         | Final report was just issued this week.
        
         | alkonaut wrote:
         | Maybe explains why there were only 87 pax in a 330 too.
        
       | fnord77 wrote:
       | > The root cause was determined to be an undue triggering of the
       | rudder order COM/MON monitoring concomitantly in the 3 FCPC. At
       | the time of the aircraft lateral control flight law switching to
       | lateral ground law at touch down, the combination of a high
       | COM/MON channels asynchronism and the pilot pedal inputs resulted
       | in the rudder order difference between the two channels to exceed
       | the monitoring threshold. The FCPC1 failed first.
       | 
       | In lay-programmers terms, this sounds like a race condition. Is
       | that correct?
        
       | rzzzt wrote:
       | Why can't thrust reversers and spoilers be engaged without
       | computer supervision? The article just says that ground spoilers
       | require one of the three flight computers to remain operative,
       | autobrake needs two, reversers need one out of two specific
       | units to be running to unlock.
        
         | jeffbee wrote:
         | Accidental deployment of thrust reversers is a bad thing and
         | incidents of that kind have killed a lot of people. It seems
         | better to bias the failure modes toward non-deployment. The
         | reversers are not critical systems, and aircraft are permitted
         | to operate even if their reversers are not operational. They
         | mainly save wear and tear on the tires and brakes.
        
           | braza wrote:
           | This was one of the reasons why TAM 3054 crashed in Sao
           | Paulo.
        
           | HPsquared wrote:
           | In other words, it's more important to ensure they don't
           | deploy when they shouldn't, than that they do deploy when
           | requested.
        
         | avianlyric wrote:
         | Based on the article it seems this is a safety feature to
         | prevent the accidental activation of those functions in flight.
         | 
         | I imagine there's zero reasons for using ground spoilers or
         | reverse thrusters in the air, and doing so would probably cause
         | a complete loss of aircraft.
         | 
         | So to make sure that can't happen, the functions require a
         | positive "it's safe to activate now" signal from the flight
         | computers.
         | 
         | So the functions themselves are quite capable of operating
         | without the flight computers. They simply refuse to do so
         | unless a flight computer has given them the all clear.
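The interlock idea described above can be sketched in a few lines. This is only an illustration of the "refuse unless positively permitted" pattern, not Airbus's actual logic: `PermitSignal`, `may_deploy`, and the rule that a single live computer is enough are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PermitSignal:
    """Hypothetical 'safe to deploy' signal published by one flight computer."""
    computer_alive: bool  # False once the computer has shut itself down
    on_ground: bool       # the computer's own view of the air/ground state


def may_deploy(pilot_commanded: bool, permits: Sequence[PermitSignal]) -> bool:
    """Allow deployment only with a pilot command AND a positive permit.

    A dead computer cannot issue a permit, so losing every computer
    biases the failure toward non-deployment.
    """
    has_permit = any(p.computer_alive and p.on_ground for p in permits)
    return pilot_commanded and has_permit
```

With all three computers offline, `may_deploy(True, [PermitSignal(False, True)] * 3)` is `False`: the pilot's command is ignored because nothing is left to vouch that deployment is safe.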
        
           | polishdude20 wrote:
           | Actually the ground spoilers can be used in a spectrum during
           | flight especially when the plane is descending and needs to
           | slow down as well.
        
           | numpad0 wrote:
           | Japan Airlines Flight 350 accident in 1982 - at 164ft/50m
           | AGL, the captain suffering a schizophrenic attack disengaged
           | A/P, activated reversers, and pushed down his control column
           | hard forward. 24 deaths, 150 survivors.
           | 
           | ugh.
        
           | EdwardDiego wrote:
           | > I imagine there's zero reasons for using ground spoilers or
           | reverse thrusters in the air, and doing so would probably
           | cause a complete loss of aircraft.
           | 
           | It has, on several occasions, although I'm not aware of any
           | where it was a commanded deployment of reversers in flight.
           | 
           | https://en.wikipedia.org/wiki/Lauda_Air_Flight_004 https://en
           | .wikipedia.org/wiki/TAM_Transportes_Aereos_Regiona...
        
             | avianlyric wrote:
             | Like I said, zero reasons for using reverse thrusters in
             | the air. Complete loss of aircraft if you do.
        
         | bdonlan wrote:
         | All flight controls on the A330 are fly-by-wire - they're
         | connected via computers. The most critical surfaces have
         | secondary, dumb controllers that can take over if the primary
         | flight computers fail, but less critical systems may not be
         | connected to those backup systems - it's expected that a triple
         | fault of the primary flight computers should be very rare, and
         | manual braking is available as a backup after all.
        
           | NovemberWhiskey wrote:
           | Right. In this case we have a completely-unprecedented triple
           | failure of the primary flight control system causing loss of
           | auto-brake, spoilers and thrust reversers; a wet runway with
           | less than expected braking action; a tailwind landing _and_
           | still a happy outcome.
           | 
           | Honestly defense-in-depth seems to be working OK here.
        
             | gsnedders wrote:
             | And note that for all aircraft the landing distance is
             | calculated _without_ thrust reversers or spoilers, and thus
             | is based on braking performance alone.
             | 
             | Some of the challenge here was the seconds spent at near
             | landing speed without any braking being performed, as
             | that'll eat up runway distance fast, following the auto
             | brake failure prior to manual braking commencing.
        
               | azalemeth wrote:
               | It's also worth stating (for the OP) that thrust
               | reversers are potentially _really_ dangerous -- if they
               | deploy in flight without being commanded to, the aircraft
               | can -- or, more likely if the engine computer does not
               | detect it and shut itself down, _will_ -- become barely
               | controllable, with previous fatal outcomes. [1]
               | 
               | Two or more FCPCs failing is sufficient evidence that a
               | "faecal fan incident may be occurring" that the risk of
               | deploying them is just _not_ worth it, especially as
               | (pointed out by this parent above) they are effectively
               | optional equipment and the FAA _requires_ you to expect
               | them to be inop to be safe.
               | 
               | What they do save is fuel and time -- taxi time to the
               | terminal, permit the use of high-speed runway turnoffs,
               | etc.
               | 
               | [1] https://en.wikipedia.org/wiki/Lauda_Air_Flight_004
        
               | birdyrooster wrote:
               | (An aside) Bruh the ppl in Thailand looted the wreck, how
               | sad is that. What kind of shit person does such a thing?
        
             | redis_mlc wrote:
             | > Honestly defense-in-depth seems to be working OK here.
             | 
             | No. Once a computer failure has happened (p=1.0), you
             | analyze what the recovery procedures are from that
             | point. It sounds like Airbus is unsafe at that point.
             | 
             | This scenario illuminates why Boeing and Airbus have
             | different automation philosophies.
             | 
             | Source: commercially-rated pilot.
        
             | [deleted]
        
             | digikata wrote:
             | Yes, it worked because there was a direct controller to fall
             | back to. But two things are worrying. First, it wasn't a
             | triple failure of the redundant flight computers; it seems
             | to point to a single failure of the monitoring logic.
             | 
             | The other thing is that the report notes that the aircraft
             | came perilously close to a disaster, because it calls out
             | other specific factors that could easily have eaten up the
             | runway margin.
        
       | pseingatl wrote:
       | Airbus: what is it doing now?
        
       | aomobile wrote:
       | I wonder how the pilots would have reacted if the computers had
       | crashed 10-20 seconds earlier. Would they have landed or would
       | they go up again to wait for reboot?
        
         | bonzini wrote:
         | Difficult to say, since the computer crash was linked directly
         | to having just touched down. 10-20 seconds before there was no
         | disagreement because neither computer had detected that the
         | wheels were on the ground.
        
       | mysterydip wrote:
       | I don't entirely grok the vocabulary, but it sounds like the
       | computers each said "This input is outside of limits, something
       | must be wrong with me" so shut off for safety. But is there a
       | mechanism that says "we can't all be wrong, it must be the
       | sensor", to avoid situations like this?
        
         | zymhan wrote:
         | I'm not sure there was a sensor issue here. It appears that the
         | Flight Computer monitoring routine for the Rudder position
         | somehow caused all three computers to crash. This was somehow
         | exacerbated by the pilot's rudder inputs, which I don't fully
         | understand
         | 
         | > the combination of a high COM/MON channels asynchronism and
         | the pilot pedal inputs resulted in the rudder order difference
         | between the two channels to exceed the monitoring threshold
         | 
         | The flight computer failure resulted in the inability to use
         | two braking mechanisms, Thrust Reversers (make air from the
         | engine go forwardsish) and Spoilers (stop the wing from
         | producing lift, putting more weight on the wheels and making
         | the brakes more effective).
        
           | salawat wrote:
           | The thing that bugs me is "did the pilots engage those
           | systems manually?"
           | 
           | And
           | 
           | How was their reaction time doing so compared to someone who
           | would be used to doing so via muscle memory on the regular?
           | 
           | Automating things is nice and all, but there is something to
           | be said for keeping manual skills sharp.
        
             | avianlyric wrote:
             | Well in this case the entire system failed safe after a
             | pretty catastrophic failure of the automated systems.
             | 
             | So on the whole I would say this incident demonstrates that
             | the current safety standards, contingency plans, and pilot
             | training all work as needed. I don't think there's anything
             | here to suggest that the pilots' manual skills are rusty.
             | 
             | And when talking about the specific systems that didn't
             | activate: they didn't activate because they require a
             | positive indication from the flight computers that it's
             | safe to activate. Something that probably can't be
             | overridden by the pilots. Which is why the planning process
             | for flights requires pilots to assume they won't work, and
             | ensure the runway is long enough for the worst possible
             | scenario.
        
               | scoopertrooper wrote:
               | > So on the whole I would say this incident demonstrates
               | that the current safety standards, contingency plans, and
               | pilot training all work as needed. I don't think there's
               | anything here to suggest that the pilots' manual skills
               | are rusty
               | 
               | According to the article they had 30 feet of runway
               | remaining when they brought the plane to a halt. So, yeah
               | I guess, everything worked out, but I wouldn't say that
               | was indicative of a well oiled machine.
        
         | ak217 wrote:
         | It sounds like it was more complex than that. The computers
         | shut down because of a COM-MON order mismatch. In other words,
         | there is a watchdog in the system that monitors the three
         | computers and their orders (control surface movements, etc. -
         | in this case rudder inputs). There are 3 computers and one of
         | them is designated COM (command) while another is designated
         | MON (monitoring). If the values of the inputs that the
         | computers want to send to the actuators diverge too much for
         | too long, the command computer is disconnected and another
         | computer is designated command. In this case it sounds like the
         | command and monitoring computers had a significant delay
         | between them when they transitioned from flight law (settings
         | used in flight) to ground law (settings used while on the
         | ground) while the pilot was also applying rudder (which is very
         | common during landing, to correct for crosswind). When on the
         | ground, the rudder is set to not deflect as much from the
         | pilot's inputs compared to when in the air. It sounds like
         | normally the computers switch to ground law close enough in
         | time that the resulting mismatch is ignored, but in this case
         | the significant delay caused a "split brain" situation and the
         | watchdog disconnected all 3 computers.
         | 
         | Perhaps the root cause is an issue in the weight-on-wheel
         | sensor inputs that the computers use to transition. The
         | computers need to use redundant sensors so they don't all rely
         | on the same faulty sensor, but maybe the sensors are not
         | appropriately cross-fed or the sensor input combination logic
         | is too different between the computers.
         | 
         | The problem with not disconnecting a computer in this situation
         | is that bad inputs from that computer may also cause a crash.
         | But maybe if no other computer is available to take its place
         | the logic should be different, at least during takeoff/landing.
        
           | avianlyric wrote:
           | I'm gonna re-explain what you've written in language I'm more
           | familiar with to check my understanding, and hopefully you
           | can correct any errors I make.
           | 
           | As I understand it the A330 has three primary flight
           | computers, all observing the same inputs (which might come
           | from different physical sensors monitoring the same thing?)
           | and producing outputs, also known as "orders" for other
           | systems in the plane, like actuators.
           | 
           | Of these three computers, one will act as the primary command
           | computer (COM), one as a monitoring computer (MON), the third
           | is spare and normally ignored. Only orders from the COM
           | machine are sent to downstream systems like actuators.
           | 
           | There's a separate watchdog system that monitors the outputs
           | (orders) of both the COM and MON, and if their order values
           | diverge by too much for too long, it shuts down the COM
           | computer and passes control to the MON computer and the
           | spare. As part of this process one of those two computers
           | becomes COM the other MON. I assume the size of the
           | divergence determines how long they can diverge for. A large
           | divergence is only allowable for a very short period of time,
           | and a small divergence is allowed for longer.
           | 
           | In addition to all of this, the computers have different
           | operating modes which changes how they respond to inputs. In
           | this case the relevant states are "normal law" for in the
           | air, and "ground law" for on the ground. One thing that's
           | different between these states is how tightly coupled rudder
           | inputs from the pilot are to rudder orders to the actuator.
           | In the air the rudder is less tightly coupled than on the
           | ground. E.g. commanding full left rudder in the air results
           | in less physical movement of the rudder than the same command
           | on the ground.
           | 
           | When the plane lands, detected by a pressure sensor in the
           | landing gear, the computers transition from "normal law" to
           | "ground law". Which for some outputs, like the rudder, might
           | result in a step change in outputs (orders) from the
           | computers.
           | 
           | So in this specific scenario what happened is that the flight
           | computers for some reason didn't transition between "normal
           | law" and "ground law" simultaneously (or close enough to
           | simultaneously). So the COM computer significantly changed
           | its rudder output as a result of changing law, but the MON
           | computer didn't, because it hadn't changed law yet (the
           | inverse of this is also possible). As a result they were
           | producing very large differences in rudder orders, resulting
           | in the monitoring watchdog killing the COM computer and
           | failing over to the spare. Where the above situation happened
           | a second time, resulting in all computers being shut down.
           | 
           | Is all of the above correct?
           | 
           | All of this does make me wonder, if changing law can result
           | in computer outputs quickly changing, doesn't that make law
           | changes inherently dangerous? If you're a pilot landing a
           | plane applying significant rudder inputs, doesn't the above
           | mean that those inputs will have a vastly different effect once
           | the wheels touch the runway?
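The mechanism discussed above, a law change that lands at slightly different times in two channels while the pedals are deflected, can be reproduced with a toy model. Everything numeric here is an assumption for illustration (the gains, the threshold, the persistence limit, even which law has the larger gain); only the shape of the logic follows the thread: a monitor trips when the COM and MON rudder orders differ by more than a threshold for too long.

```python
from typing import Optional

FLIGHT_GAIN = 1.0  # assumed rudder order per unit of pedal in flight law
GROUND_GAIN = 0.5  # assumed gain in ground law (size and direction illustrative)
THRESHOLD = 0.2    # assumed maximum tolerated |COM - MON| order difference
MAX_TICKS = 2      # assumed number of consecutive ticks the excess may persist


def rudder_order(pedal: float, tick: int, switch_tick: int) -> float:
    """Rudder order from one channel that changes law at switch_tick."""
    gain = GROUND_GAIN if tick >= switch_tick else FLIGHT_GAIN
    return gain * pedal


def first_fault_tick(pedal: float, com_switch: int, mon_switch: int,
                     horizon: int = 20) -> Optional[int]:
    """Return the tick at which the monitor trips, or None if it never does."""
    over = 0  # consecutive ticks spent above the threshold
    for tick in range(horizon):
        diff = abs(rudder_order(pedal, tick, com_switch)
                   - rudder_order(pedal, tick, mon_switch))
        over = over + 1 if diff > THRESHOLD else 0
        if over > MAX_TICKS:
            return tick
    return None
```

In this model, a one-tick skew with a large pedal input (`first_fault_tick(0.8, 5, 6)`) is tolerated, and a long skew with a small pedal input (`first_fault_tick(0.1, 5, 10)`) is tolerated too. Only the combination of a long skew and a large input (`first_fault_tick(0.8, 5, 10)`) trips the monitor, which is the "high asynchronism plus pedal inputs" pairing the report describes.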
        
             | lisper wrote:
             | [Private pilot here]
             | 
             | > doesn't that make law changes inherently dangerous?
             | 
             | Well, yeah, but _landing_ is inherently dangerous for the
             | exact same reason: the system dynamics change suddenly when
             | the wheels touch the ground. That happens (obviously) as a
             | consequence of the laws of physics whether or not you have
             | a computer in the loop. So this is a risk that just goes
             | with the territory.
        
             | AnimalMuppet wrote:
             | Is MON one of the three, or is it a fourth computer? If
             | it's one of the three, how does the third computer get
             | disabled?
        
               | avs733 wrote:
               | as I understand it...and I am not an expert but have been
               | exposed to some similar systems just not with Airbus...I
               | believe the following is correct at a systems design
               | level:
               | 
               | * There are three flight 'computers' (boxes) (it's more
               | complex than that but that complexity is not germane to
               | your question)
               | 
               | * each box has two entirely different motherboards with
               | different processors and independent software inside of
               | it
               | 
               | * each motherboard takes the same inputs and calculates
               | the appropriate outputs.
               | 
               | * if those outputs disagree, inside of the same box, you
               | get a COM/MON fault and the box/system takes an
               | appropriate action...such as disengaging
               | 
               | * once all of THAT happens in a single box...the boxes
               | are also looking to see if all three boxes are agreeing
               | with each other. This is where you get 'voting'.
               | 
               | * if all three boxes agree, great! If two agree,
               | disregard the third. If none agree, execute fault
               | fallbacks.
               | 
               | * If you run out of computers doing things that make
               | sense - shut the computers off and make really loud
               | noises to alert the pilots they are on their own
               | 
               | so...you have the computer agreeing with itself and then
               | you have the computers agreeing with each other. Both are
               | important/critical for fault tolerance.
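The cross-box voting step in the list above might look roughly like this. It is a sketch of generic two-out-of-three voting as the comment describes it, not a confirmed description of the A330's architecture; the function name `vote`, the tolerance value, and the use of `None` for a box that has already disengaged itself are assumptions for the example.

```python
from typing import Optional, Sequence


def vote(outputs: Sequence[Optional[float]],
         tolerance: float = 0.05) -> Optional[float]:
    """Pick an agreed order from the boxes, or None if no two boxes agree.

    Each entry is one box's output; None means that box already
    disengaged itself after an internal COM/MON disagreement.
    """
    live = [o for o in outputs if o is not None]
    for i in range(len(live)):
        for j in range(i + 1, len(live)):
            if abs(live[i] - live[j]) <= tolerance:
                # Two boxes agree: use their average, ignore any outlier.
                return (live[i] + live[j]) / 2
    return None  # no agreement left: the caller falls back and alerts the crew
```

Note the limitation raised elsewhere in the thread: voting masks one box that fails on its own, but if every box trips the same internal check at the same moment, `vote([None, None, None])` has nothing left to choose from.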
        
             | marcodiego wrote:
             | > So in this specific scenario what happened is that the
             | flight computers for some reason didn't transition between
             | "normal law" and "ground law" simultaneously (or close
             | enough to simultaneously). So the COM computer
             | significantly changed its rudder output as a result of
             | changing law, but the MON computer didn't, because it
             | hadn't changed law yet (the inverse of this is also
             | possible). As a result they were producing very large
             | differences in rudder orders, resulting in the monitoring
             | watchdog killing the COM computer and failing over to the
             | spare. Where the above situation happened a second time,
             | resulting in all computers being shut down.
             | 
             | Possible solution: always designate COM/MON computers which
             | agree on the mode: flight or ground. Only disable a primary
             | COM computer if it disagrees with the MON computer while
             | both are running on the same mode.
             | 
             | > There's a separate watchdog system that monitors the
             | outputs (orders) of both the COM and MON, and if their
             | order values diverge by too much for too long, it shuts
             | down the COM computer and passes control to the MON
             | computer and the spare.
             | 
             | So, if the MON computer is faulty it will always disable
             | the 3 computers?
        
               | avianlyric wrote:
               | > Possible solution: always designate COM/MON computers
               | which agree on the mode: flight or ground. Only disable a
               | primary COM computer if it disagrees with the MON
               | computer while both are running on the same mode.
               | 
               | I'm not sure that helps, how do you know that the COM
               | computer is correct and MON isn't? Ultimately you only
               | really care if the two computers are trying to get the plane
               | to do different things, if they're in different modes but
               | producing the same outputs I'm not sure how much you
               | would care.
               | 
               | > So, if the MON computer is faulty it will always
               | disable the 3 computers?
               | 
               | I'm just interpreting what I've read. If you know better
               | then please do tell us.
        
               | marcodiego wrote:
               | Actually, I know nothing about the subject; Please don't
               | take my comments as such. Sorry, I should have made that
               | clear.
               | 
               | About the first proposal: redundancy using majority of
               | votes is well known.
               | 
               | Second, GP said:
               | 
               | > There's a separate watchdog system that monitors the
               | outputs (orders) of both the COM and MON, and if their
               | order values diverge by too much for too long, it shuts
               | down the COM computer and passes control to the MON
               | computer and the spare.
               | 
               | What I read from this is: COM differs from MON; watchdog
               | disables COM and uses MON and spare as a new COM/MON. But
               | if previous MON was faulty, it will still differ from
               | spare (except if both are failing in a sufficiently
               | similar way).
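
A minimal sketch of the failover rule as read in the comment above, to make the cascade concrete. This is not Airbus logic: the `Computer` class, the fixed bias used to model a fault, and the `threshold` value are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Computer:
    name: str
    faulty: bool = False

    def order(self, true_value: float) -> float:
        # Assumption: a faulty unit adds a fixed bias to the correct order.
        return true_value + (5.0 if self.faulty else 0.0)


def run_watchdog(computers, true_value=1.0, threshold=1.0):
    """Return (names shut down in order, names still active).

    computers[0] acts as COM, computers[1] as MON, the rest are spares.
    On a COM/MON mismatch the COM is shut down, the MON is promoted to
    COM and the next spare becomes the new MON.
    """
    active = list(computers)
    shut_down = []
    while len(active) >= 2:
        com, mon = active[0], active[1]
        if abs(com.order(true_value) - mon.order(true_value)) <= threshold:
            break  # the pair agrees, so control stays with this COM
        shut_down.append(com.name)
        active.pop(0)
    return shut_down, [c.name for c in active]


# Faulty COM: only the faulty unit is removed.
print(run_watchdog([Computer("PRIM1", faulty=True), Computer("PRIM2"), Computer("PRIM3")]))
# Faulty MON: the healthy COM is removed first, then the faulty unit.
print(run_watchdog([Computer("PRIM1"), Computer("PRIM2", faulty=True), Computer("PRIM3")]))
```

Under these assumptions a faulty MON costs two computers, not three: the healthy COM is shut down first, then the faulty unit once it mismatches the spare. The sketch stops when one unit is left with nothing to compare against; what the real system does at that point is not modelled here.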
        
             | strogonoff wrote:
             | > If you're a pilot landing a plane applying significant
             | rudder inputs, doesn't the above mean that those inputs will
             | have a vastly different effect once the wheels touch the
             | runway?
             | 
             | IANAP so hopefully someone will correct me, but from my
             | understanding in this case the answer would be "yes, they
             | will have a different effect, as they should".
             | 
             | Rudder is used in crosswind landings to maintain runway
             | alignment while still in air; once enough weight is on
             | wheels and speed is low enough rudder quickly loses its
             | effectiveness and control shifts to wheel steering and
             | brakes.
             | 
             | That said, I suspect it doesn't help that flight computer
             | sort of has a binary context flag (either we are in the air
             | or on the ground), it might have simplified some of the
             | business logic but does not seem to map well to reality at
             | a crucial moment. If imagined in slow motion, the system
             | doesn't just flip a state but goes through a spectrum.
        
           | cwizou wrote:
           | > Perhaps the root cause is an issue in the weight-on-wheel
           | sensor inputs that the computers use to transition. The
           | computers need to use redundant sensors so they don't all
           | rely on the same faulty sensor, but maybe the sensors are not
           | appropriately cross-fed or the sensor input combination logic
           | is too different between the computers.
           | 
           | This would indeed explain the discrepancies between COM/MON
           | which is what they called asynchronicity (which is a bit
           | different from where my mind went when I read that word the
           | first time). There was no particular fault found in the
           | computers.
           | 
           | Maybe wet runway factored in a bit (or some other issue with
           | the runway itself), but this is a common occurrence around the
           | world and this was a first on the whole 330/340 family.
           | 
           | > The problem with not disconnecting a computer in this
           | situation is that bad inputs from that computer may also
           | cause a crash. But maybe if no other computer is available to
           | take its place the logic should be different, at least during
           | takeoff/landing.
           | 
           | Yes, only thing I would add is that they failed in cascade
           | within one second of each other, 3 seconds after main gear
           | touching (and before the nose gear touched down):
           | 
           | > Three seconds after the main gear touched down, autobrake
           | system fault was recorded on FDR. One second later,
           | PRIM1/PRIM2/PRIM3 faults were recorded at the same time and
           | the spoilers retracted, as the ground spoiler function was
           | lost.
           | 
           | Whether all three should be allowed to fail within one
           | second and then default back to direct law (manual) is a hard
           | question to answer. In this precise case it seems like a
           | retry might have worked? Would it always be safe to do
           | a retry? I don't know. This is a super critical phase of
           | flight so retrying doesn't sound safe.
           | 
           | Apparently they would have needed 2 functional FCPCs for full
           | autobrake:
           | 
           | > Ground spoilers function requires at least one functional
           | FCPC, arming autobrake requires at least two functional
           | FCPCs, deployment of thrust reversers require unlock signal
           | from either FCPC1 or FCPC3
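
The requirements quoted above can be written as a small lookup. The three rules come straight from the quote; the function and key names are made up for illustration.

```python
def available_functions(working_fcpcs):
    """Map a set of working FCPC numbers (1, 2, 3) to the functions
    that remain available, per the requirements quoted from the report."""
    working = set(working_fcpcs)
    return {
        # Ground spoilers need at least one functional FCPC.
        "ground_spoilers": len(working) >= 1,
        # Arming autobrake needs at least two functional FCPCs.
        "autobrake_arming": len(working) >= 2,
        # Reverser unlock signal must come from FCPC1 or FCPC3.
        "reverser_unlock": bool(working & {1, 3}),
    }


print(available_functions({1, 2, 3}))  # everything available
print(available_functions(set()))      # the incident: nothing available
```

With only FCPC2 left, for example, ground spoilers would still work but autobrake arming and reverser unlock would not.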
        
         | phkahler wrote:
         | That's how I read it too. The discrepancy was deemed "I'm
         | broken and need to shut down" instead of something more benign
         | in a condition they didn't realize might happen. That's a
         | critical difference since all 3 will think "I'm broken" as
         | happened here.
         | 
         | That stuff can be hard to get right. Glad they found this one
         | with 10 meters to spare.
        
           | avianlyric wrote:
           | It seems that flight planning is done on the basis that the
           | loss of all flight computers is possible, and to make sure
           | that your runway is long enough to accommodate that
           | situation.
           | 
           | So thankfully the loss of all three computers isn't
           | inherently dangerous, as is demonstrated by this incident.
           | Based on my reading of the report the primary takeaway from
           | this (apart from an issue with the flight computers'
           | software) is that the flight plans and the prep by the
           | airport weren't conservative enough, so they ended up with
           | slightly less safety margin than they expected in this
           | scenario.
        
         | noir_lord wrote:
         | > "we can't all be wrong, it must be the sensor", to avoid
         | situations like this?
         | 
         | Not an expert but my understanding is that typically with these
         | systems they take a poll and vote, if 1/3 disagree it's ignored
         | if 2/3 disagree they scream and switch to manual/fallback
         | simpler systems.
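
A rough sketch of the "poll and vote" idea described above, for three analog readings. The tolerance-based agreement rule and the `"fallback"` status are assumptions for illustration, not a description of any real avionics voter.

```python
from itertools import combinations


def vote(values, tolerance):
    """Return (value, status) for three redundant readings.

    If at least two readings agree within `tolerance`, the odd one out is
    ignored and the mean of the first agreeing pair is used. If no two
    readings agree, the caller is told to fall back to a simpler mode.
    """
    agreeing = [(a, b) for a, b in combinations(values, 2)
                if abs(a - b) <= tolerance]
    if not agreeing:
        return None, "fallback"
    a, b = agreeing[0]
    return (a + b) / 2, "ok"


print(vote([10.0, 10.2, 50.0], tolerance=0.5))  # the outlier is outvoted
print(vote([10.0, 30.0, 50.0], tolerance=0.5))  # nobody agrees
```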
        
           | zymhan wrote:
           | Indeed, it seems like the computers were not in agreement:
           | 
           | > the combination of a high COM/MON channels asynchronism
           | 
           | In this case, since the automation failed, the plane reverted
           | to "Direct law", where the pilot's inputs directly control
           | the plane, instead of passing through computer checks first.
        
           | sgtnoodle wrote:
           | For a binary signal, it's impossible for all three to
           | mismatch. For a more analog signal, all three will generally
           | mismatch to some extent whether it's in time or in space or
           | both.
           | 
           | Fault tolerance and fault detection are two separate but
           | often coupled concepts. Systems can be designed to be
           | inherently tolerant to a fault without detecting the fault
           | (and good designs often are). All faults need to be detected
           | eventually, though, so that they can be repaired before more
           | faults occur and compound into a broken system.
           | 
           | There's typically very tight timing requirements for fault
           | tolerance, and significantly looser timing requirements for
           | detection. As a result, you can often solve them differently.
           | In a 3 string system with voting, it's often the case that
           | the median signal is used for control without any
           | interpretation of "goodness". That strategy works fine for
           | short time periods of fault tolerance, as two strings would
           | have to produce bad signals for the system to be affected.
           | Separate from the median voting control path, you would then
           | have a variety of consistency checking algorithms looking at
           | the three signals and trying to intelligently determine
           | whether any of the strings have failed. Those algorithms are
           | often stateful and complicated, and rely on heavy filtering
           | to avoid false positives.
           | 
           | When a fault is detected, at minimum it needs to be
           | communicated to an operator. In some cases, the detected
           | fault will also trigger a "fault response", i.e. disabling
           | the offending computer.
           | 
           | In this case, it sounds like maybe a fault detection
           | algorithm had a false positive that disabled the computer,
           | and the same algorithm was running on all three computers.
           | 
           | Despite there being 3 computers, it doesn't sound like this
           | is a 3 string voting system. Rather, each of the three
           | computers is independently able to control the system. The 3
           | strings exist for redundancy rather than for fault tolerance.
           | Fault tolerance is provided by having two computers that
           | cross-check everything they do, and the third computer is there
           | so that the cross-checking is fault tolerant. Two string
           | redundancy is very common in automotive and aerospace.
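
The split described above between a fast control path and a slower, filtered detection path in a 3 string voting system can be sketched like this. The class name, the miscompare threshold and the persistence count are all hypothetical, and as the comment notes, this is probably not how the A330 PRIMs are arranged.

```python
from statistics import median


class MidValueSelect:
    """Median voting for control, plus a persistence-filtered monitor."""

    def __init__(self, threshold, persistence):
        self.threshold = threshold      # allowed distance from the median
        self.persistence = persistence  # consecutive miscompares before flagging
        self.miscompare_counts = [0, 0, 0]

    def step(self, signals):
        # Control path: use the median with no judgement of "goodness".
        selected = median(signals)
        # Detection path: count consecutive miscompares per string so that
        # a single noisy sample does not raise a false positive.
        for i, s in enumerate(signals):
            if abs(s - selected) > self.threshold:
                self.miscompare_counts[i] += 1
            else:
                self.miscompare_counts[i] = 0
        failed = [i for i, c in enumerate(self.miscompare_counts)
                  if c >= self.persistence]
        return selected, failed


mvs = MidValueSelect(threshold=1.0, persistence=3)
for _ in range(3):
    selected, failed = mvs.step([1.0, 1.1, 9.0])
print(selected, failed)  # control never used the bad string; it is flagged on the third step
```

The bad string never reaches the control output, even before it is flagged, which is the tolerance-without-detection property the comment describes.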
        
       | oakmad wrote:
       | For everyone commenting about different hardware failure modes
       | and possible solutions. Take a look at the airbus << flight
       | control laws >> such as https://apstraining.com/wp-
       | content/uploads/FCS-Airbus-Quick-... it governs what happens to
       | flight controls and what protections are available in different
       | scenarios. In this case the system worked, it crashed and
       | reverted to direct law providing manual braking control.
       | 
       | Off topic but I found this very interesting << The deceleration
       | performance of the occurrence flight between 6,600 feet and 7,300
       | feet from the threshold of runway 10 deteriorated. It may be due
       | to paint marking and rubber deposit on the touchdown zone of
       | runway 28. >>
        
         | cwizou wrote:
         | > The deceleration performance of the occurrence flight between
         | 6,600 feet and 7,300 feet from the threshold of runway 10
         | deteriorated. It may be due to paint marking and rubber deposit
         | on the touchdown zone of runway 28
         | 
         | Just to clarify for others, this is the same runway but taken
         | in two different directions (west to east and east to west,
         | multiply the number by 10 and you get [rounded] angle in
         | degree, so 100ish vs 280ish). It's 8500 ft. long and it looks
         | like there was a loss of performance around the "touchdown"
         | point of going the other way on that runway, where you had
         | accumulated rubber from those landings (which is pretty normal
         | and monitored, as pointed by leecb below), if that makes any
         | sense.
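
The runway numbering arithmetic mentioned above, as a short sketch. The function names are made up; the rule (the number is the heading divided by ten, and the opposite direction differs by 18) is the standard runway designation convention.

```python
def runway_heading(number):
    """Approximate magnetic heading in degrees for a runway number (1-36)."""
    return number * 10


def reciprocal_runway(number):
    """Runway number for the same strip used in the opposite direction."""
    recip = (number + 18) % 36
    return 36 if recip == 0 else recip  # runway 36 is used instead of 0


print(runway_heading(10), runway_heading(28))  # 100 280
print(reciprocal_runway(10))                   # 28
```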
        
           | leecb wrote:
           | > accumulated rubber from those landings
           | 
           | Rubber accumulation on runways is a normal and expected
           | situation. Part of runway maintenance at airports serving
           | larger aircraft includes regular removal of the rubber using
           | high pressure water or chemicals.
           | 
           | https://en.wikipedia.org/wiki/Airfield_rubber_removal
        
             | ThePadawan wrote:
             | I'm still always surprised _just how much_ rubber is
             | allowed to accumulate.
             | 
             | Here's a snapshot of one of the Zurich runways: https://www
             | .google.com/maps/@47.4792249,8.5411849,660m/data=...
             | 
             | Scroll a little NW or SE to see what the _actual_ color of
             | the runway is underneath all the rubber.
        
               | waynesonfire wrote:
               | i'm not surprised, look how invasive the cleaning
               | techniques are.
        
             | salawat wrote:
             | Not going to lie, I never thought this was a thing that'd
             | need doing, but on further reflection it seems so obvious
             | I'm kinda mad it never occurred to me in retrospect.
        
           | oakmad wrote:
           | It's unfortunate they don't calculate how big a difference it
           | made. They'd calculated a 170ft margin and stopped with a 30ft
           | margin and I'm curious what their braking delta was for that
           | 700ft of deterioration.
           | 
           | << With three FCPCs inoperative, actual remaining runway
           | distance (30 feet margin) of the occurrence flight was
           | shorter than the calculated value (172 feet margin), possibly
           | due to tailwinds, runway conditions, and manual braking as
           | these factors might increase the braking distance. >>
           | 
           | I get sucked into how all the little things that are not
           | right make such a big difference in these kinds of scenarios.
        
             | colechristensen wrote:
             | Complex systems and aerospace particularly are dominated by
             | little things. Failures or near misses are never single
             | cause; those are all designed for. When a failure happens it
             | is almost always a conspiracy of several things put
             | together, any one of which, had it gone right, would have
             | mostly prevented the incident.
             | 
             | There is a lot that practices involving life critical
             | complex systems can teach other fields.
        
               | dmurray wrote:
               | Agreed, and let's not lose sight of the fact that...this
               | was a successful landing. Everyone walked away from it
               | and the passengers likely never knew anything went wrong.
               | 
               | Four hopefully independent things went wrong here: three
               | computer failures plus less-than-documented braking
               | performance. Additionally, there were at least two
               | aggravating circumstances: wet conditions and a tailwind
               | landing. Part of the investigation is to find out if
               | there was a fourth or fifth system failure (poor pilot
               | reaction time? Rubber level on the runway unacceptable?
               | Either is plausible, but far from indicated by the report
               | so far).
               | 
               | One extra thing going wrong here, in the wrong direction
               | (anything that impaired braking or pilot reaction time)
               | would likely have led to loss of life. Investigating near
               | misses like this, rather than only acting once five
               | things go wrong and a hundred people die, is a sign of a
               | healthy safety culture.
        
         | jcrawfordor wrote:
         | Build-up of rubber from the tires of landing airplanes is a
         | routine problem with runway maintenance. Typically the airport
         | authority will use a friction-measurement device at regular
         | intervals to determine when the surface friction of the runway
         | has fallen below required parameters, and resurface. But rain
         | makes things much worse and might have combined with rubber
         | build-up to produce a particularly slick section.
         | 
         | Flying light aircraft, I was taught to avoid landing right in
         | the touchdown zone just to have better traction. I'm not sure
         | if this is actually useful or just Old Pilot Superstition but
         | it has some logic to it, and putting a Skyhawk down on a 14k
         | foot runway you have a lot of room for eccentric opinions
         | (similar to landing just a touch off centerline so the
         | nosewheel isn't on the painted stripes, another one I've heard
         | people mention as a "best bad practice").
        
           | kylegordon wrote:
           | If I recall rightly Concorde pilots often took control from
           | autoland and landed it slightly off the center line.
           | 
           | This was to avoid upsetting the passengers and to ensure the
           | champagne didn't get shaken too much.
        
           | azalemeth wrote:
           | My dad was a commercial pilot, and taught me to fly at a
           | young age. One of the bits of sage advice was something along
           | the lines of "if your nosegear has two or more wheels, try to
           | get it bang on the centre line, every time. Otherwise, miss a
           | little tiny bit. The lights go THUMP-THUMP-THUMP and
           | eventually drive you mad or, combined with a little gust of
           | wind, cause you to hop once more".
           | 
           | I think there's a lot of Old Pilot Superstition with more
           | than a dollop of truth in it.
        
           | ErikVandeWater wrote:
           | It's very odd they don't do anything to speed up the wheels
           | or apply a coating of water to them on dry days to avoid
           | extreme wear and excess rubber for future planes landing.
           | 
           | Even a passive design with the shape of the tread could help
           | substantially reduce the amount of rubber scraped off the
           | wheel on each landing.
        
             | ncmncm wrote:
             | Apparatus to spin up wheels was analyzed exhaustively,
             | early on. They determined that the added weight and
             | complexity did not pay.
             | 
             | That is not to say that some clever physical design could
             | not help. But the idea of spinning up wheels is well-known
             | to gear engineers. Tire wear is a substantial expense, so
             | any bright ideas would be welcomed, and eventually tried
             | out.
        
           | rteuionwiv wrote:
           | I don't think anything too bad could happen when landing a
           | light aircraft on a slippery but long runway. I was doing some
           | winter flying and was landing a PA-28 on a slippery runway
           | (compressed snow) and I didn't notice any difference until I
           | started pressing the brakes, which had virtually no effect on
           | deceleration. On touchdown you have a lot of aerodynamic
           | authority on your controls so runway friction doesn't really
           | matter unless you have a super strong crosswind that could
           | blow you off the runway, I guess.
        
       | teeray wrote:
       | I wonder what would have happened if they tried to go around? The
       | TO/GA is configured ahead of time in the FMS... so if the flight
       | computers are out, I wonder if there's some hard-coded
       | performance targets for the engines in a go around situation.
        
         | concerned_user wrote:
         | They would push the throttle manually to max power and perform
         | the maneuver by hand.
        
         | [deleted]
        
         | hugh-avherald wrote:
         | There was barely a second between the first indication of a
         | primary computer fault and the call for reversers to deploy.
         | Once you select reverse thrust, you cannot go around under any
         | circumstances.
        
           | strogonoff wrote:
           | It's an interesting scenario. Your wheels touch the ground;
           | you engage reverse thrust and it instantly fails; you are
           | rolling at dangerous landing speed but now can't go around?
           | 
           | Relatedly, I wonder why the pilots did not engage the regular
           | brakes ASAP.
        
             | avianlyric wrote:
             | Not quite, reading around it seems that flight planning
             | requires you to assume that reverse thrusters and ground
             | spoilers don't work.
             | 
             | So you land, your reverse thrusters immediately fail, and
             | you're now required to stop with your manual brakes and
             | consume all of the braking distance you originally planned
             | for, with presumably a bit of margin on top.
        
       ___________________________________________________________________
       (page generated 2021-09-05 23:00 UTC)