RISKS-LIST: RISKS-FORUM Digest Sunday, 13 December 1987 Volume 5 : Issue 73 FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator Contents: Australian datacom blackout (Barry Nelson) Finally, a primary source on Mariner 1 (John Gilmore, Doug Mink, Marty Moore) Re: Computer-controlled train runs red light (Nancy Leveson) Re: interconnected ATM networks (John R. Levine, Darren New) Control-tower fires (dvk) Loss-of-orbiter (Dani Eder) Re: EEC Product Liability (John Gilmore) The Presidential "Football"... (Carl Schlachte) Radar's Growing Vulnerability (Jon Eric Strayer) The RISKS Forum is moderated. Contributions should be relevant, sound, in good taste, objective, coherent, concise, nonrepetitious. Diversity is welcome. Contributions to RISKS@CSL.SRI.COM, Requests to RISKS-Request@CSL.SRI.COM. For Vol i issue j, FTP SRI.COM, CD STRIPE:, GET RISKS-i.j. Volume summaries for each i in max j: (i,j) = (1,46),(2,57),(3,92),(4,97). ---------------------------------------------------------------------- Date: Tue, 8 Dec 87 16:11:43 EST From: Barry Nelson Subject: Australian datacom blackout To: risks@csl.sri.com Cc: telecom@xx.lcs.mit.edu From The Australian, 23 November 1987, Sydney, Australia, Page 1, 2nd edition. [without permission] 8-Column BANNER: SABOTEUR TRIED TO BLACK OUT AUSTRALIA The heart of Sydney's business district remains in chaos after a dangerously well-informed saboteur wreaked havoc on the city's fragile telecommunications system in an attack intended to destroy [Australian] Telecom's operations nationwide. [An estimated 2000 central city services remain out this morning] Investigators described the sinister saboteur as a lone, former Telecom employee with expert knowledge of the underground cables network. [...] But Telecom said it could have been much worse. [only Sydney was hit] but all international services are routed through Sydney [...] 
[The attacker entered the underground tunnels and] severed 24 of the 600 heavy cables in 10 carefully selected locations. The bizarre attack knocked out 35,000 telephone lines in 40 Sydney suburbs and brought dozens of [ATMs, POS, stores, telex, facsimile and betting office] services to a standstill. [...] Hundreds of computers broke down, leaving communications and computer specialists to ponder the real possibility of vital information being erased from tapes in banking, insurance and other industries. [The largest banks and the international and local PTT offices were all cut off. Speculation is that the attacker's information was over two years old because the same attack at that time would have completely crippled Telecom Australia. Security locks have now been put on the manhole covers. Just the reconnection effort is estimated to cost millions of dollars and full damages will not be known until businesses have time to detect losses. A man seen leaving a manhole on Wednesday night was possibly the saboteur reconnoitering his targets. ...] Page 2, 4-columns, 5x7 foto - SABOTAGE IS A NIGHTMARE FOR TELECOM'S WEARY BAND Four hundred Telecom managers, technicians and linesmen worked frantically toward today's 9am deadline [to restore the damaged services. Some worked 48 hours straight with only brief napping.] When the enormity of the sabotage was realised (sic) on Friday, a team of technicians and linesmen was sent into the tunnels to discover the damage. The cuts, which were only a centimeter across, could only be found by touch in the dark, dank tunnels. "The workmen had to run their hands along the entire length of the cables until all the cuts were discovered. Some of them walked over 20 miles on Friday night", said Roger Bamber, the [New South Wales] Telecom Operations Manager [The system contains 27 km of tunnels. It is estimated that the damage could have been done by one well-prepared man over a period of less than one hour.] 
Things started to go wrong in the city about 7pm on Friday, and workmen searched through the night until 6am to find all the damage. [Other searches were launched over half the state for bombs or other evidence of sabotage.] [ ... in other articles ] [Employees' anger at turncoat Telecom policies suggests an insider hacked the cables. The telephone workers' union objects to deregulation, which has resulted in years of acrimonious debate. Last week's Telecom statements suggested that an independent regulator will be created. The union doesn't approve of this action and prefers monopoly.] -----One must wonder if the REAL crime was obscured by the Telecom outage----- ------------------------------ Date: Sun, 13 Dec 87 05:30:10 PST From: hoptoad.UUCP!gnu@cgl.ucsf.edu (John Gilmore) To: RISKS@KL.SRI.COM Subject: Finally, a primary source on Mariner 1 My friend Ted Flinn at NASA (flinn@toad.com) dug up this reference to the Mariner 1 disaster, in a NASA publication SP-480, "Far Travelers -- The Exploring Machines", by Oran W. Nicks, NASA, 1985. "For sale by the Superintendent of Documents, US Government Printing Office, Wash DC." Nicks was Director of Lunar and Planetary Programs for NASA at the time. The first chapter, entitled "For Want of a Hyphen", explains: "We had witnessed the first launch from Cape Canaveral of a spacecraft that was directed toward another planet. The target was Venus, and the spacecraft blown up by a range safety officer was Mariner 1, fated to ride aboard an Atlas/Agena that wobbled astray, potentially endangering shipping lanes and human lives." ..."A short time later there was a briefing for reporters; all that could be said -- all that was definitely known -- was that the launch vehicle had strayed from its course for an unknown reason and had been blown up by a range safety officer doing his prescribed duty." 
"Engineers who analyzed the telemetry records soon discovered that two separate faults had interacted fatally to do in our friend that disheartening night. The guidance antenna on the Atlas performed poorly, below specifications. When the signal received by the rocket became weak and noisy, the rocket lost its lock on the ground guidance signal that supplied steering commands. The possibility had been foreseen; in the event that radio guidance was lost the internal guidance computer was supposed to reject the spurious signals from the faulty antenna and proceed on its stored program, which would probably have resulted in a successful launch. However, at this point a second fault took effect. Somehow a hyphen had been dropped from the guidance program loaded aboard the computer, allowing the flawed signals to command the rocket to veer left and nose down. The hyphen had been missing on previous successful flights of the Atlas, but that portion of the equation had not been needed since there was no radio guidance failure. Suffice it to say, the first U.S. attempt at interplanetary flight failed for want of a hyphen." ------------------------------ Date: Tue, 8 Dec 87 11:42:36 EST From: mink%cfa@harvard.harvard.edu (Doug Mink) To: risks@csl.sri.com Subject: Mariner 1 from NASA reports JPL's Mariner Venus Final Project Report (NASA SP-59, 1965) gives a chronology of the final minutes of Mariner 1 on page 87: 4:21.23 Liftoff 4:25 Unscheduled yaw-lift maneuver "...steering commands were being supplied, but faulty application of the guidance equations was taking the vehicle far off course." 4:26:16 Vehicle destroyed by range safety officer 6 seconds before separation of Atlas and Agena would have made this impossible. In this report, there is no detail of exactly what went wrong, but "faulty application of the guidance equations" definitely points to computer error. 
"Astronautical and Aeronautical Events of 1962," is a report of NASA to the House Committee on Science and Astronautics made on June 12, 1963. It contains a chronological list of all events related to NASA's areas of interest. On page 131, in the entry for July 27, 1962, it states: NASA-JPL-USAF Mariner R-1 Post-Flight Review Board determined that the omission of a hyphen in coded computer instructions transmitted incorrect guidance signals to Mariner spacecraft boosted by two-stage Atlas-Agena from Cape Canaveral on July 21. Omission of hyphen in data editing caused computer to swing automatically into a series of unnecessary course correction signals which threw spacecraft off course so that it had to be destroyed. So it was a hyphen, after all. The review board report was followed by a Congressional hearing on July 31, 1962 (ibid., p.133): In testimony before House Science and Astronautics Committee, Richard B. Morrison, NASA's Launch Vehicles Director, testified that an error in computer equations for Venus probe launch of Mariner R-1 spacecraft on July 21 led to its destruction when it veered off course. Note that an internal review was called AND reached a conclusion SIX DAYS after the mission was terminated. I haven't had time to look up Morrison's testimony in the Congressional Record, but I would expect more detail there. The speed with which an interagency group could be put together to solve the problem so a second launch could be made before the 45-day window expired, contrasted with the lack of speed with which more recent problems (not just the Challenger, but the Titan, Atlas, and Ariane problems of 1986) were handled, says something about 1) how risks were accepted in the 60's, 2) growth in complexity of space-bound hardware and software, and/or 3) growth of the bureaucracy, each member of which is trying to avoid taking the blame. 
It may be that the person who made the keypunch error (the hyphen-for-minus theory sounds reasonable) was fired, but the summary reports I found indicated that the spacecraft loss was accepted as part of the cost of space exploration. Doug Mink, Harvard-Smithsonian Center for Astrophysics, Cambridge, MA Internet: mink@cfa.harvard.edu UUCP: {ihnp4|seismo}!harvard!cfa!mink ------------------------------ Date: 11 Dec 87 16:54:00 EST From: "Marty Moore" Subject: Mariner I To: "risks" I've just caught up on two months of back RISKS issues. I have the following to contribute on Mariner I, based on my time at the Cape: 1. Mariner I was before my time, but I was told the story by a mathematician who had been at the Cape since 1960. According to him, an algorithm, written as mathematical formulae, involved a Boolean entity R. At the point of failure, the mathematician had written NOT-R, that is, "R" with a bar above the character; however, the programmer implementing the algorithm overlooked the bar, and so used R when he should have used NOT-R. This explanation could subsequently have been interpreted as "missing hyphen", "missing NOT", or "data entry problem", all of which we've seen in recent contributions. 2. I think the FORTRAN version of the story is very unlikely. Remember that the error occurred in a critical on-board computer. I consider it extremely unlikely that such a computer would have been programmed in FORTRAN in 1962, considering that the first use I saw of FORTRAN in a ground-based critical system at the Cape was not until 1978! (Of course, I wasn't aware of *every* computer in use, so there may have been an earlier use of FORTRAN, but I'd be surprised if it was more than a few years earlier.) The originator of the FORTRAN version of the story may have been aware of another error caused by the period/comma substitution, and also aware of the Mariner problem as a "single character" error, and incorrectly associated the two. 
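Marty Moore's account can be made concrete. Below is a toy sketch (all names, values, and structure invented for illustration; this is emphatically not the actual 1962 guidance code) of how overlooking a single overbar, so that NOT-R is read as R, inverts a fallback test and makes the system trust exactly the data it was supposed to reject:

```python
# Hypothetical illustration of the NOT-R / R theory of the Mariner 1 loss.
# All identifiers and behavior are invented; only the shape of the bug is real.

def steering_command(radio_ok, radio_signal, stored_program_signal):
    """As specified: trust radio guidance only while it is reliable."""
    if radio_ok:
        return radio_signal          # good lock: follow ground guidance
    return stored_program_signal     # lost lock: fly the stored program

def steering_command_buggy(radio_ok, radio_signal, stored_program_signal):
    """As (allegedly) implemented: the bar over R was overlooked, so the
    test is inverted and noisy radio data is obeyed when it should be
    rejected."""
    if not radio_ok:                 # NOT-R where the formula said R
        return radio_signal          # spurious commands steer the vehicle
    return stored_program_signal

# With a failed antenna (radio_ok == False), the correct version flies the
# stored program; the buggy one obeys the noise.
noise = "veer left, nose down"
plan = "hold programmed trajectory"
assert steering_command(False, noise, plan) == plan
assert steering_command_buggy(False, noise, plan) == noise
```

The single-character nature of the inversion is consistent with all three retellings: a dropped overbar, a "missing hyphen", or a "data editing" slip.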
[There were other messages (e.g., from Eric Roberts, Eugene Miya, and Jim Valerio) on this subject as well, but there is too much redundancy or lack of definitude to include them all... PGN] ------------------------------ To: risks@csl.sri.com Subject: Re: Computer-controlled train runs red light Date: Sat, 12 Dec 87 20:31:44 -0800 From: Nancy Leveson In Risks 5.69, Steve Nuchia writes: >Surely these engineers can't be so paranoid as to think that an exact >duplication of their (primarily digital) relay-based control system in >software would be hard to verify. It should at least be possible to build a >software implementation that could be easily shown to be equivalent to the >relays, leaving aside the problem of validating an arbitrary "spaghetti code" >implementation. The failure modes of mechanical systems are usually well understood and very limited in number. Therefore, system safety engineers are able to build in interlocks and other safety devices to control these hazards. The failure modes of software are much more complex and less is known about how to control software hazards. Even if the same functionality is implemented in the software, that does not mean that the failure modes and mechanisms are identical nor that the complexity of the two systems is equivalent. Software also exhibits discontinuities not usually found in mechanical relay systems. If identical function is implemented in software, then the probability of requirements errors in the software is equivalent to design errors in the mechanical system. But there is an additional possibility of introducing implementation errors in the software. 
Given identical function of both types of systems (and thus identical probability of accidents arising from problems in this functional design), then the additional probability of design and coding errors in the software is not necessarily identical to the probability of random "wearout" failures in the mechanical system (the primary cause of failures in mechanical systems). >Automobile traffic light control boxes, based on relay technology quite >similar to that used in railroads, fail every so often due to ants building >mounds in the nice warm cabinets. People have been killed by this bug in a >relay system, yet it fails to generate the kind of emotional response that >software bugs do. --- Certainly there are accidents in conventional mechanical systems. However, the concern about software bugs is more than just an irrational emotional response. There are very good scientific reasons for it. Besides that noted above (greater understanding of failure modes and mechanisms in mechanical systems and thus better methods to control hazards), it is also possible to perform risk assessment on mechanical systems due to reuse of standard components with historical failure probability data. This is not possible for software. Certainly these risk figures are not always accurate, but it is not irrational to feel more comfortable about a system with a calculated risk of an accident of 10^-9 over 10 years time than a system with a calculated risk of "?". Besides, I question whether accidents caused by mechanical failures generate less emotional response than accidents caused by software bugs. Consider Challenger and Three Mile Island. It is natural for computer scientists to have considerable interest in computer-related accidents and reasonable for non-computer scientists to be worried about software bugs. Nancy Leveson, UCI ------------------------------ Date: Tue, 8 Dec 87 22:06:27 EST From: johnl@ima.ISC.COM (John R. 
Levine) To: risks@csl.sri.com Subject: Re: interconnected ATM networks The story about BayBanks vs. Bank of Boston ATM cards is even more interesting than it initially sounds. BayBanks and Bank of Boston are arch-rivals in consumer banking, and they run the two largest ATM networks in the region, XPress 24 and Yankee 24, respectively. (Yankee 24 is a consortium, but Bank of Boston is by far the largest participating bank.) When Yankee 24 was expanded from its Connecticut base to cover all of New England, XPress 24 was invited to join, but they declined and BayBanks has since filed an anti-trust suit against Yankee 24, so far to no effect. A few years ago, the two banks jointly set up a system of retail store cash dispensers called Money Supply. Both XPress 24 cards and Monec cards (Bank of Boston's previous network, now folded into Yankee 24) work at Money Supply machines. One day shortly after Money Supply came up, while waiting for a plane at Logan Airport in Boston, I noticed that one of the BayBank XPress 24 machines had a small Money Supply sticker on it, and upon trying my Bank of Boston card, was surprised to discover that it worked. Subsequent experimentation showed that other than the four airport BayBank machines, neither bank's machines accepted the other's cards, and the XPress 24 machines gave a peculiar message that "your bank has restricted use of this card at this terminal." The fact that Bank of Boston cards worked at the airport was not widely known, even at the two banks. Thus I was as surprised as anybody to discover that when both banks joined NYCE, they started taking each other's cards, since I was under the impression that BayBanks' network already routed Bank of Boston requests via other paths which were usually blocked, and vice versa. This suggests that perhaps BayBanks doesn't entirely understand how their ATM network routes messages to off-network banks. If I were they, I'd be pretty nervous. 
John Levine, johnl@ima.isc.com or ima!johnl or Levine@YALE.something ------------------------------ To: RISKS@kl.sri.com Subject: Re: ATM PIN numbers Date: Sun, 29 Nov 87 21:40:17 -0500 From: new@UDEL.EDU For what it is worth, the PINs for Mellon cards are not stored on the cards. I had both a checking and a savings account at Mellon. Several years after opening them, I closed the checking account but retained the savings account. All of a sudden, the card no longer worked. I visited a branch office in person to find out what happened. It seems that when the checking account closed, the first digit of the PIN changed. The clerk implied that I had simply forgotten what the number was, but this was not the case; I had been using the number for years. I suspect that the data entry person who closed the account bumped the wrong key on the screen form, accidentally changing the PIN field. I never followed it further. However, since the card was never out of my possession, I know that the PIN is not on the card. With regard to Otto Makela's "Your bank's computer is down" message appearing after entering the PIN: I suspect that all of your information is gathered before any connection to your bank is attempted. This prevents tying up the lines during "think time". I think the X.25 standards even include a special kind of "open connection" packet, whereby an encrypted batch of data is sent off and a yes/no reply comes back without any true "connection" ever being established. Of course, this does not invalidate any of his points, nor does it imply that other countries or banks follow the same protocols as Mellon Bank, USA. Darren New [For the record, there were somewhat overlapping messages from John McLeod, Robert Stroud, B.J. Herbison, and Peter da Silva. PGN] ------------------------------ Date: Wed, 9 Dec 87 10:28:01 EST From: dvk@SEI.CMU.EDU To: risks@csl.sri.com Subject: Control-tower fires Control-tower fire - a nightmare that wasn't... 
I was flying out of Cairo airport in 1982 or so, and the night before they had had a control tower fire. The immediately visible ramifications of this were that none of the terminal monitors (the flip chart kind you see in European train stations) were working, and the gate agents reported delays on almost every outbound flight (I am not sure about inbound flights - I got to the airport at 6:45am (for a 10:30am flight) so there was not much inbound). Cairo International is a fairly busy airport, yet most of the flights were departing within an hour of scheduled departure time (i.e., they were "on time" for Cairo). The reason for this is that they had ATCs in the burned-out shell of the control tower visually sighting aircraft on the ground (and possibly in the air), communicating via walkie-talkies to the aircraft and to ground-based directors who literally waved the planes onto the runways. Basically, everything worked. Why? Because the airport was able to shift into a manual mode of operation when the tower (and computers?) were down. There were no super failsafes to get in the way. Now, I am not advocating the removal of failsafes. What I am suggesting is that our current failsafes be made a little less restrictive. In Chuck Weinstock's post about O'Hare, the aircraft had trouble getting fuel because of safety interlocks, even when technicians *knew* there was no danger to the fuel feed. In Cairo, the whole system was toasted, but it kept running. Granted, there are differences, but there are also lessons to be learned here. Failsafes should keep you from making stupid mistakes, but not prevent you from making intelligent decisions. 
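The principle above, that a failsafe should block careless mistakes but yield to an informed human, corresponds to a familiar design pattern: a default-deny interlock with an explicit, audited override. A minimal sketch (hypothetical names and rules, not any real airport or fueling system):

```python
# Toy sketch of an interlock that fails safe by default but allows a
# deliberate, logged override. Everything here is invented to illustrate
# the design principle, not drawn from any actual system.

class FuelInterlock:
    def __init__(self):
        self.log = []                # audit trail of overrides

    def request_fuel(self, sensors_ok, override_by=None):
        """Permit fueling when the sensors agree it is safe, or when a
        named person explicitly accepts responsibility for overriding."""
        if sensors_ok:
            return True              # normal case: interlock satisfied
        if override_by is not None:
            self.log.append(f"override by {override_by}")
            return True              # intelligent decision, on the record
        return False                 # default: fail safe

lock = FuelInterlock()
assert lock.request_fuel(sensors_ok=True) is True
assert lock.request_fuel(sensors_ok=False) is False   # stupid mistakes blocked
assert lock.request_fuel(sensors_ok=False, override_by="technician") is True
assert lock.log == ["override by technician"]
```

The override path is deliberately noisy: it requires a named person and leaves a record, so the failsafe restrains carelessness without forbidding judgment.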
------------------------------ Date: Tue, 8 Dec 87 11:39:44 pst From: ucbcad!ames.UUCP!uw-beaver!ssc-vax!eder@ucbvax.Berkeley.EDU (Dani Eder) To: uw-beaver!KL.SRI.COM!RISKS Subject: Loss-of-orbiter (Re: RISKS DIGEST 5.70) Reliability work done here at Boeing (as part of the Advanced Launch System program) predicts the loss-of-orbiter rate to be 1 in 60 launches AFTER the fixes in progress are completed. The loss-of-crew rate is somewhat lower, since there are accidents where you render an Orbiter unusable, but do not kill the crew. For example, landing hard can stress the structure enough that it would be unsafe to ever fly again, even with no visible damage. What our reliability work indicates is that adopting airplane-like design rules, such as the ability to fly a mission with a single engine failure, all engines running before launch, double- and triple-redundant flight control systems, and powered (jet engine) return to a runway for the booster stage, should bring the loss-of-payload rate for a next-generation rocket to 1 in 5000 flights. The lesson we learned from the commercial airplane side of the company is: use improved technology (such as lighter structural materials and smaller electronics) to get better reliability rather than a few more pounds of performance. Your hardware will last longer, and costs will come down more that way. Dani Eder/Boeing/Advanced Space Transportation ------------------------------ Date: Sun, 13 Dec 87 05:13:40 PST From: hoptoad.UUCP!gnu@cgl.ucsf.edu (John Gilmore) To: RISKS@KL.SRI.COM Subject: Re: EEC Product Liability > For imported goods, the original importer into the EEC is liable. I am curious how long the US->European email/netnews gateway at mcvax will last after its first suit under this Directive. Plenty of buggy PD and redistributable software enters the EEC this way; in fact, it may be the largest single channel for import of software. 
> It is expected that the Act will greatly increase the adoption of software > Quality Assurance (to conform to ISO standard ISO 9001) and the use of > mathematically rigorous specification and development methods (VDM, Z etc). Note that this is posted by someone who makes his living selling such products (at Praxis). I would say "caveat emptor" but clearly in Europe this no longer applies. It might be fun for someone to sue Praxis for bugs in their product, especially bugs that result in delivered systems with undiagnosed failures which later cause suits. Does Lloyd's of London sell "bug insurance"? ------------------------------ From: hplabs!motsj1!motbos!mcdham!carl@ucbvax.Berkeley.EDU Date: Sat, 12 Dec 87 10:17:19 PST Apparently-To: ucbvax!CSL.SRI.COM!RISKS Subject: The Presidential "Football"... I am looking for information related to the "Black Box" that is supposedly near the President at all times. This box is reportedly the control center from which the President can authorize a nuclear launch. I have heard it referred to as "The Football". Can anyone tell me anything about it? Even folklore is acceptable. Are there any texts with this information in them? Whatever you could let me know would be a help. I am writing a fictional account of a Nuclear War and need the information to complete the work. Thanks in advance for your help. Carl Schlachte [Folklore may be OK for Carl, but please provide him with folklore privately, and keep RISKS messages factual. PGN] ------------------------------ Date: Thu, 10 Dec 87 15:51:10 EST From: ndq@h.cc.purdue.edu (Jon Eric Strayer) To: risks@kl.sri.com Subject: Radar's Growing Vulnerability >From: Peter G. Neumann ... (RISKS readers will recall that the British investigation concluded that the Sheffield's own radars were jammed by a communication back to London that was being held at the time.) While there are anti-radiation missiles, the Exocet that hit the Sheffield was not one of them. 
I also have serious doubts that the Sheffield's radars were "jammed" by a communication transmitter. I understand that the radars (and ESM/ECM equipment) were shut off because they jammed the comm equipment. [Yes, that was one report. Sorry I turned it around. PGN] ------------------------------ End of RISKS-FORUM Digest ************************