Normal Accidents: Living with High-Risk Technologies, by Charles Perrow, Basic Books, NY, 1984.
This book review has been prepared for Tier III 415 A, "Entropy and Human Activity." It includes much of the material presented in class and further commentary.
A normal accident occurs in a complex system, one that has so many parts that it is likely that something is wrong with more than one of them at any given time. A well-designed complex system will include redundancy, so that each fault by itself does not prevent proper operation. However, unexpected interactions, especially with tight coupling, may lead to system failure.
System operators must make decisions, even with ambiguous information. The process of making a tentative choice also creates a mental model of the situation. When following through on the initial choice, the visible results are compared to those expected on the basis of that initial mental model. Provided that the first few steps' results are consistent, the fact that the mental model was tentative is likely to be forgotten, even when later results contradict it. Those contradictory results become "mysterious" or "incomprehensible" rather than functioning as clues to the falsity of the earlier tentative choice. This is simply the way the human mind works, and systems designed with contrary expectations of their operators are especially vulnerable to system accidents.
The primary cooling system is a high-pressure system using water to extract heat from the nuclear core. This heated water (so hot that it would be steam at atmospheric pressure) circulates to a heat exchanger (like a radiator) that turns the water in the secondary cooling system to steam. The secondary system is also pressurized, but at a lower pressure.
The water in the primary system contains many radioactive nuclei, fission products and neutron-activation products (including tritium, produced when one hydrogen atom absorbs first one and then another neutron). The water in the secondary system is not radioactive, but its chemical and mechanical purity is critical, because the high pressure, high temperature steam it turns into will be sprayed directly against the precisely machined turbine blades, to drive the turbine that turns the electrical generator. The "condensate polisher system" is responsible for removing impurities from the secondary coolant.
The accident started when two secondary cooling system pumps stopped operating (probably because of a false control signal that was caused by leakage of perhaps a cup of water from seals in the condensate polisher system). With no secondary cooling system flow, the turbines shut down, and heat stopped being removed from the primary system.
The emergency feedwater pumps activated, to remove the heat that was building up in the reactor, now that the secondary cooling system was no longer removing it through the turbines. These pumps circulate water in the secondary cooling system, which boils off because the energy is not removed by the turbine, and draw in replacement water from the emergency water storage tank.
Both emergency feedwater pumps were operating against closed valves: they had been closed during maintenance two days earlier. The operators did verify that the pumps were operating, but did not know that they were accomplishing nothing because of the closed valves. One of the two indicator lights on the control panel that might have alerted them to the valves being closed was obscured by a repair tag hanging on the switch above it. It was only eight minutes later that this problem was discovered.
With no secondary circulation, no more heat was being removed from the reactor core, its temperature started to rise, and the automatic emergency shutdown procedure, known as a "scram," was started. This involves the rapid insertion of control rods whose composition includes a large percentage of neutron-absorbing materials. This absorbs most of the fission neutrons before they have a chance to initiate a new fission event, stopping the chain reaction. It does not immediately stop the release of heat in the reactor core. Because many of the fission products are unstable nuclei, with half-lives ranging from fractions of a second to days, heat continues to be released in a nuclear reactor core for quite some time after the chain reaction itself is stopped by the scram. Because no heat was being removed through the secondary coolant system, temperatures and therefore also pressures rose within the core and primary coolant system.
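The scale of this after-heat can be sketched with the empirical Wigner-Way approximation for decay heat. This is a textbook formula, not anything specific to TMI, and the one-year prior operating time below is an assumption:

```python
# Decay-heat fraction after shutdown, via the empirical Wigner-Way formula:
#   P/P0 = 0.0622 * (t**-0.2 - (t + T)**-0.2)
# where t is seconds since shutdown and T is the prior operating time in
# seconds (one year is assumed here for illustration).
def decay_heat_fraction(t, operating_time=365 * 86400):
    return 0.0622 * (t ** -0.2 - (t + operating_time) ** -0.2)

for label, t in [("1 second", 1), ("1 minute", 60),
                 ("1 hour", 3600), ("1 day", 86400)]:
    print(f"{label:>8} after scram: {100 * decay_heat_fraction(t):5.2f}% of full power")
```

Even a full day after a scram the formula gives a few tenths of a percent of full power, which for a large power reactor is still several megawatts of heat that must be carried away.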
The PORV (pilot-operated relief valve) is designed to vent enough coolant from the primary system to keep pressures at safe levels. It initially opened because of the pressure rise resulting from the cooling failure. Once the pressure had dropped, it was instructed to close, to keep the steam bubbles in the core squeezed small. It did not close, and therefore the radioactive primary coolant continued to be drained into the sump and the bubbles in the core grew larger and larger as coolant turned to steam at the reduced pressures. Steam is much less effective than water at conducting heat away from the reactor fuel rods, so their temperatures rose even faster, reaching values that permitted them to resume fissioning.
As soon as the pressure had been adequately reduced, a signal was sent to the PORV to close again. The control panel included an indicator light that showed that this signal had been sent. Unfortunately, despite the indicator light showing that the valve was being told to close, it did not in fact close. The primary cooling system stayed open for 140 minutes, venting 32,000 gallons, one third of the core capacity, and keeping the pressure in the core at a much lower level than it would have been with the PORV properly seated.
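As a quick check on the figures quoted above:

```python
# Quick arithmetic on the figures cited in the text: 32,000 gallons vented
# in 140 minutes, stated to be one third of the core capacity.
vented_gallons = 32_000
minutes_open = 140
print(f"Average loss rate: {vented_gallons / minutes_open:.0f} gallons/minute")
print(f"Implied core capacity: {3 * vented_gallons:,} gallons")
```

That is, the stuck-open valve drained primary coolant at an average of more than two hundred gallons per minute for well over two hours.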
All four of these failures took place within the first thirteen seconds, and none of them are things the operators could have been reasonably expected to be aware of.
The High Pressure Injection (HPI) pumps activated (one automatically, one by operator intervention) to flood the core with cold water. Reactor pressure vessels are made of steel, and in operation are exposed to large amounts of radiation, especially neutrons. The steel becomes brittle with age, prone to shattering. The thermal shock of HPI operation with its cold water is a risk to be traded off against the risk of letting the core heat up.
To reduce the risks of high pressure operation, and in particular, the risks of hydraulic shock waves traveling through the plumbing, the reactor is designed with a large surge tank. This tank, known as the "pressurizer," normally has its bottom half filled with water and its top half with steam. Heating or cooling the steam permits the pressure in the reactor core and primary cooling system to be controlled. Compression of the steam absorbs any hydraulic shocks that reach the pressurizer. If the pressurizer fills up with water, as it will if the steam all condenses, then hydraulic shock waves moving through the plumbing (caused, for example, by opening or closing of valves, or the starting or stopping of pumps) will not be absorbed, and may therefore cause pipes or pipe-joints to break.
In order to prevent this well-recognized risk, the operators followed standard procedures and reduced High Pressure Injection when the pressurizer level indicator rose toward the point that would indicate that it was about to fill.
The reduced pressure also caused cavitation in the reactor coolant pumps. Cavitation could erode the moving metal parts of the pumps, distributing fragments throughout the coolant (where they would destroy other pumps and valves in the system), so the reactor coolant pumps had to be shut down, further reducing coolant flow.
The hydrogen is produced when steam reacts with the overheated zirconium cladding of the fuel rods. In the reactor core, this hydrogen is inert, because there is no oxygen available to burn with it. During a loss of coolant incident, however, it is likely that some of the hydrogen will be carried out along with the coolant. It will then collect in the containment building, where it will have oxygen available. Since the building containing the reactor is full of large-scale electrically powered equipment (motor-driven pumps, etc.) it is only a matter of time until the hydrogen-air mixture is ignited by a stray spark. [Reviewer's note: So far as I know, it is not a standard precaution to continually have multiple open flames in the containment building to make sure that any hydrogen-air ignition occurs before too much hydrogen has accumulated, so that the resulting explosion is gentle.]
At TMI, the hydrogen-air explosion took place 33 hours into the accident and produced an over-pressure fully half of the design strength of the building! The danger from hydrogen-air explosions also includes the fact that they are likely to spread debris and shrapnel throughout the building, possibly cutting control or instrumentation wiring and breaking holes in cooling water or compressed-air control lines.
There are a number of factors that limit the enhancement of intrinsic safety, however. The primary limiting factor is the basic transformation process that is involved in using fission chain reaction for large-scale power production. The fission products decay with a timetable that is essentially identical for all power reactors. Even after a successful "scram" shuts down the chain reaction, and even assuming that the design and the nature of the accident are such that no part of the fuel resumes chain reacting, an enormous amount of heat must be removed from the core for an extended period of time (days, not hours or minutes).
The theoretical availability of safer designs is in many ways a moot point, because (for the reasons explored in Cohen's book) it is unlikely that any significant number of new power reactors will be built, at least in the U.S., for a great many years to come.
In 1971, near Charlevoix, MI, a B-52 crashed while headed directly toward a nuclear power installation on the shores of Lake Michigan, impacting about two miles from the reactor. (The B-52 is a subsonic bomber, so using an estimated speed of 600 miles per hour, this is twelve seconds of flight time, not the "two seconds" stated by Perrow [p. 41], but it was still a very close call!)
The containment building is usually constructed of reinforced concrete. If the concrete is poured too rapidly, bubbles will not have time to collapse, resulting in "voids" and thereby reducing the strength compared to that designed.
Remember the Entropy Law: any heat engine will transform only part of the heat it extracts from the high temperature reservoir (the reactor core) into useful mechanical work (driving the turbine so that its shaft will turn the electrical generator). The rest of the heat must be delivered to a lower temperature reservoir, the surrounding environment.
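The Entropy Law's limit can be made concrete with the Carnot bound on efficiency. The temperatures below are hypothetical round numbers chosen for illustration, not the specifications of any actual plant:

```python
# Carnot bound: no heat engine operating between reservoirs at T_hot and
# T_cold (in kelvin) can exceed efficiency eta = 1 - T_cold / T_hot.
def carnot_efficiency(t_hot_k, t_cold_k):
    return 1.0 - t_cold_k / t_hot_k

eta = carnot_efficiency(600.0, 300.0)  # assumed steam ~327 C, environment ~27 C
print(f"Upper bound on efficiency: {eta:.0%}")
print(f"Heat rejected per unit of work, at best: {(1 - eta) / eta:.1f}")
```

Real plants fall well short of the Carnot bound, so in practice even more heat per unit of electrical work must be rejected to the surroundings.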
In particular, the only efficient methods known for cooling to remove the rejected heat of a large power installation, whether nuclear or fossil fuel powered, require a lot of water for the cooling towers, and therefore require the installations to be located near large rivers, lakes, or oceans, places where people naturally congregate. This makes it even more difficult to locate nuclear power installations at great distances from population centers.
Finally, there is the further limitation that the ideal locations are downwind of all nearby population centers. With the prevailing winds in the United States generally from the West to the East, radioactive debris released by a major accident anywhere in the interior or on the West coast would be likely to land on one city or another. The East coast shoreline includes locations that would be downwind of major cities much of the year, but those locations would be vulnerable to hurricanes in the short term and beach erosion in the long term, and the cooling water would be much more corrosive than the fresh water of lakes and rivers. Furthermore, the mountains and the various weather systems from the arctic and from the Gulf of Mexico interact to make every direction of air flow occur part of the time at any location in North America. There is no location that can be predicted to be always downwind of nearby population centers.
The point of these examples is simply that nuclear power is a human endeavor, and when people operate on an industrial scale, things go wrong from time to time. When the industry in question is nuclear power, the possibility of catastrophic consequences is more obvious. Other large-scale transformation processes pose similar levels of risk (petrochemical plants, for one example).
The Fermi reactor was a breeder reactor, designed to produce fissionable plutonium from the U-238 in the fuel. It was designed to operate at such high temperatures that the coolant was liquid (molten) sodium metal. Four of the fuel subassemblies were damaged; two had welded themselves together.
The cause of the accident was eventually identified as a piece of zirconium sheet metal that had been added as a "safety device" on the insistence of the Advisory Committee on Reactor Safeguards. It did not appear on the blueprints. After it was ripped loose by the flow of the liquid sodium coolant, it was moved to a position where it blocked the flow of coolant, permitting part of the core to overheat and melt the uranium fuel elements.
Perrow highlights these points about the Fermi accident:
Mining is routinely dangerous, and uranium mining has both mechanical and radiation hazards. The various processing and re-processing steps are transformation processes performed on dangerous materials. Like many chemical processing plants, these activities are prone to system accidents.
An incident involves failure or damage limited to parts or a single unit, even if it requires shutdown or reduced-output operation of the system as a whole. An accident, by contrast, involves failure or damage at the subsystem or system level. Design safety features (e.g., redundant pumps or valves) are often found at the boundary between units and subsystems, and consequently failures or unanticipated behaviors of such safety features will often play a significant role in whether an event is an incident or an accident, as Perrow uses those terms.
Perrow divides the victims into four groups: first-party victims (the operators of the system), second-party victims (non-operating personnel and users of the system, such as passengers), third-party victims (innocent bystanders), and fourth-party victims (fetuses and future generations).
"System accidents involve the unanticipated interaction of multiple failures."
Both component failure accidents and system accidents start with the failure of a part. System accidents are characterized by the progression of the accident involving multiple failures and those failures interacting in ways that are not anticipated by or comprehensible to the designers and properly trained operators.
Perrow also excludes from his analysis what he calls "final accidents," such as a wing falling off an airplane in flight or an earthquake shattering a dam: they are not interesting from an analytical point of view because there is nothing that the operator can do to influence the course of events.
In the nuclear power industry, roughly 3,000 "Licensee Event Reports" are filed each year. Perrow estimates that 90% of these are incidents, only 10% are accidents. Of the accidents, Perrow estimates that perhaps 5% or 10% are system accidents. So far as we know, all of the accidents in U.S. nuclear power plants have had only first-party victims, and very few of those.
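Perrow's estimates above imply, as rough arithmetic:

```python
# Rough arithmetic on Perrow's estimates cited in the text: ~3,000 Licensee
# Event Reports per year, ~10% of them accidents, and 5-10% of the accidents
# qualifying as system accidents.
lers_per_year = 3000
accidents = lers_per_year * 0.10
low, high = accidents * 0.05, accidents * 0.10
print(f"Accidents per year: {accidents:.0f}")
print(f"System accidents per year: roughly {low:.0f} to {high:.0f}")
```

That is, on these estimates the U.S. nuclear power industry experiences on the order of fifteen to thirty system accidents per year.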
One source of complex or non-linear interactions occurs when a unit links one or more subsystems. Failure of a heat-exchanger that removes heat from one subsystem and transfers it to another will disrupt both subsystems simultaneously. These "common-mode" failures are intrinsically more difficult for operators to cope with, because they will be confronted with two intermingled sets of symptoms. System designs that include such features are routinely more efficient (e.g., using waste heat from one subsystem as input to another one, instead of burning extra fuel to provide the heat input).
A large class of safety features in many designs are specifically intended to reduce opportunities for common-mode failures. These devices themselves become sources of failures. Perrow cites the illustrative example of a check-valve designed to prevent back-flow between two tanks. Because the system normally operates with the first tank at a higher pressure, the check-valve spends most of its life passing flow in the intended direction. It may then not function when a pressure difference reverses the flow (because of debris blocking its motion, corrosion, or weakening of a spring held in a compressed position for too long), or once actuated, it may not release to permit the resumption of normal flow.
Unintended or unanticipated interactions may result from physical proximity. If two subsystems are located next to each other, they can both be rendered inoperable by the same explosion.
The complexity or non-linearity of the interactions has to do with their being a part of the normal operational or maintenance sequence, with their visibility, and with their comprehensibility. Interactions that are unusual, unexpected, hidden, or not immediately comprehensible deserve the description of "complex" or "non-linear." Interactions that are normal or that are visible (even if unplanned) should be described as "linear" or "simple" in this sense, even though they may involve many different parts or units.
Large control panels are difficult to design: their layout is a compromise between ease of assembly and ease of use. Ease-of-use issues include the functional grouping of indicators and controls (difficult to choose when subsystems may interact in many ways), the uniqueness or uniformity of indicators and controls, and the coding of indicators. Valves, for example, may always be shown with one color for open, whether or not their normal condition is open, or they may be shown with one color for the normal condition, whether that is open or closed; shape, angle, color, or all of those characteristics may convey information.
Complex non-linear systems are often made still more difficult to control because critical information cannot be directly obtained, but must instead be inferred from indirect measurements or observations. At TMI, for example, there was no direct measurement of the level of the coolant within the reactor.
Complex systems are characterized by: close proximity of parts and units that are not in a production sequence; many common-mode connections between components; unfamiliar or unintended feedback loops; many control parameters with potential interactions; information that is indirect or inferential; and limited understanding of some of the processes involved.
For linear systems, tight coupling seems to be the most efficient arrangement: an assembly line, for example, must respond promptly to a breakdown or maladjustment at any stage, in order to prevent a long series of defective products.
All systems need to be able to survive part and unit failures, so that incidents do not become accidents. Loosely coupled systems have the advantage that not all of the mechanisms for such survival have to be planned ahead of time. In many cases, the designers of loosely coupled systems have more than used up their advantage by not designing in even quite obvious, simple safety features. Designers of tightly coupled systems must invest a great deal of effort and ingenuity in anticipating failure modes and providing safety features to permit survival and prompt recovery with minimal impact.
Perrow's book was published in 1984. On December 3, 1984, a Union Carbide pesticide plant in Bhopal, India, suffered a catastrophic accident that released a cloud of toxic gas, causing 2,000 immediate deaths, perhaps 8,000 delayed fatalities, and several hundred thousand injuries. Additional information about the Bhopal accident is available on-line.
The nature of chemical plants, with large capital investments, elaborate structures with complicated plumbing, highly automated operations, etc., usually places a small number of skilled operators in a central control location. Most system accidents will therefore not create many first- or second-party victims. The catastrophic risk is to third- and fourth-party victims, if promptly toxic, carcinogenic, or mutagenic materials are dispersed beyond the plant boundaries.
As Perrow explains, the private and relatively unregulated nature of the industry limits the availability of information, both narrative and statistical, about chemical industry accidents. Oil refineries and ammonia plants are somewhat more thoroughly studied, and their experiences are not comforting: an average of more than one fire per year per plant.
Perrow describes briefly the Texas City, Texas, fertilizer explosion aboard ships in the harbor, in 1947, and the chemical plant explosion in 1969. The latter was a system accident with no fatalities or serious injuries.
Perrow describes the 1974 disaster at Flixborough, England, in a chemical plant that was manufacturing an ingredient for nylon. There were 28 immediate fatalities and over a hundred injuries. The situation illustrates what Perrow describes as "production pressure" -- the desire to sustain normal operations for as much of the time as possible, and to get back to normal operations as soon as possible after a disruption.
Should chemical plants be designed on the assumption that there will be fires? The classical example is the gunpowder mills in the first installations that the DuPont family built along the Brandywine River: they have very strongly built (still standing) masonry walls forming a wide "U" with the opening toward the river. The roof (sloping down from the tall back wall toward the river), and the front wall along the river, were built of thin wood. Thus, whenever the gunpowder exploded while being ground down from large lumps to the desired granularity, the debris was extinguished when it landed in the river water, and the masonry walls prevented the spread of fire or explosion damage to the adjacent mill buildings or to the finished product in storage sheds behind them. As Perrow points out, this approach is difficult to emulate on the scale of today's chemical industry plants and their proximity to metropolitan areas.
The practice of formal risk assessment (cost-benefit analysis) has developed into an elaborate process that includes mathematical models, numerical simulations, statistical analysis of the results of surveys of the opinions of large numbers of supposedly well-informed individuals, and so on. These trappings of "scientific" analysis can readily mislead the practitioners and the public into placing unwarranted faith in the accuracy of their results.
A classic case of unrealistic risk assessment comes from the early Space Shuttle program. As described by Richard Feynman in his book, What Do You Care What Other People Think?, the managers all thought the risk of a catastrophic failure during a shuttle mission was on the order of one in 100,000 or safer. The engineers and technicians who worked more closely with the equipment thought it was on the order of one in 100 or one in 200. The Challenger tragedy brought this discrepancy to light, but even then it might not have been made publicly obvious if Feynman's service on the investigating committee had followed the traditional bureaucratic approach.
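The practical gulf between the two estimates shows up when the per-mission risks are compounded over a flight program. The 100-mission horizon below is an arbitrary illustration:

```python
# Chance of at least one catastrophic failure in n independent missions:
# 1 - (1 - p)**n, evaluated for the two per-mission estimates Feynman reported.
def cumulative_failure_probability(per_mission_risk, missions):
    return 1.0 - (1.0 - per_mission_risk) ** missions

for estimate, source in [(1 / 100_000, "managers"), (1 / 100, "engineers")]:
    p = cumulative_failure_probability(estimate, 100)
    print(f"{source}' estimate: {p:.1%} chance of losing a vehicle in 100 flights")
```

On the managers' figure a hundred flights are almost certainly safe; on the engineers' figure, losing a vehicle somewhere in those hundred flights is more likely than not.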
As Perrow points out, risk assessment traditionally regards all lives as equivalent. It makes no difference in calculations of Loss of Life Expectancy whether 50 people die in traffic accidents in a state during a holiday weekend, or 50 people die in a town of 100 downwind from a nuclear power plant catastrophe. Obviously the impact on the survivors is very different if their own community is essentially intact, rather than if it is devastated. In comparing the costs and benefits, risk assessors routinely gloss over the question of whose risks, and whose benefits.
Under what circumstances is it right for an automobile company, for example, to make choices that increase or decrease the risks for their customers, trading off profits and lives? Those choices must be made by someone. Could society be organized in such a way that they would be made by the customers, or by the government, instead of by the company?
As Perrow points out, "A technology that raises even unreasonable, mistaken fears is to be avoided because unreasonable fears are nevertheless real fears."
The study compared the responses of "experts" and the general public on these various criteria, and on the overall estimation of riskiness for each activity. Although the experts and the public agreed on the ranking of the activities for the characteristics listed above, the experts and the public disagreed on the overall riskiness of the activities. In general, the experts seemed to be responding to the actuarial numbers, but the general public was much more sensitive to issues of uncertainty, catastrophe, and impacts on future generations.
Is it reasonable to believe that nuclear power, for example, can be done safely? That is, can nuclear power plants be designed that are not complex, interactive, tightly-coupled systems? Could there be such a thing as a nuclear power plant that was not subject to "system accidents," in which incidents initiated by component failures or operator mistakes rapidly grow into sub-system or system failures? Perrow argues that it simply cannot be done. Not because nuclear power is especially scary, but because of its intrinsic characteristics.
It is not reasonable to expect any organization to meet both of these criteria: the centralized, rigidly disciplined control that tight coupling demands, and the decentralized, flexible response to unexpected interactions that complexity demands. Therefore, one way or another, we must expect that all nuclear power plants will be "badly run."
Dick Piccard revised this file (http://oak.cats.ohiou.edu/~piccard/entropy/perrow.html) on March 7, 1999.
Please E-Mail comments or suggestions to "firstname.lastname@example.org".