
Feynman's actual observations are well worth reading for anyone who builds anything that might even vaguely be considered engineering.

http://science.ksc.nasa.gov/shuttle/missions/51-l/docs/roger...

The key items for me were:

1) While they had no expectation of erosion, and the design did not call for the o-rings to erode, once they observed them eroding, they retroactively invented a "margin of error" based on what fraction the o-rings eroded. This was not based on an actual understood process, and is akin to saying "well, the bridge didn't break when we drove that truck over it, so it must be okay"

2) The engineers actually knew the risk (~1% chance of loss per launch, not specific to the o-rings, compared with two actual losses of the shuttle over ~130 missions). Management used entirely invented numbers for the risk which were not justified.
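
As a rough sanity check on point 2, here is a minimal sketch that treats every launch as an independent trial with a fixed loss probability. The independence, the constant rate, and the function names are all simplifying assumptions of mine, not anything taken from the report.

```python
from math import comb


def expected_losses(p_loss: float, missions: int) -> float:
    """Mean number of vehicles lost over `missions` launches."""
    return p_loss * missions


def prob_at_least(k: int, p_loss: float, missions: int) -> float:
    """Binomial probability of k or more losses in `missions` independent launches."""
    p_fewer = sum(
        comb(missions, i) * p_loss**i * (1 - p_loss) ** (missions - i)
        for i in range(k)
    )
    return 1 - p_fewer


MISSIONS = 130  # approximate mission count quoted above

for label, p in [
    ("engineers, ~1 in 100", 1 / 100),
    ("management, 1 in 100,000", 1 / 100_000),
]:
    print(
        f"{label}: expected losses = {expected_losses(p, MISSIONS):.4f}, "
        f"P(at least 2 losses) = {prob_at_least(2, p, MISSIONS):.2e}"
    )
```

Under these assumptions, two losses in ~130 missions is an unremarkable outcome at 1 in 100 (a bit better than a one-in-three chance), and a less than one-in-a-million outcome at 1 in 100,000.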



Your paraphrasing of Feynman's bridge quote is inaccurate. From Appendix F[1] of the report:

    [..]  In spite of these variations from case to case, officials behaved as
    if they understood it, giving apparently logical arguments to each
    other often depending on the "success" of previous flights. For
    example. in determining if flight 51-L was safe to fly in the face of
    ring erosion in flight 51-C, it was noted that the erosion depth was
    only one-third of the radius. It had been noted in an experiment
    cutting the ring that cutting it as deep as one radius was necessary
    before the ring failed. Instead of being very concerned that
    variations of poorly understood conditions might reasonably create a
    deeper erosion this time, it was asserted, there was "a safety factor
    of three." This is a strange use of the engineer's term ,"safety
    factor." If a bridge is built to withstand a certain load without the
    beams permanently deforming, cracking, or breaking, it may be designed
    for the materials used to actually stand up under three times the
    load. This "safety factor" is to allow for uncertain excesses of load,
    or unknown extra loads, or weaknesses in the material that might have
    unexpected flaws, etc. If now the expected load comes on to the new
    bridge and a crack appears in a beam, this is a failure of the
    design. There was no safety factor at all; even though the bridge did
    not actually collapse because the crack went only one-third of the way
    through the beam. The O-rings of the Solid Rocket Boosters were not
    designed to erode. Erosion was a clue that something was wrong.
    Erosion was not something from which safety can be inferred.

His point about NASA's nonsensical use of the "safety factor" is not that you could drive over a bridge, and look, it didn't break, so it must be OK!

It's even worse: you drive a truck over it, afterwards 1/3 of the steel is cracked, and you conclude that it must be able to safely accept 3x the weight. Nonsense! This is the sort of moronic engineering that killed the crew of the Challenger.
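
To make the distinction concrete, here is a small sketch contrasting the ratio that got called "a safety factor of three" with a check against what the design actually allowed. The numbers are normalized to the ring radius as in the quote above; the function names and the zero-erosion design allowance are my own illustrative assumptions.

```python
def naive_margin(failure_depth: float, observed_depth: float) -> float:
    """The ratio that was presented as a safety factor:
    depth at which the ring failed divided by the depth actually eroded."""
    return failure_depth / observed_depth


def within_design(observed_depth: float, designed_depth: float = 0.0) -> bool:
    """True only if the observed erosion does not exceed what the design called for.
    The O-rings were not designed to erode, so the allowance defaults to zero."""
    return observed_depth <= designed_depth


RING_RADIUS = 1.0                   # normalized units
FAILURE_DEPTH = 1.0 * RING_RADIUS   # cut depth at which the ring failed in the experiment
OBSERVED_EROSION = RING_RADIUS / 3  # erosion depth reported for flight 51-C

print(f"claimed 'safety factor': {naive_margin(FAILURE_DEPTH, OBSERVED_EROSION):.1f}")
print(f"behaved as designed:     {within_design(OBSERVED_EROSION)}")
```

The first number looks reassuring, but the second check is the one that matters: any erosion at all means the part is already operating outside its design, so there is no margin left to quote.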

1. http://science.ksc.nasa.gov/shuttle/missions/51-l/docs/roger...


> This is the sort of moronic engineering that killed the crew of the Challenger.

Note that it wasn't even engineering. If you read the story in full (it's great, and covers the software side, for which Feynman had nothing but praise), Feynman repeatedly notes that the engineers were fairly realistic[0] and had been ringing alarms pretty much all along. This was entirely manglement mangling.

[0] unless the spectre of manglement was involved, at least for some of them


It's been some time but I've read the entire report cover-to-cover. Yes, the general conclusion is that NASA's dysfunctional management structure and institutional optimism driven by moneyed interests are the primary culprits, but the report never really tries to perform a root cause analysis on how something like the O-ring "safety factor" problem arose.

Something which, as summarized by Feynman's quote above, should be patently obvious to any engineer as bullshit.

Feynman's appendix is the only part that even tries, but it doesn't go far enough. That's no fault of Feynman's: he had no resources to pursue this line of inquiry, and it was a struggle just to get that appendix into the report.

They should have interviewed every single person even remotely involved in that O-ring decision, found out whether they objected to it, and, if they didn't, what monetary/institutional/social obstacles prevented them from doing so.

Did some engineer actually sign off on the aforementioned "safety factor"? We don't know, but somehow I doubt that's language management came up with on their own, and if they did, that there was no way for an engineer to spot that and report "wtf? The system doesn't work like that!".

Reading between the lines some engineer actually did come up with that estimate, but likely that engineer was where he was because NASA had a culture of promoting mindless yes-men.


> It's been some time but I've read the entire report cover-to-cover.

I meant Feynman's later recounting of the whole affair (in "What Do You Care What Other People Think?"), rather than just the report.

> Did some engineer actually sign off on the aforementioned "safety factor"? We don't know, but somehow I doubt that's language management came up with on their own

That doesn't mean they were fed that by an engineer, only that they'd encountered the term before.

> and if they did, that there was no way for an engineer to spot that and report "wtf? The system doesn't work like that!".

And then what? Upper management uses "safety factor" in a completely bullshit manner, an engineer spots that (because they're masochistic and read management reports?), tells their direct manager it's inane, and then what? You think it's going to go up the chain to upper management, which will fix the issue? Because IIRC (I don't have my copy of What Do You Care on me so I can't check) Feynman noted that engineering input systematically got lost somewhere along the management ladder, as one middle manager or another decided not to bother their own manager with a mere engineer's (or worse, technician's!) concerns or suggestions.

> Reading between the lines some engineer actually did come up with that estimate, but likely that engineer was where he was because NASA had a culture of promoting mindless yes-men.

That's really not what I read between the lines, considering engineers had failure estimates in the percent range and management had estimates in the per-hundred-thousand range.


> I meant Feynman's later recounting of the whole affair

I've read that too. You're dangerously close to getting me to re-read everything Feynman's written, again. I don't know whether to curse you or thank you :)

> And then what? [...]

I feel we're in violent agreement as to what the actual problem at NASA was. Yes, I'm under no illusion that if some engineer had raised these issues it would have gone well for him. This is made clear in the opening words of Feynman's analysis:

    [...] It appears that there are enormous differences of opinion as to the
    probability of a failure with loss of vehicle and of human life. The
    estimates range from roughly 1 in 100 to 1 in 100,000. The higher
    figures come from the working engineers, and the very low figures from
    management. What are the causes and consequences of this lack of
    agreement? Since 1 part in 100,000 would imply that one could put a
    Shuttle up each day for 300 years expecting to lose only one, we could
    properly ask "What is the cause of management's fantastic faith in the
    machinery?"

I'm pointing out, not to disagree with you, but just to use your comment as a springboard, that to an outside observer this whole process led to some "moronic engineering". Engineering is the sum of the actual construction & design process and the management structure around it.

The real flaw in the report is that it didn't explore how that came to be institutional practice at NASA. Feynman is the only one who tried.

> That's really not what I read between the lines.

Regardless of what sort of dysfunctional management practices there were at NASA, they couldn't have launched the thing without their engineers. If the engineers were truly of the opinion that shuttle reliability was 3 orders of magnitude worse than what management thought, perhaps they should have refused to work on it until that death machine was grounded pending review.

Of course that wouldn't have been easy, but it's our responsibility as engineers to consider those sorts of options in the face of dysfunctional management, especially when lives are on the line.


I think the engineers (and astronauts) accepted 1 in 100 odds of failure as the price of being part of the project. That is not a "death machine", just a risky and exciting one. For comparison, that risk is equivalent to working 5 years in a coal mine in the 1960s. https://www.aei.org/publication/chart-of-the-day-coal-mining...


Yes, which is fair enough, and personally I think that's fine. With odds like that you'll still get people to sign up as astronauts, and it'll be easier to advance the science. In the grand scheme of things it's silly to worry about those deaths and not, say, deaths from traffic accidents.

The real issue was that that's not how NASA presented it outwardly. I doubt that teacher that blew up with Challenger was told about her odds of surviving in those terms.

As human launch vehicles go, I think the shuttle's reliability was fine. The reason I called it a death machine is that if you make a vehicle that explodes 1% of the time, you'd better advertise that pretty thoroughly before people step on board. NASA didn't.


There's a lot of blame that (deservedly) gets pinned on the NASA administrators, but this fails to ask the really important question -- what sort of political and other pressures were put on the administrators such that they felt compelled to make shit up?

Seems like it was politically impossible for NASA to say outright that there was a 1% chance of failure for every launch -- it would have led to loss of public support for the Shuttle program. So we have a systems failure where both politicians and the public contributed by making NASA admins feel compelled to lie and cover up to make the launches work.

I mean, the courageous thing to do would be to stand up and say space travel is inherently risky, people might die, but it's still worth it. But courageous politicians regularly get voted out of office, and I would bet a courageous NASA admin who said that would end up fired.


You're absolutely right, but after so much success there also seems to be a fair amount of confidence/arrogance that set in at NASA regardless of other pressures.

Additionally, when you have a civilian on board, I think it really changes how you think about what an appropriate level of risk is (13% might have been ok with professional astronauts who knew the risks beforehand, but likely was too high for a civilian).

And the line engineers at Morton Thiokol fought back pretty hard on the decision, even if it might have impacted their careers negatively.


At that point in space exploration, 13% was way too high a risk, even for professional astronauts.


> Since 1 part in 100,000 would imply that one could put a Shuttle up each day for 300 years expecting to lose only one, we could properly ask "What is the cause of management's fantastic faith in the machinery?"

I wonder whether management would still have believed such a ludicrous idea if you had put it to them in those words.
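
For what it's worth, the arithmetic behind Feynman's phrasing is easy to reproduce. This is a minimal sketch that assumes a constant per-launch loss probability and one launch per day; the function name and the 365.25-day year are my own choices.

```python
DAYS_PER_YEAR = 365.25


def years_per_expected_loss(p_loss: float, launches_per_day: float = 1.0) -> float:
    """Years of flying at a fixed launch rate before one loss is expected,
    i.e. the time needed to accumulate 1 / p_loss launches."""
    launches_needed = 1 / p_loss
    return launches_needed / (launches_per_day * DAYS_PER_YEAR)


management = years_per_expected_loss(1 / 100_000)
engineers = years_per_expected_loss(1 / 100)

print(f"management's 1 in 100,000: one expected loss every {management:.0f} years")
print(f"engineers' 1 in 100:       one expected loss every {engineers * DAYS_PER_YEAR:.0f} days")
```

At one launch per day, 1 in 100,000 works out to about 274 years per expected loss, which Feynman rounds to 300. The engineers' 1 in 100 on the same schedule means an expected loss roughly every 100 days.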



