Sunday, September 7, 2014

For Lack of Configuration Control

DevOps is a software development methodology that calls for the close integration of software development teams and the operations teams that run their software in production. One of software development's dirty secrets is "The Wall", as in "throw it over the wall": the idea that developers can build software without considering operations, and that when it doesn't work in production, operations can throw it back over the wall for development to fix. DevOps calls for, among other things, simple, repeatable processes; quality testing; small, frequent releases; and, above all, close collaboration, if not integration, between the development and operations teams. In short, it attempts to turn developing software and putting it into production from an art into an engineering process. Because DevOps attempts to bring engineering discipline to software development, DevOps practitioners can learn a great deal from the problems engineers in other disciplines have encountered and solved.

A Fiery Peace in a Cold War is Neil Sheehan's biography of General Bernard Schriever, the USAF officer who led the US ICBM development effort in the 1950s and 1960s. He was appointed to lead the Western Development Division, which managed the development of the Thor, Atlas, Titan and Minuteman missile systems. Born in Germany but raised in Texas, he was promoted to four-star general in 1961 and commanded Air Force Systems Command, managing all USAF weapons systems and approximately 40% of the Air Force budget.

After the launch of Sputnik in 1957, the US believed itself to be on the wrong side of a "missile gap" with the Soviet Union. The Air Force's first ICBM, Atlas, was rushed into early deployment, but it had major bugs. Most of them were not in the missile itself but in its support systems, such as the liquid oxygen (LOX) fueling system. One Atlas exploded on the pad during a fueling exercise. Earlier, a Thor missile had exploded four seconds into flight.
Subsequent analysis of the Thor failure showed that the fueling crew had allowed the LOX hose to be dragged through the sand on the way to the pad, contaminating the LOX lines with sand. After the program instituted configuration control, it discovered that the missile contractor, Convair, was modifying parts at the test site to get missiles to fly, without notifying the assembly line and without keeping any record of the changes. The configuration of a successful launch could not be duplicated; the Atlas program was getting random success as well as random failure. To prevent uncontrolled and undocumented changes, seals and locks were placed on missile compartments and launch equipment cabinets, so that changes could be made only after review by a configuration control board.

In my 28 years in IT, I can remember many instances when a looming deadline justified a quick fix and a test. I can also remember the instances where those quick fixes led me to a point from which I could not return to a known, good state. Automation and methodology are the engineering tools necessary to prevent the hubris-induced states we get ourselves into when we believe we can meet a crisis with artistry rather than engineering.
