Friday, February 19, 2010

Code of Ages

Software lifetimes range from minutes (s/LastCompanyName/NewCompanyName/g) to a few years. Well, that's what they're designed for, but they often last longer (remember Y2K?). Given the rate at which bugs are put into and taken out of software, it's hard to imagine anything staying up for a century - what would you do when the original programming team have all died and the programming language has no compiler support on modern architectures? Think BBC Micro - it is still in use, but fixing a system built on it means either buying a manual off eBay or paying a grey-haired chap to come out of retirement.

So how would we design a system that could work for centuries? For me, the first part of the answer is interfaces. If we carefully specified the interfaces between chunks of the system, or between the machine and the outside world, then we could simply replace the problematic part. Generally, as architectures change and paradigms come and go, it is easier to rewrite a module of code than to go hunting through the source code to fix it. We could also rewrite a system module-by-module, without having to rewrite the entire thing. The more levels of interfaces you have, the more choice you have later on about how much of the old system you replace.
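To make that concrete, here is a minimal sketch of the idea in Python; the RecordStore interface and both implementations are invented for illustration, not taken from any real system:

```python
# An invented interface boundary: application code depends on RecordStore,
# never on a particular implementation, so implementations can be swapped.
from abc import ABC, abstractmethod


class RecordStore(ABC):
    @abstractmethod
    def save(self, key, value): ...

    @abstractmethod
    def load(self, key): ...


class LegacyStore(RecordStore):
    """Stand-in for the BBC-Micro-era module."""
    def __init__(self):
        self._data = {}          # in real life: whatever the old system used

    def save(self, key, value):
        self._data[key] = value

    def load(self, key):
        return self._data[key]


class ReplacementStore(RecordStore):
    """A rewrite on new hardware; same interface, so callers never notice."""
    def __init__(self):
        self._data = {}          # in real life: whatever the new platform offers

    def save(self, key, value):
        self._data[key] = value

    def load(self, key):
        return self._data[key]


def record_reading(store):
    # Application code talks only to the RecordStore interface.
    store.save("meter-42", "1042 kWh")
    print(store.load("meter-42"))


record_reading(LegacyStore())       # the old module
record_reading(ReplacementStore())  # the drop-in rewrite
```

The calling code never changes; the more of these seams you put in, the more of the old system you can retire later without touching the rest.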

Secondly, if you want a system to run for centuries, I'm reminded of the apocryphal 60-year-old broom; it's had 3 new heads and 7 new handles, but it's the same broom. Similarly, a long-term system is likely to be pretty large and unwieldy, so you wouldn't want an enormous rewrite/replace project and a scary switchover where it all falls apart. No, if the system needs to start supporting the new cryogenics hardware, it needs to have just that part of the system replaced.

As far as I can see, the web may have one of the best answers. With its clearly defined protocols and request-response model, it is possible to switch hardware, software, networks, IP addresses etc. without the user noticing. Just start routing the requests to the new server, and then decommission the old one.
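A toy version of that front door, just to show the shape of it (the ports and the BACKEND address are made up, and a real deployment would use a proper reverse proxy or a DNS change rather than this sketch):

```python
# Clients keep talking to port 8080; only BACKEND changes when the old
# server is decommissioned and the new one takes over.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

BACKEND = "http://localhost:8081"   # repoint this at the new server


class PassThrough(BaseHTTPRequestHandler):
    def do_GET(self):
        # Fetch the same path from whichever backend is current.
        with urlopen(BACKEND + self.path) as upstream:
            body = upstream.read()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("", 8080), PassThrough).serve_forever()
```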

So those enormous, intercontinental, centuries-long uptimes will one day be possible, but only if we design systems to achieve them. Like the broom that never dies, the system survives only if fixing a given problem takes less time than the gap between the warning signs and the event itself.

Evolution as Nature's way of doing details

Programmers attempt to deal with complexity by getting the details of a system right; largely, programming is simply creating a system that gets all the details right. That's why we buy software packages - because creating software that gets the details correct is a long, slow, hard process. Test, fix, repeat. We try to make software that is as correct as we can make it within the constraints of resources and time.

Nature, however, comes at it from the opposite angle. In evolution, she takes the scatter-gun approach, like the Monte Carlo simulation method. How many legs are better when you live in trees? What kind of skin? Cold or warm-blooded? Nature is pure pragmatism - we can't guess this very well, so just make a load of similar animals, and see which ones survive. Mutation and cross-breeding combine to produce a never-ending stream of ever-differing versions of the same thing. The best ones survive and reproduce.
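In code, the loop is almost embarrassingly simple. This toy (the "four legs are ideal" fitness function is entirely made up) just varies, tests against the environment, and keeps whatever survives:

```python
import random

TARGET_LEGS = 4                      # pretend four legs suit life in the trees


def fitness(legs):
    return -abs(legs - TARGET_LEGS)  # closer to the target survives better


def evolve(generations=50, population=20):
    creatures = [random.randint(0, 10) for _ in range(population)]
    for _ in range(generations):
        # The fitter half reproduce; the rest don't make it.
        survivors = sorted(creatures, key=fitness, reverse=True)[:population // 2]
        # Offspring are mutated copies of the survivors.
        creatures = [max(0, parent + random.choice((-1, 0, 1)))
                     for parent in survivors
                     for _ in range(2)]
    return max(creatures, key=fitness)


print(evolve())                      # usually settles near TARGET_LEGS
```

No one designs the answer; the losers are simply never heard from again.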

Imagine you are trying to make a repair system for a robot. The design route says you spend a lot of research money designing ever more sophisticated repair mechanisms, and pack in replacement and redundant parts. The human body just repairs itself automatically, all the time. Not by some clever design but by a process of elimination - knock out the genes for a clotting factor and you get haemophiliacs.

Nature has a few advantages over programmers, though. It has limitless resources and a slow-changing set of requirements; the competitors, predators, prey and environment of an organism change very slowly, and there's always more oxygen and more sunlight. Programmers, however, have to decide both the resource allocation and the tests of a system up front, so that what they hand over to the customer wins its fight-to-the-death with the customer's expectations first time.

Largely, however, programmers will run out of time shortly after working out what the real priorities are, and deliver something with a lower probability of survival. So, I suspect a good strategy when specifying software is to demand a minimum level of stuff by a short deadline, then just before it arrives extend it by an arbitrary amount.

One route that I'm waiting for unit testing to take is that of mutated software - if there is a series of automated tests that cover all the required functionality of the software, then why not leave a few machines churning out modified versions of the code and competing them against each other for survival? A part of the rational brain nags - won't this just create hideous amorphs that pass some unit tests and fail as software? But that would imply that your unit tests lack coverage, no?
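A hedged sketch of what that might look like (Python 3.9+ for ast.unparse; the discount function and its two tests are invented, and real mutation-testing tools are far richer than this):

```python
import ast
import random

SOURCE = "def discount(price): return price * 0.9\n"


def tests_pass(fn):
    # The unit tests define fitness: survive them or be discarded.
    return fn(0) == 0 and abs(fn(100) - 90) < 2


def mutate(source):
    # Nudge a random numeric constant and regenerate the source.
    tree = ast.parse(source)
    numbers = [node for node in ast.walk(tree)
               if isinstance(node, ast.Constant)
               and isinstance(node.value, (int, float))]
    random.choice(numbers).value += random.choice((-0.01, 0.01))
    return ast.unparse(tree)


population = [SOURCE]
for _ in range(200):
    candidate = mutate(random.choice(population))
    namespace = {}
    exec(compile(candidate, "<mutant>", "exec"), namespace)
    if tests_pass(namespace["discount"]):
        population.append(candidate)   # it passed, so it survives

print(len(population), "variants currently pass the tests")
```

Notice what survives: the loose tolerance lets 0.89 and 0.91 variants through, which is exactly the "hideous amorph" worry - and the cure is tighter coverage, not abandoning the idea.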