Friday, February 19, 2010

Code of Ages

Software lifetimes range from minutes (s/LastCompanyName/NewCompanyName/g) to a few years. Well, that's what software is designed for, but it often lives longer (remember Y2K?). Given the rate at which bugs are put into and taken out of software, it's hard to imagine anything staying up for a century. What would you do when the original programming team have all died and the programming language has no compiler support on modern architectures? Think BBC Micro: it is still in use, but fixing a system that uses it means either buying a manual off eBay or paying a grey-haired chap to come out of retirement.

So how would we design a system that could work for centuries? For me, the first part of the answer is interfaces. If we carefully specified the interfaces between chunks of the system, or between the machine and the outside world, then we could simply replace the problematic part. Generally, as architectures change and paradigms come and go, it is easier to rewrite a module of code than to go hunting through the source code to fix it. We could also rewrite a system module-by-module, without having to rewrite the entire thing. The more levels of interfaces you have, the more choice you have later on about how much of the old system you replace.
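As a rough sketch of what I mean, here is a small Python example. Every name in it (KeyValueStore, DictStore, ListStore, count_visit) is made up for illustration. The point is only that count_visit depends on the interface, so the module behind it can be swapped without touching the caller.

```python
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical interface: the only thing the rest of the system relies on."""

    def get(self, key: str) -> Optional[str]: ...

    def put(self, key: str, value: str) -> None: ...


class DictStore:
    """The 'original' module: keeps everything in a dict."""

    def __init__(self) -> None:
        self._data = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class ListStore:
    """A 'replacement' module: different internals, same interface."""

    def __init__(self) -> None:
        self._pairs = []

    def get(self, key: str) -> Optional[str]:
        # Walk backwards so the most recent write wins.
        for k, v in reversed(self._pairs):
            if k == key:
                return v
        return None

    def put(self, key: str, value: str) -> None:
        self._pairs.append((key, value))


def count_visit(store: KeyValueStore, user: str) -> int:
    """Caller code: knows only the interface, not which module is behind it."""
    visits = int(store.get(user) or "0") + 1
    store.put(user, str(visits))
    return visits
```

Calling count_visit(DictStore(), "alice") and count_visit(ListStore(), "alice") behaves the same from the caller's side, which is the property you would want when a module has to be rewritten decades later.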

Secondly, a system that runs for centuries reminds me of the apocryphal 60-year-old broom: it's had 3 new heads and 7 new handles, but it's the same broom. A long-term system is likely to be pretty large and unwieldy, so you wouldn't want an enormous rewrite/replace project and a scary switchover where it all falls apart. No, if the system needs to start supporting the new cryogenics hardware, it needs to have just that part of the system replaced.

As far as I can see, the web may have one of the best answers. With its clearly defined protocols and request-response system, it is possible to switch hardware, software, networks, IP addresses, etc. without the user noticing. Just start routing the requests to the new server, and then decommission the old one.
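To make the switchover idea concrete, here is a toy Python sketch with no real networking in it. Server and Router are hypothetical names, and a real setup would do this with DNS, a load balancer, or a reverse proxy rather than a Python object. It only shows that the client keeps talking to the same front door while the machine behind it changes.

```python
class Server:
    """Stand-in for a machine that answers requests (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.served = 0  # how many requests this server has handled

    def handle(self, path: str) -> str:
        self.served += 1
        return f"200 OK {path}"


class Router:
    """The fixed front door: clients only ever talk to this."""

    def __init__(self, backend: Server) -> None:
        self._backend = backend

    def switch_to(self, backend: Server) -> None:
        # From now on, route every request to the new backend.
        self._backend = backend

    def request(self, path: str) -> str:
        return self._backend.handle(path)


old_server = Server("old")
new_server = Server("new")
router = Router(old_server)

before = router.request("/index")  # answered by the old server
router.switch_to(new_server)       # the old server can now be decommissioned
after = router.request("/index")   # answered by the new server

# The client sees the same response either way.
assert before == after
```

Because the response format is the contract, the client has no way of telling which server answered, and that is exactly what lets the old one be retired quietly.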

So those enormous, intercontinental, centuries-long uptimes will one day be possible, but only if we design systems to achieve them. As with the broom that never dies, fixing a given problem has to take less time than you get between the warning signs and the failure itself.
