Joe Wilcox posed an interesting question today over at Microsoft Monitor.
After describing a number of problems he had suffered, he wrote this:
There is a strange lesson here. Nearly every week, I talk to technology managers resistant to change. They would rather run that aging Windows NT 4 server until it breaks, because computing changes, they say, create unforeseen problems and so more work. I didn’t realize that my Comcast broadband account was broken, because I had service. The attempt to fix the problem, so that I could upgrade to a higher-level service, eventually really broke the account. Suddenly, some IT manager’s “no changes” policies make lots more sense.
In my many years' experience of running complex corporate systems I have some sympathy with this point of view, but it would not be my preferred approach.
The ‘no change’ policy works for simple, single-instance systems with few dependencies; it most certainly does not work for systems with many. And therein lies a bit of a dilemma, because it’s actually the simple systems that are easy to update; it’s the complicated ones that aren’t.
I have regularly been in situations where people have left a system without any maintenance (for the reason stated above) and then run into a problem. Resolving that problem takes an incredible amount of time, because you get into loops and cycles with the vendors.
Sometimes the cycle arises because fixing one problem requires an upgrade to something like the driver on a card in a server, which sets off a chain reaction that requires us to upgrade everything. All of the upgrades that chained on from the initial one were already known about; they just hadn’t been done. So rather than assessing the impact of one change, you are left assessing the impact of 20, 40, 60, 80 changes. I’m actually in the middle of one now where, in order to update the firmware on one piece of equipment, we need to update the drivers on more than 30 servers.
More often the support cycle is severely elongated because the vendors who provide the ‘expert’ support work on single tracks. Once they find a problem, they insist on it being fixed before they will go any further, even though it often isn’t related to the problem actually being experienced. Of course, as soon as that one is fixed another problem is found, and on it goes. This iterative cycle just takes forever. And the greater the number of changes, the less confidence you have that you haven’t actually caused the problem by making them. The longer the system has been left untouched, the longer it takes to resolve the problem.
I much prefer the situation where a system is maintained to a reasonable level in a consistent, incremental way. That way, when a problem does occur, it can be fixed in a reasonable time-frame.