The presentations from the Architecture Insight Conference are available here.
Since then Steve has commented.
Michael has also added another document.
The chasing of definitions can be a wonderful tool for procrastination, and I’m in danger of doing just that, so I’m not going to comment any more. If I ever produce something as wonderful as Sir Christopher Wren (Architect) or as useful as Sir Joseph Bazalgette (Engineer) I’ll be more than happy.
“My definition of an expert in any field is a person who knows enough about what’s really going on to be scared.” P. J. Plauger, Computer Language, March 1983
In my previous post I talked about the problem of complex infrastructure.
Is there anything going on in the industry to try and resolve these issues?
One of the first things that should be clear is that this isn’t a problem for any single company, and thankfully a number of companies (Microsoft, IBM, HP) are working together to resolve it.
As with most technologies that are early in their development cycle, many names are being used and there is no clear taxonomy yet. Most people seem to recognise the title ‘Autonomic’, which was originally conceived by IBM (I think); each vendor has its own initiative, but they are also coming together under the ‘WS-DM’ banner. The problem with the word ‘Autonomic’ is that it has another perfectly good use in biology. I’m not sure that ‘WS-DM’ helps either, as it ties the issue to Web Services, which is a bit limiting when the major elements are infrastructure, and infrastructure does lots of things that aren’t really Web Services.
The basic concept is that a service and all of its elements can be described starting with the business requirements and working down into the technical requirements; Microsoft calls this the Service Definition Model. Each of the service elements is then told to follow the document: if the document updates, they update. Likewise, changes made to the elements are assessed against the document and can only be applied if they don’t have an impact; once applied, they update the document.
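As a rough sketch of that idea (every class, function and setting name here is invented for illustration, not Microsoft’s actual SDM API), a minimal model might look like this: the document is the single source of truth, top-down updates flow from it, and bottom-up changes are only accepted if an impact check passes.

```python
from dataclasses import dataclass, field

@dataclass
class ServiceDefinition:
    """A single document describing every element of the service."""
    elements: dict = field(default_factory=dict)  # element name -> desired settings

    def apply_update(self, element, settings):
        """Top-down change: the document updates, the element follows it."""
        self.elements[element] = settings

    def propose_change(self, element, settings, impact_check):
        """Bottom-up change: assessed against the document first; only
        impact-free changes are applied, and they update the document."""
        if impact_check(self.elements, element, settings):
            return False  # rejected: the change would impact the service
        self.elements[element] = settings  # accepted change flows back into the document
        return True

# Hypothetical impact rule: a change impacts the service if it claims a
# 'port' value that some other element already uses.
def port_conflict(elements, element, settings):
    return any(
        name != element and cfg.get("port") == settings.get("port")
        for name, cfg in elements.items()
    )

sdm = ServiceDefinition()
sdm.apply_update("web", {"port": 80})
print(sdm.propose_change("db", {"port": 80}, port_conflict))    # rejected
print(sdm.propose_change("db", {"port": 1433}, port_conflict))  # accepted
```

The point of the sketch is the direction of authority: elements never change themselves first and tell the document later; the document always mediates.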
The technology has a long way to go, but the concept seems to work.
One of the questions the CFO did ask, one of the few repeatable ones, was this:
“What changed to cause the problem? It was working fine, so it must have been caused by a change.”
It’s one of those questions that cuts through to the issue; “who knows” is the real response. There are lots of changes going on all the time: patches, fixes, configuration. If someone did change something, how were they supposed to know it would impact a service that was using the element of the infrastructure they were changing?
Until we can answer this question categorically and precisely, and preferably with the answer “nothing that’s had an impact”, we haven’t finished.
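To even begin answering the question, every change would need to be logged against the element it touched, and each service would need a known list of the elements it depends on. A minimal sketch of that bookkeeping (element names, dates and change descriptions are all invented):

```python
from datetime import datetime

# Hypothetical change log: every patch, fix, and configuration change
# recorded against the infrastructure element it touched.
change_log = [
    {"when": datetime(2004, 5, 1), "element": "storage",  "what": "firmware patch"},
    {"when": datetime(2004, 5, 3), "element": "firewall", "what": "new rule set"},
    {"when": datetime(2004, 5, 4), "element": "desktop",  "what": "browser update"},
]

def changes_affecting(service_elements, since):
    """Answer the CFO's question: what changed, on anything this service
    uses, since it was last known to be working?"""
    return [
        c for c in change_log
        if c["element"] in service_elements and c["when"] >= since
    ]

# Elements this particular service depends on (from its definition).
service_elements = {"storage", "sql-server", "wan", "workflow"}

for change in changes_affecting(service_elements, datetime(2004, 5, 1)):
    print(change["when"].date(), change["element"], "-", change["what"])
```

Without the dependency list the log is just noise; it’s the mapping from element to service that turns “lots of changes” into “these two changes could have caused this problem”.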
Once upon a time a developer sat down and started work on an application he had been commissioned to write. He looked at the specification: it required screens, which required inputs. So he started coding. He decided what the data file structure should look like and created it on the local computer, he decided what the screens should look like and created them, and he decided the logic and coded it. There was no network involved, there was no database involved, and there was only going to be one client on one computer. The whole project required a team of one.
But that was then and this is now:
An organisation has a process they want to automate; it’s a completely new process, because they have already automated all of the others. A group of architects get together to try and understand the data inputs, the work-flow, the security requirements, the resilience requirements, the flexibility requirements, the extensibility requirements, the interfaces to other applications and the diverse client base. They decide that this process can be automated by linking together a number of existing systems and by using a browser-based interface which will be developed using an off-the-shelf application. The people who require access to the interface include people from a variety of different partners and suppliers, each of them with a variety of browsers. In order for the process to complete, a plethora of applications, databases, networks and servers are involved. This service makes the customer the bulk of their money; it is critical to them, and the faster it runs the faster they get paid.
About nine months after the service has been created it develops a problem: it’s running slow, and people are starting to notice. The CFO makes an angry call to the person responsible for the service.
“Where is the problem and why haven’t you fixed it?” is the question the CFO starts with.
“Well” says the service provider “this is the first I have heard of a problem, who contacted you?”.
“John from our delivery partner phoned me to say that they were only getting requests for deliveries after the due date and that it wasn’t their fault that parcels were being delivered late” the CFO retorts.
“I’ll get the team together to assess the likely cause” says the Service Provider a little sheepishly, though trying to sound bold and assertive.
“You have until 14:00” says the CFO.
Following the call, the Service Provider gets together the team of people he regards as responsible for the technical elements of the service; they each involve others they think should also be involved, and they each contact the vendors who deliver the software or hardware they are responsible for. They each take a look at their elements of the service.
At 14:00 the Service Provider has his follow-up call with the CFO and delivers this report:
“I have formed an investigative team to try and get to the root cause of this problem. I am still waiting for a representative from our local networks team, but I already have representatives from the internal SAP team, the Windows team, the SQL Server team, the firewall team, the wide area network team, the storage team, the backup team, the batch processing team, the CRM team, the Oracle team, the work-flow team, the identity management team, the desktop team, the AS400 team, the directory team, the email team and the UNIX team. I have also managed to free up one of the original architects to try and get his overall view of the issue. Furthermore, a number of the vendors have offered their help.
They have each assessed their element of the service and can find no issues. Everything is working as they would expect it to work.”
I think you can imagine the CFO’s response.
This story is based on a caricature of a real situation I have personally been involved in, and I don’t believe it is at all overstated. The modern infrastructure and application mix is very complicated.
Current monitoring and management techniques don’t recognise the service in the same way the person using it does: they recognise all of the elements, but don’t put them together as a whole.
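The gap can be shown with a toy example (all element names, figures and thresholds here are invented): every per-element check comes back green, exactly as in the story above, yet the service seen end to end is failing.

```python
# Each team's monitor reports its own element as healthy, yet the service
# can still be slow end to end. A service-level view has to combine the
# per-element figures along the path the user's request actually takes.
element_latency_ms = {   # hypothetical per-element response times
    "browser-to-firewall": 40,
    "firewall-to-web":     30,
    "web-to-workflow":     85,
    "workflow-to-sap":     120,
    "sap-to-sql-server":   95,
}
element_threshold_ms = 150   # each element is "green" below this
service_threshold_ms = 250   # what the user actually experiences

# Every element passes its own check...
all_elements_green = all(v < element_threshold_ms
                         for v in element_latency_ms.values())

# ...but the latencies add up along the request path, and the
# service, seen as a whole, is failing.
end_to_end_ms = sum(element_latency_ms.values())
service_green = end_to_end_ms < service_threshold_ms

print(all_elements_green, end_to_end_ms, service_green)  # True 370 False
```

Each team answering “everything is working as we would expect” is the first line; the CFO’s experience is the last two.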
Is anyone working on an answer?