Known Error Database (KEDB)-III

ajaym259 | General, Resources

The KEDB implementation

Technically, when we talk about the KEDB we are really talking about the Problem Management database rather than a completely separate store of data. At least, a decent implementation would have it set up that way.

There is a one-to-one mapping between Known Error and Problem, so it makes sense that your standard data representation of a Problem (with its number, assignment data, work notes, etc.) also holds the data you need for the KEDB.

It isn’t incorrect to implement this in a different way, storing the Problems and Known Errors in separate locations, but my own preference is to keep it all together.

Known Error and Workaround are both attributes of a Problem
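To make the idea concrete, here is a minimal sketch of a Problem record that carries its own Known Error and Workaround. The class name, field names and sample records are all hypothetical, and the rule that a closed Problem drops out of the KEDB view is an assumption rather than something every tool enforces.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Problem:
    """A Problem record that doubles as the KEDB entry (hypothetical schema)."""
    number: str
    assigned_to: str
    work_notes: List[str] = field(default_factory=list)
    known_error: Optional[str] = None  # symptom description, usable as a search key
    workaround: Optional[str] = None   # approved temporary fix, if one exists
    closed: bool = False

    def in_kedb(self) -> bool:
        # Assumption: a Problem shows up in the KEDB once its Known Error is
        # published, and for as long as the Problem itself remains open.
        return self.known_error is not None and not self.closed


problems = [
    Problem("PRB001", "network-team",
            known_error="VPN drops after 30 minutes",
            workaround="Reconnect using the backup gateway"),
    Problem("PRB002", "email-team"),  # still under investigation, nothing published
]

# The KEDB is simply a filtered view of the Problem records.
kedb = [p for p in problems if p.in_kedb()]
```

Because the KEDB here is only a view, there is no second store of data to keep in sync with the Problem records.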

Is the KEDB the same as the Knowledge Base?

This is a common question. There are a lot of similarities between Known Errors and Knowledge articles.

I would argue that, although your implementation of the KEDB might store its data in the Knowledgebase, they are separate entities.

Consider the lifecycle of a Problem, and therefore the Known Error which is, after all, just an attribute of that Problem record.

The Problem should be closed when it has been removed from the system and can no longer affect users or be the cause of Incidents. At this stage we could retire the Known Error and Workaround as they are no longer useful – although we would want to keep them for reporting so perhaps we wouldn’t delete them.

Knowledgebase articles have a more permanent use. Although they too might be retired (for example, if they refer to an application due to be decommissioned), they don’t have the same lifecycle as a Known Error record.

Knowledge articles refer to how systems should work or provide training for users of the system. Known Errors document conditions that are unexpected.

There is, however, benefit in using the Knowledgebase as a repository for Known Error articles. Giving Incident owners a single place to search for both Knowledge and Known Errors is a nice feature of your implementation, and typically your Knowledge tools will have good authoring, linking and commenting capabilities.

What if there is no Workaround?

Sometimes there just won’t be a suitable Workaround to provide to customers.

I would use an example of a power outage to provide a simple illustration. With power disrupted to a location you could imagine that there would be disruption to services with no easy workaround.

It is perhaps a lazy example as it doesn’t allow for many nuances. Having power is normally a binary state – you either have adequate power or not.

A better and more topical example can be found in the Cloud. As organisations take advantage of the resource charging model of the Cloud they also outsource control.

If you rely on a Cloud SaaS provider for your email and they suffer an outage, you can imagine that your Servicedesk will take a lot of calls. However, there might not be a Workaround you can offer until your provider restores service.

Another example would be the Microsoft Azure outage of February 29th 2012. I’m sure a lot of customers experienced a Problem, by many different definitions of the word, but didn’t have a viable alternative for their users.

In this case there is still value to be found in the Known Error Database. If there really is no known workaround it is still worth publishing to the KEDB.

Firstly, it aids in associating new Incidents with the Problem (using the Known Error as a search key) and stops engineers wasting time searching for an answer that doesn’t exist.

You could also avoid engineers trying to implement potentially damaging workarounds by publishing the fact that the correct action to take is to wait for the root cause of the Problem to be resolved.

Lastly, with a lot of Problems in our system we might struggle to prioritise our backlog. Publishing the Known Error helps route new Incidents to the right Problem, which in turn lets you prioritise your most impactful issues.
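The points above can be sketched as a small lookup that uses the Known Error text as a search key. Everything here is illustrative: the sample KEDB entries, the function names and the naive word-overlap matching are assumptions, and a real ITSM tool would offer a proper search.

```python
from typing import Dict, List, Optional

# Hypothetical KEDB entries; "workaround" is None when no workaround is known.
KEDB: List[Dict[str, Optional[str]]] = [
    {"problem": "PRB010", "known_error": "hosted email service unavailable",
     "workaround": None},
    {"problem": "PRB011", "known_error": "printer queue stuck after driver update",
     "workaround": "Restart the print spooler service"},
]


def match_known_error(incident_summary: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the KEDB entry sharing the most words with the incident summary."""
    words = set(incident_summary.lower().split())
    best, best_score = None, 0
    for entry in KEDB:
        score = len(words & set(entry["known_error"].lower().split()))
        if score > best_score:
            best, best_score = entry, score
    return best


def advice_for(incident_summary: str) -> str:
    """Tell the engineer which Problem to link to and what action to take."""
    entry = match_known_error(incident_summary)
    if entry is None:
        return "No Known Error matched; investigate as a new issue."
    if entry["workaround"] is None:
        # Publishing "no workaround" stops engineers searching for one.
        return f"Link to {entry['problem']}: no workaround, wait for root cause fix."
    return f"Link to {entry['problem']}: {entry['workaround']}"
```

Even when the entry has no Workaround, the engineer still gets a Problem to link the Incident to and a clear instruction to wait.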

A user’s Known Error profile

With a populated KEDB we now have a good understanding of the possible causes of Incidents within our system.

Not all Known Errors will affect all users – a network switch failure in one branch office would be very impactful for the local users but not for users in another location.

If we understand our users’ environment through systems such as the Configuration Management System (CMS) or Asset Management processes, we should be able to determine a user’s exposure to Known Errors.

For example, when a user phones the Servicedesk complaining of an interruption to service, we should be able to quickly learn about her configuration: where she is geographically, which services she connects to, and her personal hardware and software environment.

With this information, and some Configuration Item matching, the Servicedesk engineer should have a view of all of the Known Errors that the user is vulnerable to.
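A rough sketch of that Configuration Item matching might look like the following. The user names, CI names and data layout are invented for illustration; in practice the two lookups would come from your CMS and your Problem records.

```python
from typing import Dict, List, Set

# Hypothetical CMS extract: the Configuration Items each user depends on.
USER_CIS: Dict[str, Set[str]] = {
    "alice": {"branch-a-switch", "email-service", "laptop-model-x"},
    "bob": {"branch-b-switch", "email-service"},
}

# Hypothetical KEDB extract: the CIs affected by each open Known Error.
KNOWN_ERRORS: List[Dict] = [
    {"problem": "PRB020", "summary": "Switch failure in branch A",
     "affected_cis": {"branch-a-switch"}},
    {"problem": "PRB021", "summary": "Email delivery delays",
     "affected_cis": {"email-service"}},
]


def known_error_profile(user: str) -> List[str]:
    """List the Problems whose affected CIs overlap the user's own CIs."""
    user_cis = USER_CIS.get(user, set())
    return [ke["problem"] for ke in KNOWN_ERRORS if ke["affected_cis"] & user_cis]
```

With this sample data, the switch failure in branch A appears in the profile for "alice" but not for "bob", while the email Known Error appears for both.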

Measuring the effectiveness of the KEDB

As with all processes we should take measurements and ensure that we have a healthy process for updating and using the KEDB.

Here are some metrics that would help give your KEDB a health check.

Number of Problems opened with a Known Error

Of all the Problem records opened in the last X days how many have published Known Error records?

We should be striving to create as many high quality Known Errors as possible.

The value of a published Known Error is that Incidents can be easily associated with Problems, avoiding duplication.

Number of Problems opened with a Workaround

How many Problems have a documented Workaround?

The Workaround allows for the customer Incident to be resolved quickly and using an approved method.
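The two Problem metrics above could be computed along these lines. This is only a sketch: the record layout, the sample data and the `kedb_coverage` function are hypothetical, and most ITSM tools would produce the same numbers from a report.

```python
from datetime import date, timedelta
from typing import Dict, List, NamedTuple, Optional


class ProblemRecord(NamedTuple):
    number: str
    opened: date
    known_error: Optional[str] = None
    workaround: Optional[str] = None


def kedb_coverage(problems: List[ProblemRecord], today: date, days: int) -> Dict[str, int]:
    """Count recent Problems, and how many have a Known Error or a Workaround."""
    cutoff = today - timedelta(days=days)
    recent = [p for p in problems if p.opened >= cutoff]
    return {
        "opened": len(recent),
        "with_known_error": sum(1 for p in recent if p.known_error),
        "with_workaround": sum(1 for p in recent if p.workaround),
    }


sample = [
    ProblemRecord("PRB030", date(2024, 3, 1), "Login page times out", "Use the legacy URL"),
    ProblemRecord("PRB031", date(2024, 3, 10), "Reports export is empty"),
    ProblemRecord("PRB032", date(2024, 3, 12)),
    ProblemRecord("PRB033", date(2024, 1, 5), "Old issue", "Old fix"),  # outside the window
]

coverage = kedb_coverage(sample, today=date(2024, 3, 15), days=30)
```

Tracking the counts against the number of Problems opened, rather than on their own, makes it easier to see whether coverage is improving over time.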

Number of Incidents resolved by a Workaround

How many Incidents are resolved using a documented Workaround? This measures the value provided to users of IT services and confirms the benefits of maintaining the KEDB.

Number of Incidents resolved without a Workaround or Knowledge

Conversely, how many Incidents are resolved without using a Workaround or another form of Knowledge?

If we see Servicedesk engineers having to research and discover their own solutions for Incidents, does that mean that there are Known Errors in the system that we aren’t aware of?

Are there gaps in our Knowledge Management, meaning that customers are contacting the Servicedesk and we don’t have an answer readily available?

A high number in our reporting here might be an opportunity to proactively improve our Knowledge systems.
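As a rough illustration, the Incident metrics could be reported as a simple breakdown of how each Incident was resolved. The `resolved_with` field and its three values are assumptions made for this sketch, as your tool may record this differently or not at all.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical export: how each resolved Incident was closed.
incidents: List[Dict[str, str]] = [
    {"number": "INC100", "resolved_with": "workaround"},
    {"number": "INC101", "resolved_with": "workaround"},
    {"number": "INC102", "resolved_with": "knowledge"},
    {"number": "INC103", "resolved_with": "none"},
]


def resolution_breakdown(records: List[Dict[str, str]]) -> Dict[str, float]:
    """Return the share of Incidents resolved by each method."""
    keys = ("workaround", "knowledge", "none")
    total = len(records)
    if total == 0:
        return {key: 0.0 for key in keys}
    counts = Counter(r["resolved_with"] for r in records)
    return {key: counts[key] / total for key in keys}


breakdown = resolution_breakdown(incidents)
```

A growing share in the "none" bucket is the signal to go looking for undocumented Known Errors or gaps in Knowledge.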

OLAs

We want to ensure that Known Errors are quickly written and published in order to allow Servicedesk engineers to associate incoming Incidents with existing Problems.

One method of measuring how quickly we are publishing Known Errors is to use Operational Level Agreements (or SLAs if your ITSM tool doesn’t define OLAs).

We should be using performance measurements to ensure that our Problem Management function is publishing Known Errors in a timely fashion.

You could consider tracking Time to generate Known Error and Time to generate Workaround as performance metrics for your KEDB process.
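Here is a minimal sketch of how those two timings might be checked against an OLA. The target durations, timestamps and function names are all hypothetical, so choose targets that suit your own organisation.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical OLA targets, measured from the moment the Problem is opened.
KNOWN_ERROR_TARGET = timedelta(hours=4)
WORKAROUND_TARGET = timedelta(hours=24)


def elapsed(opened: datetime, published: Optional[datetime], now: datetime) -> timedelta:
    """Time taken to publish, or time waited so far if nothing is published yet."""
    return (published or now) - opened


def breached(opened: datetime, published: Optional[datetime],
             now: datetime, target: timedelta) -> bool:
    """True when publication took, or is taking, longer than the OLA target."""
    return elapsed(opened, published, now) > target


opened = datetime(2024, 3, 1, 9, 0)
now = datetime(2024, 3, 1, 18, 0)
ke_published = datetime(2024, 3, 1, 11, 30)  # Known Error written after 2.5 hours
wa_published = None                          # Workaround still being researched

ke_breached = breached(opened, ke_published, now, KNOWN_ERROR_TARGET)
wa_breached = breached(opened, wa_published, now, WORKAROUND_TARGET)
```

Measuring unpublished records against the current time means a Problem that never gets a Known Error still shows up as a breach instead of silently dropping out of the report.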

In summary

We could also measure how quickly Workarounds are researched, tested and published. If there is no known Workaround, that is still valuable information to the Servicedesk as it eliminates effort in trying to find one, so an OLA would be appropriate here too.
