[% META title = "Defining Service Expectations and Measurements" %]

Service Level Agreements

Simple SLAs

A Service Level Agreement (SLA) is a formal definition for a service contract. The most important target is the Turn-Around-Time — the time to complete a request.

We measure a service level as the percentage of requests that are resolved within a given time. However, in order to balance work priorities, we can assign a severity to each ticket based upon business needs, and assign a different SLA target for each severity. For example, we may choose three (relatively simple) severities:

Critical
An issue that is of major impact, cost, or effect. This may be measured as lost revenue, wasted (staff) time, or branding impact.
Normal
Almost everything that happens is Normal.
Minor
Background tasks that have no pressing deadline.
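The percentage measure described above can be sketched in a few lines. This is a minimal illustration, not part of any ticketing system: the function name `service_level`, the `(created, resolved)` ticket representation, and the sample data are all assumptions made for the example. Open tickets are counted as misses, which is a simplification.

```python
from datetime import datetime, timedelta

def service_level(tickets, target):
    """Return the percentage of tickets resolved within `target`.

    Each ticket is a (created, resolved) pair of datetimes. A ticket whose
    resolved value is None is still open and is counted as a miss
    (a simplifying assumption for this sketch).
    """
    if not tickets:
        return None  # no tickets, so no meaningful percentage
    met = sum(
        1
        for created, resolved in tickets
        if resolved is not None and resolved - created <= target
    )
    return 100.0 * met / len(tickets)

# Hypothetical sample data: four tickets measured against a 1-day target.
tickets = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 15, 0)),  # 6 hours
    (datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 4, 9, 0)),   # 2 days (miss)
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 10, 0)),  # 1 hour
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 4, 9, 0)),   # exactly 1 day
]
print(service_level(tickets, timedelta(days=1)))  # 75.0
```

Running the same calculation once per severity gives one service level figure per severity.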

I will choose a critical issue as:

When a request (ticket) is submitted to the system, an immediate Auto Response is generated so the requestor knows their issue has been logged. However, this gives the requestor no assurance that their issue has been reviewed by a human and scheduled for action. Luckily, we can give that assurance automatically when we assign (take) and prioritise (set the Severity custom field on) the ticket. If we want, we could also add a time estimate at this stage.

To determine our ability to adequately perform our work, we can define our SLA, per severity as outlined above, as a time to:

Simple SLA per Severity
Severity    Target time to respond    Target time to resolve
Critical    1 hour                    1 day
Normal      1 hour                    7 days
Minor       1 hour                    None

We set the target time to respond the same for all severities, as we won't know the severity or the resolution time until we have initially reviewed the ticket. So we should classify the ticket immediately, and (auto) respond to the requestor at that time.
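As a rough sketch, the Simple SLA per Severity table can be held as a lookup and used to check a single ticket. The names `SIMPLE_SLA` and `met_target` are hypothetical, and the timestamps are invented for the example; `None` stands for "no target".

```python
from datetime import datetime, timedelta

# Targets taken from the "Simple SLA per Severity" table; None means no target.
SIMPLE_SLA = {
    "Critical": {"respond": timedelta(hours=1), "resolve": timedelta(days=1)},
    "Normal":   {"respond": timedelta(hours=1), "resolve": timedelta(days=7)},
    "Minor":    {"respond": timedelta(hours=1), "resolve": None},
}

def met_target(severity, kind, started, finished):
    """Return True if finished - started is within the target.

    `kind` is "respond" or "resolve". A missing target (None) always passes.
    """
    target = SIMPLE_SLA[severity][kind]
    if target is None:
        return True
    return finished - started <= target

created = datetime(2024, 1, 1, 9, 0)  # hypothetical ticket creation time
print(met_target("Critical", "respond", created, datetime(2024, 1, 1, 9, 45)))  # True
print(met_target("Critical", "resolve", created, datetime(2024, 1, 3, 9, 0)))   # False
print(met_target("Minor", "resolve", created, datetime(2024, 3, 1, 9, 0)))      # True
```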

Getting more complex

Not everything is deserving of 24x7 cover. No one cares if the office copier stops working at 3am, but you do care between 9am and 6pm. By contrast, you would be concerned about your e-Commerce application going down at any time.

If we have offices in multiple time zones (e.g. London, New York) then there may be some services that are in use during your Global Business Hours, and some during your Local Business Hours. To deal with this we'll choose three classes (levels) of service to indicate importance, and we select which services belong to which class:

Gold
24 x 7 x 365. We care about this big-time; this is your web site (brand damage if it goes down) or your eCommerce application (money lost if it goes down)
Silver
Global Business Hours, typically Monday - Friday. This may be your intranet or other internal global resources
Bronze
Local Business Hours, normally 9am - 6pm. This would be your office printer, phone system, etc
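For the Silver and Bronze classes, elapsed time has to be counted on a business-hours clock rather than the wall clock. The sketch below counts time inside a 9am - 6pm window for one office. The function name `business_time`, the Monday - Friday working week for local hours, and the use of naive local datetimes are assumptions for illustration; public holidays and multiple time zones are ignored.

```python
from datetime import datetime, time, timedelta

def business_time(start, end, opens=time(9), closes=time(18), workdays=range(0, 5)):
    """Return the part of the interval start..end that falls in business hours.

    Datetimes are naive and in the office's local time (an assumption).
    workdays uses datetime.weekday() numbering, where 0 is Monday.
    """
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() in workdays:
            window_open = datetime.combine(day, opens)
            window_close = datetime.combine(day, closes)
            # Clip the ticket interval to this day's business window.
            overlap_start = max(start, window_open)
            overlap_end = min(end, window_close)
            if overlap_end > overlap_start:
                total += overlap_end - overlap_start
        day += timedelta(days=1)
    return total

# Logged Friday 5pm, resolved Monday 10am: 1 hour Friday + 1 hour Monday.
elapsed = business_time(datetime(2024, 1, 5, 17, 0), datetime(2024, 1, 8, 10, 0))
print(elapsed)  # 2:00:00
```

A Gold service would skip this step and use the plain difference between the two timestamps.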

For each of these classes of service we have our three (or more) severities, so we can now define our target response and resolution times, and the target SLA percentage for meeting them.

Complex SLA per Service Class per Severity
Class                           Severity  Response time  Response SLA  Resolve time              Resolve SLA
Gold (24 x 7)                   Critical  1 hour         90%           2 hours                   90%
                                Normal    1 hour         90%           1 day                     90%
                                Minor     1 hour         90%           None                      -
Silver (Global business hours)  Critical  1 hour         90%           24 global business hours  90%
                                Normal    1 hour         90%           96 global business hours  90%
                                Minor     1 hour         90%           None                      -
Bronze (Local business hours)   Critical  1 hour         90%           24 local business hours   90%
                                Normal    1 hour         90%           96 local business hours   90%
                                Minor     1 hour         90%           None                      -
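One possible way to hold the table above is a lookup keyed by class and severity, with a helper that compares measured resolve times against the target percentage. The names `Target`, `COMPLEX_SLA`, and `resolve_sla_met` are hypothetical, and the measured hours in the usage lines are invented. Hours are assumed to be already counted on the clock the class uses (wall clock for Gold, global business hours for Silver, local business hours for Bronze), so Gold's "1 day" is written as 24.

```python
from collections import namedtuple

# Hours and percentages from the table; None means no target.
Target = namedtuple("Target", "respond_hours respond_sla resolve_hours resolve_sla")

COMPLEX_SLA = {
    ("Gold", "Critical"):   Target(1, 90, 2, 90),
    ("Gold", "Normal"):     Target(1, 90, 24, 90),
    ("Gold", "Minor"):      Target(1, 90, None, None),
    ("Silver", "Critical"): Target(1, 90, 24, 90),
    ("Silver", "Normal"):   Target(1, 90, 96, 90),
    ("Silver", "Minor"):    Target(1, 90, None, None),
    ("Bronze", "Critical"): Target(1, 90, 24, 90),
    ("Bronze", "Normal"):   Target(1, 90, 96, 90),
    ("Bronze", "Minor"):    Target(1, 90, None, None),
}

def resolve_sla_met(service_class, severity, resolve_hours):
    """Return True if enough tickets were resolved within the target time.

    resolve_hours is a list of measured resolve times in hours. With no
    resolve target, or no tickets to measure, the SLA is treated as met.
    """
    target = COMPLEX_SLA[(service_class, severity)]
    if target.resolve_hours is None or not resolve_hours:
        return True
    within = sum(1 for hours in resolve_hours if hours <= target.resolve_hours)
    return 100.0 * within / len(resolve_hours) >= target.resolve_sla

# Hypothetical measurements: 9 of 10 Gold/Critical tickets resolved within 2 hours.
print(resolve_sla_met("Gold", "Critical", [1, 1.5, 2, 0.5, 1, 1, 2, 1, 1, 3]))  # True
print(resolve_sla_met("Gold", "Critical", [1, 3]))                              # False
```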

The default ticket has all these fields, but we need to extend it with Custom Fields for Severity and Service.