Your SLA is bullshit

I was asked to develop SLAs (Service Level Agreements) for the various services our team provides to the organization. I began by pulling data from our operational systems: our deployment management tool, ServiceNow, and Eclipse. While I was preparing the data for analysis, I was pulled away from that work, and it sat untouched for what turned out to be the rest of the week.

When I returned to the SLAs, I realized I had missed a critical part of the process. I hadn’t thought through how our tools, our processes and policies, or our workload affected the data in front of me. Nor had I added attributes to the data that would let me visualize how decisions made by me, by our team, and by the broader organization changed its shape. As I worked to understand these effects, I widened the scope to include how other teams’ policies, processes, projects, and decisions also affected our SLAs. I then reached out to other leaders to start a conversation about SLAs and about how each of them arrived at their delivery time frames.

After many discussions about other leaders’ SLAs, I found that almost none of them were taking the time to understand how outside effects, the processes and policies of other teams and of the broader organization, affected their work. Even when teams did look into how these forces shaped their SLAs, almost none normalized the data to build service level agreements based on “ceteris paribus,” all things being equal. Nor were these leaders seeking to understand the reasons behind the decisions affecting their SLAs, or working across the organization to shorten those timelines where they could.
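One possible way to approach a ceteris paribus view is to record, for each work item, how much of its elapsed time was spent waiting on things outside the team’s control, and then compare the raw average with the normalized one. The sketch below assumes your ticketing data can be broken down that way; the `WorkItem` type, the wait categories, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WorkItem:
    total_days: float
    # Hypothetical breakdown of days spent waiting on things outside the team,
    # e.g. {"cab": 3, "vendor": 2}. Category names are illustrative only.
    external_waits: dict = field(default_factory=dict)


def internal_days(item: WorkItem) -> float:
    """Delivery time with external waits removed (never below zero)."""
    return max(item.total_days - sum(item.external_waits.values()), 0.0)


def compare(items: list) -> tuple:
    """Return (raw average, normalized average) delivery days."""
    raw = sum(i.total_days for i in items) / len(items)
    normalized = sum(internal_days(i) for i in items) / len(items)
    return raw, normalized


# Made-up sample data, for illustration only.
items = [
    WorkItem(10, {"cab": 3}),
    WorkItem(6),
    WorkItem(8, {"cab": 1, "vendor": 2}),
]
print(compare(items))  # (8.0, 6.0)
```

The gap between the two numbers is a rough measure of how much of the SLA is driven by forces outside the team rather than by the team’s own work.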

One example of a force that artificially inflates SLAs is a change management process that requires approval at a regularly scheduled CAB (Change Advisory Board) meeting before a change can be deployed. While there are many good reasons for a standard change process and change windows, it is worth reviewing whether shorter change cycles can be managed in a way that still maintains an acceptable level of control and stability. That could lead to significantly shorter SLAs and remove the perception across the organization that change management is a roadblock to delivering business value quickly.
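To make that cost concrete, here is a minimal sketch assuming a single weekly CAB meeting on Tuesday, and that a change ready on a CAB day still makes that day’s meeting. The Tuesday schedule and the helper names are hypothetical; the point is only to show how much waiting a fixed meeting cadence adds before any work happens.

```python
from datetime import date, timedelta

# Hypothetical schedule: CAB meets once a week on Tuesday (Monday == 0).
CAB_WEEKDAY = 1


def wait_for_next_cab(ready: date, cab_weekday: int = CAB_WEEKDAY) -> int:
    """Calendar days a change ready on `ready` waits for the next CAB meeting.

    Assumes a change ready on a CAB day makes that day's meeting (wait of 0).
    """
    return (cab_weekday - ready.weekday()) % 7


def average_cab_wait(cab_weekday: int = CAB_WEEKDAY) -> float:
    """Average wait if changes become ready evenly across all seven weekdays."""
    start = date(2024, 1, 1)  # a Monday; any full week gives the same result
    waits = [wait_for_next_cab(start + timedelta(days=i), cab_weekday) for i in range(7)]
    return sum(waits) / len(waits)


print(wait_for_next_cab(date(2024, 1, 3)))  # a Wednesday: 6 days until Tuesday
print(average_cab_wait())  # 3.0
```

Under these assumptions a weekly CAB adds about three calendar days to every change on average, even when the change itself is ready to go, and that padding flows straight into the SLA.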

Another example is a year like the one we are having now, filled with what feels like endless project work. If we used our delivery times for operational work over the last few months to develop SLAs, they would be wildly out of step with the reality of today, or with what will most likely be the reality of tomorrow. Instead, we should either remove these data points from our analysis or, better yet, add metadata that captures our delivery times with and without projects. Then we can either normalize the data again or develop an SLA and communication plan that includes multiple scenarios, when each one applies, and how we communicate a move from one scenario to another.
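A minimal sketch of the “with and without projects” idea follows. It assumes each ticket has been tagged with a hypothetical `during_project` flag and reports a nearest-rank 90th percentile of delivery days per scenario; the `Ticket` type, the choice of percentile, and the sample numbers are illustrative assumptions, not figures from our data.

```python
import math
from dataclasses import dataclass


@dataclass
class Ticket:
    days_to_deliver: float
    during_project: bool  # hypothetical metadata flag added during analysis


def nearest_rank_percentile(values: list, pct: int) -> float:
    """Nearest-rank percentile: smallest value covering pct% of the data."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def sla_by_scenario(tickets: list, pct: int = 90) -> dict:
    """Split tickets by scenario and compute a candidate SLA for each."""
    scenarios = {"normal": [], "project_heavy": []}
    for t in tickets:
        key = "project_heavy" if t.during_project else "normal"
        scenarios[key].append(t.days_to_deliver)
    # Skip any scenario that has no data yet.
    return {name: nearest_rank_percentile(days, pct)
            for name, days in scenarios.items() if days}


# Made-up sample data, for illustration only.
tickets = [Ticket(d, False) for d in (2, 3, 3, 4, 5)] + \
          [Ticket(d, True) for d in (8, 10, 12, 15)]
print(sla_by_scenario(tickets))  # {'normal': 5, 'project_heavy': 15}
```

Publishing both numbers, along with the trigger for switching between them, is one way to turn the multi-scenario plan into something the rest of the organization can actually read.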

In the end, we need to be cautious when developing our SLAs. Our analysis must include an understanding of why our data has the shape it does. We need to consider all factors, whether or not we have control over them. Ultimately, we need SLAs that are flexible and grounded in real-life scenarios. We also need to make every attempt to remove the self-imposed restrictions that inflate our SLAs.