How Effective Incident Notifications and Ownership Transparency Change the Game
Meet sleepless Ralf
Ralf is in charge. In charge of various applications that are used by hundreds of thousands of customers. His company hosts IT services and applications for major insurance companies. Ralf loves his job. But it is a lot of responsibility, too. And Ralf used to not sleep very well. It was only a few weeks ago that they had to deal with that major incident at night, and it was even worse than the one last year. One of the applications he’s in charge of simply stopped working at 2 am.
But Ralf didn’t find out until much later
When he woke up at 6 am and turned his cell phone on (he had switched it off to protect his kid’s sleep), he was immediately bombarded with menacing mailbox notifications. Oh gosh, he knew he was in trouble when he checked those voice mails. He didn’t know which was worse: his boss or his company’s client. The insurance company’s overseas customers had been unable to use some of its services, including the mobile app, since early in their morning.
He immediately called his boss and the client, telling them he was on the job, tracking down what had happened and figuring out how fast they could fix it. Ugh, what a job to calm down the client. He then started calling his team, one member after the other, attempting to nail the problem down. First of all Peter, who he remembered was on call this week. But Peter didn’t answer the phone. What was wrong with that man? Ralf started to boil. He called another team member and was told that Peter had been trying to find a stand-in last night because his son had broken his leg at a football game. But who had jumped in as a stand-in?
Finally, he reached Mike. Mike had jumped in for Peter. But Mike claimed not to have received the alert notification that had been sent by email and text. Ralf didn’t want to point fingers, but in all fairness, Mike had never been the most reliable person. Unfortunately, Ralf couldn’t prove anything, as he had no record of the alert message or its delivery. Their monitoring tool didn’t produce any notification records. On top of it all, Mike claimed that although he had seen the incident early that morning, he didn’t feel responsible for it, as he was certain that only the network team could solve the issue. After all, it wasn’t anything that had to do with their application not running. Right…
Ralf then tried to set up a call with the network team (what a morning) but couldn’t get hold of anyone there. And he didn’t have access to their on-call calendar. At 7.30 am, Ralf gave in, fired up his notebook and started to dig into the IT stack himself. At 9.15 am he seemed to have found a potential cause. So, apparently, if you want something done, do it yourself. It still took him, his team and the network guys until 10.45 am to fix the issue and have everything up and running again. Unfortunately, this hadn’t been the first disruption of crucial services. The reputation of his company as well as his own was at stake.
Ralf’s Team needed a Tool.
He knew what he wanted. The gaps and bottlenecks were easy to identify. He needed to respond to such major incidents much better. So, his and his team’s top priorities were:
- 100% reliable alert notifications that are traceable, so there is clear accountability
- A ‘who’s on call’ view that is accessible, ideally from a mobile phone, to check what’s going on
- Good, transparent on-call scheduling, so it is always clear who is responsible and when
- Automated alert routing to the person on call, to make sure incidents are not overlooked
Above all, he needed to stay in the loop so he could provide his boss and his client with the answers they were looking for.
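To make those priorities a little more concrete, here is a minimal sketch of how on-call routing with a stand-in, a delivery record and an escalation step could fit together. It does not describe any particular product: every class, method and parameter name (`OnCallSchedule`, `AlertRouter`, `ack_timeout` and so on) is hypothetical, the 15-minute acknowledgement window is an assumed value, and nothing is actually sent. The "sms" channel is only a label stored in the record.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Shift:
    person: str
    start: datetime
    end: datetime


@dataclass
class OnCallSchedule:
    """Hypothetical schedule: regular shifts plus stand-in overrides."""
    shifts: List[Shift]
    overrides: List[Shift] = field(default_factory=list)

    def who_is_on_call(self, at: datetime) -> Optional[str]:
        # Stand-ins are checked first, so they win over the regular shift.
        for shift in self.overrides + self.shifts:
            if shift.start <= at < shift.end:
                return shift.person
        return None


@dataclass
class DeliveryRecord:
    """One traceable entry per notification attempt."""
    alert: str
    recipient: str
    channel: str
    sent_at: datetime
    acknowledged_at: Optional[datetime] = None


class AlertRouter:
    """Routes alerts to whoever is on call and keeps a delivery log."""

    def __init__(self, schedule: OnCallSchedule, manager: str,
                 ack_timeout: timedelta = timedelta(minutes=15)):
        self.schedule = schedule
        self.manager = manager          # escalation target (assumption)
        self.ack_timeout = ack_timeout  # assumed value, not from the story
        self.log: List[DeliveryRecord] = []

    def _record(self, alert: str, recipient: str, at: datetime) -> DeliveryRecord:
        record = DeliveryRecord(alert, recipient, "sms", at)
        self.log.append(record)
        return record

    def notify(self, alert: str, at: datetime) -> DeliveryRecord:
        # Fall back to the manager if nobody is scheduled.
        recipient = self.schedule.who_is_on_call(at) or self.manager
        return self._record(alert, recipient, at)

    def acknowledge(self, record: DeliveryRecord, at: datetime) -> None:
        record.acknowledged_at = at

    def escalate_if_unacknowledged(self, record: DeliveryRecord,
                                   now: datetime) -> Optional[DeliveryRecord]:
        # Loop in the manager once the acknowledgement window has passed.
        overdue = now - record.sent_at >= self.ack_timeout
        if record.acknowledged_at is None and overdue:
            return self._record(record.alert, self.manager, now)
        return None


# Replaying the night from the story with made-up dates:
schedule = OnCallSchedule(
    shifts=[Shift("Peter", datetime(2024, 3, 4), datetime(2024, 3, 11))],
    overrides=[Shift("Mike", datetime(2024, 3, 5, 18), datetime(2024, 3, 6, 8))],
)
router = AlertRouter(schedule, manager="Ralf")
first = router.notify("Application down", datetime(2024, 3, 6, 2, 0))
second = router.escalate_if_unacknowledged(first, datetime(2024, 3, 6, 2, 15))
for entry in router.log:
    print(entry.sent_at, entry.recipient, entry.acknowledged_at)
```

In this replay the alert goes to Mike as the stand-in, nobody acknowledges it, and Ralf is looped in a quarter of an hour later. The log keeps both attempts, which is exactly the kind of record Ralf was missing that morning.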
What Ralf needed for himself and for his role went even further
- To be looped in when the “sh…hits the fan”. And yes, please persistently, on the red line…
- To see who’s taking care of what, so he doesn’t need to run after the information he so desperately needs
- Ideally, some sort of early-morning, post-night-shift report with incident ownership information
- And finally, some way of easily setting up a conference bridge, with his team or even across teams
Ralf needed a tool that would provide a reliable means of “letting the right person know” and deliver the much-needed transparency about incident accountability and resolution progress.
Ralf now sleeps much better