Mike is a family man
He loves his little kids and tries to spend as much time with them as his job allows. Mike is part of an IT operations team at a large financial IT service provider. One of his responsibilities is being on call. He shares this duty with three other team members, so effectively he has to cover once every four weeks. This cuts into his family time, as it means dealing with major and critical issues after business hours. His IT department has set up group email notifications from its monitoring tools, but there are multiple issues with this. First of all, in the middle of the night an email notification easily goes unnoticed. That short “beep” on his cell phone is simply not enough to wake him up. Even if he changed the ringtone to something more persistent, he would simply get too many irrelevant alerts. And being woken up for something that turns out to belong to a different team or a different area of expertise is no fun. When on call, Mike has to check his cell phone constantly for new email alerts. There is always that fear of missing something important, which makes him edgy. Quite frankly, it is a pain and totally disrupts his family life.
Mike did get this one alert
Last week they had a major incident. What a nightmare! Especially as he was only covering for his colleague Peter, who had his first date in six months! By early evening, hundreds of alert emails had already piled up in his Ops inbox. Somebody had forgotten to switch on a maintenance window in the monitoring tool. To make matters worse, his son had a bad fever that night and his wife was away with their little daughter. So he was literally jumping back and forth between his son’s bed, the kitchen, his cell phone and his PC. At midnight he fell asleep, totally exhausted. He woke up at 6:30, still pretty tired, and tried to dig through all the new maintenance-related alert emails in his inbox. Everything looked more or less OK, but he was a little suspicious. When he checked again, he stumbled across one email that pointed to a potentially critical issue. Under normal circumstances, he would have received a text alert for this critical issue, but apparently none had been sent this time. Their text alerts weren’t very reliable. Plus, the issue was clearly related to the database, and of course there was a database team with at least one person on call 24×7. Mike was sure that it was not his team’s issue. Unfortunately, their email notifications offered no tracking or confirmation option. He wanted to call the one guy he knew on the database team to cross-check. But then his son woke up crying and he had to take care of him first.
Then his boss Ralph called
Man, he was simply furious. Mike’s ear was burning. Mike tried to explain the situation: the missing text, and his assessment that the issue belonged to the database team. Ralph was not satisfied, shouted at him over and over again, and finally hung up. Mike sympathized with him; Ralph was under enormous pressure, and this incident really was a very severe one that affected hundreds of customers. On top of that, Mike could not really do anything about it, because his son needed him that morning.
A solution was badly needed
Mike knew exactly what was missing. Their alerting and notification setup sucked. So his and his team’s top priorities were:
- 100% reliable alert notifications: not only email and somewhat flaky text messaging, but ideally also voice calls
- Escalation to a backup, so that if he overlooked an alert, it would go to another team member
- Great filtering, so that only truly critical alerts would reach him or wake him up
- Targeted notifications, so he and his team would only receive alerts from systems that they are responsible for
- Duty scheduling with ad-hoc stand-ins and automated alert routing (he could have used this when his son was sick)
- Possibly a “who’s on call” overview across the whole IT division to cross-check and communicate with other teams, ideally accessible from a cell phone
- Acknowledgements, so he could confirm and annotate alerts and keep his boss in the loop
Mike now actually enjoys his family time, even while on call