I recently did some web research on the confusion around events, alerts, incidents and notifications, terms that play an important role in understanding the value that IT service alerting delivers. I found an abundance of interpretations and definitions. Naming conventions also differ between vendors of monitoring and service management tools. For example, what is called an “alert” in Microsoft SCOM is a “key incident” in HPE NNMi. We also often get into discussions like this: “Ah, you do alerting for SCOM? But doesn’t SCOM do alerting?” Well, of course SCOM raises and shows alerts, i.e. it does alerting, but this is not what we do. We do advanced “alert notifications”. But I agree, it is somewhat confusing.
Let’s dive into this discussion a bit more. It is pretty interesting…
Event
ITIL links events and notifications directly by saying:
“An event can be defined as any detectable or discernible occurrence that has significance for the management of the IT Infrastructure or the delivery of IT service and evaluation of the impact a deviation might cause to the services. Events are typically notifications created by an IT service, Configuration Item (CI) or monitoring tool.” Source: Wikipedia, i.e. from the ITIL Service Operation Book
I wouldn’t agree: an event is NOT a notification, because events can pass unnoticed. We’ll see a better definition below. ITIL also defines categories for events, mixing up events and alerts:
“Standard categorization based on the significance of an event:
- Informational (INFO): the event does not require any immediate action and does not represent an exception. They are recorded in the log files and maintained for a predetermined period. This type of event is used to check the status of a device or service, to confirm the state of an activity, to generate statistics (user login, batch job completed, device power up, number of users logged into an application)
- Warning (WARN / ALERT): the event is generated when a device or service, (application / utility), is approaching an agreed threshold (KPI). Warnings are intended to notify the group/process/tool in order to take the necessary actions to prevent an exception occurring.
- Exception (ERROR): means that a service or device is currently operating below the normal parameters/indicators (predefined). This mean that the business service is impacted and the device or service presents a failure, performance degradations or loss of functionality (web server down, CS coverage lost for several sites). A device failure is an error.” Source here
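The three categories quoted above can be read as a simple threshold check. The following is only a minimal sketch under assumptions: the names `Significance` and `categorize` and the example thresholds are hypothetical and not taken from ITIL or from any monitoring tool.

```python
from enum import Enum


class Significance(Enum):
    INFO = "informational"
    WARN = "warning"
    ERROR = "exception"


def categorize(value: float, warn_threshold: float, error_threshold: float) -> Significance:
    """Classify a metric reading where higher values are worse (e.g. disk usage in %).

    Both thresholds are assumed to be agreed in advance, with
    warn_threshold < error_threshold.
    """
    if value >= error_threshold:
        # Operating outside normal parameters: the service is impacted.
        return Significance.ERROR
    if value >= warn_threshold:
        # Approaching the agreed limit: act now to prevent an exception.
        return Significance.WARN
    # Nothing to do; the event is only recorded.
    return Significance.INFO
```

With a warning threshold of 80 and an error threshold of 95, a reading of 85 would be classified as a warning, which is exactly the category ITIL also labels “ALERT”.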
I think the following definition makes some sense but it is not entirely correct:
“An event is an observed change to the normal behavior of a system, environment, process, workflow or person.” Source: danielmiessner.com
I would even say that any change to the current behavior or status is already an event.
Alert
It gets even more confusing with alerts. Some define alerts as events that meet a certain threshold, have a specific relevance (as in ITIL: events of warning/alert type) or require action.
Let’s start with this:
“An alert is a notification that a particular event (or series of events) has occurred, which is sent to responsible parties for the purpose of spawning action” Source: danielmiessner.com
Here we again see the confusion between alerts and notifications. From my point of view, an alert is not a notification but simply the occurrence of a specific event that meets certain criteria, or of a series of events that meets them (the number of similar events can itself be a criterion). In SCOM or other monitoring tools, for instance, an alert is a specific event or is triggered by an event, and it is usually made visible in the console. It can also trigger an action (auto-recovery) or a notification.
So, I found this pretty striking definition:
“Not a by the book definition, simply my understanding: An event will happen. It happens even if it is not detected and flagged. An alert is when a monitoring system detects it and raises this fact somewhere for further processing (and potentially triggers a notification as well). So an Alert is always in response to an event (in other words there is always an event with an alert) but there is not always an alert with an event.” Source here
Not much to add to this.
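To make the event/alert distinction a bit more tangible, here is a minimal sketch of a monitoring rule. It is an illustration only: `Event`, `Alert`, `evaluate` and the “three failed logins within 60 seconds” criterion are hypothetical and do not reflect how SCOM or any other tool implements its rules. Every event exists whether or not it is flagged; an alert is only raised when a series of events meets the criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str
    kind: str
    timestamp: float  # seconds


@dataclass(frozen=True)
class Alert:
    source: str
    reason: str
    event_count: int


def evaluate(events, kind="login_failed", min_count=3, window_seconds=60.0):
    """Raise at most one alert per source when at least min_count events
    of the given kind fall within window_seconds of each other."""
    # Collect the timestamps of matching events per source, in time order.
    by_source = {}
    for e in sorted(events, key=lambda e: e.timestamp):
        if e.kind == kind:
            by_source.setdefault(e.source, []).append(e.timestamp)

    alerts = []
    for source, stamps in by_source.items():
        start = 0
        for end in range(len(stamps)):
            # Shrink the window until it spans at most window_seconds.
            while stamps[end] - stamps[start] > window_seconds:
                start += 1
            count = end - start + 1
            if count >= min_count:
                reason = f"{min_count}+ '{kind}' events within {window_seconds:g}s"
                alerts.append(Alert(source, reason, count))
                break  # one alert per source is enough for this sketch
    return alerts
```

Events of other kinds, or matching events that stay below the count, simply pass without an alert: there is always an event with an alert, but not always an alert with an event.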
Incident
ITIL (v2) defines incidents as follows: “An event which is not part of the standard operation of a service and which causes or may cause disruption to or a reduction in the quality of services and Customer productivity.” In ITIL v3 it is defined as “An unplanned interruption to an IT Service or a reduction in the Quality of an IT Service. Failure of a Configuration Item that has not yet impacted Service is also an Incident. For example, Failure of one disk from a mirror set.”
It says “event”, not “alert”. Again, a bit confusing because we somewhat agreed on the sequence event->alert->incident. Not all alerts are incidents, nor is there necessarily a 1:1 relation between alerts and incidents. Incidents can be linked to alerts, i.e. certain alerts indicate an incident. In many scenarios, alerts of a certain severity are automatically transferred to a service management system and are the basis for the creation of an incident ticket.
Just for completeness, here is a definition I wouldn’t agree with:
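The hand-over from alerts to incident tickets could look roughly like the sketch below. It assumes a hypothetical rule: only alerts at or above a minimum severity open an incident, and several alerts on the same configuration item (CI) are attached to one incident. The names `Alert`, `Incident`, `open_incidents` and the severity levels are made up for illustration and are not the API of any service management system.

```python
from dataclasses import dataclass, field

# Assumed severity scale, lowest to highest.
SEVERITY_ORDER = {"info": 0, "warning": 1, "critical": 2}


@dataclass
class Alert:
    alert_id: str
    ci: str        # configuration item the alert refers to
    severity: str  # one of the keys in SEVERITY_ORDER


@dataclass
class Incident:
    ci: str
    alert_ids: list = field(default_factory=list)


def open_incidents(alerts, min_severity="critical"):
    """Create one incident per CI from all alerts at or above min_severity."""
    incidents = {}
    for alert in alerts:
        if SEVERITY_ORDER[alert.severity] < SEVERITY_ORDER[min_severity]:
            continue  # this alert stays an alert and never becomes an incident
        incident = incidents.setdefault(alert.ci, Incident(ci=alert.ci))
        incident.alert_ids.append(alert.alert_id)
    return list(incidents.values())
```

Two critical alerts on the same database server end up in a single incident, while a warning on a web server creates none, which shows why the relation between alerts and incidents is not 1:1.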
“An incident is a human-caused, malicious event that leads to (or may lead to) a significant disruption of business.” Source: danielmiessner.com
I think this definition is incorrect. Why should an incident only be caused by humans? If a manufacturing robot fails, this IS an incident. And it is not caused by humans (unless you wish to blame it on the designer of that machine).
Notification
As we have seen above, some define events as notifications and others define alerts as notifications. This doesn’t hold. Rather, notifications span all three categories discussed: events, alerts and incidents. You can have events trigger notifications (resulting in quite some spamming), you can have alerts trigger notifications (much better, but even then you’d filter for the most critical alerts) and you can have incidents trigger status notifications (upon creation, resolution progress, etc.). So, notifications are the very part that brings alerts and incidents to the attention of the people who need to act and respond.
Alert and Incident Notifications
And here we arrive exactly at the typical job our enterprise notification software does: using multiple channels (voice, text, push, IM, etc.), duty schedules, escalation plans, mobile apps and much more to automatically notify operational staff upon alerts & incidents, i.e. to deliver critical information to the right people at the right time, wherever they are.
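As a rough illustration of what an escalation plan means in practice, consider the sketch below. It is not how our product is implemented; `EscalationStep`, `notified_steps` and the example plan are hypothetical names and values chosen for this post. The idea is simply that each step fires a given number of minutes after the alert, and that escalation stops as soon as someone acknowledges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    recipient: str
    channel: str       # e.g. "push", "text", "voice"
    wait_minutes: int  # minutes after the alert before this step fires


def notified_steps(plan, elapsed_minutes, acknowledged_after=None):
    """Return the steps that have fired elapsed_minutes after the alert.

    If the alert was acknowledged after acknowledged_after minutes,
    later steps are suppressed.
    """
    cutoff = elapsed_minutes
    if acknowledged_after is not None:
        cutoff = min(elapsed_minutes, acknowledged_after)
    return [step for step in plan if step.wait_minutes <= cutoff]


# Example plan (assumed values): push first, then a call, then the team lead.
plan = [
    EscalationStep("on-call engineer", "push", 0),
    EscalationStep("on-call engineer", "voice", 5),
    EscalationStep("team lead", "voice", 15),
]
```

With this plan, an alert acknowledged after 7 minutes never reaches the team lead, while an unacknowledged alert has gone through all three steps after 20 minutes.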
Read more about Enterprise Alert