- Updated on 05 Jun 2018
This glossary defines the terms used in Retrace's Alerts and Notifications.
A Monitor represents a specific source and type of data collected by Stackify. The collection mechanism is dependent upon the Monitor type. Example Monitor Types include:
- CPU %
- Memory Used
- SQL Query Monitor
- Website Check Monitor
- Error Rate Monitors
- and many others
An Alert represents a period of time during which a stream of Monitor Metrics for a specific Monitor exceeds a configured severity.
- Important! Only production servers are eligible for alerts.
When configuring Alerts for a Monitor, you will be required to define Alert Severities. There are 3 possible severities:
- Warning
- Critical
- Outage
These are listed in ascending order (Warning is less severe than Critical; Outage is more severe than Critical). Retrace uses this order when determining the most severe alert for a given resource, app, or server, so it is important to keep your configuration in line with these semantics.
Alert Severities define 2 requirements that must be met for an alert to open at or transition to that severity:
- Value Threshold: the instantaneous value requirement for a Monitor Metric
- Duration Threshold: the amount of time all Monitor Metric values must meet the value threshold requirement
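The severity ordering described above can be modeled as an ordered enumeration. This is only an illustrative sketch: the `Severity` enum and the `most_severe` helper are hypothetical names, not part of Retrace.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical model of the three severities, in ascending order."""
    WARNING = 1
    CRITICAL = 2
    OUTAGE = 3


def most_severe(severities):
    """Return the highest severity among the given alerts, or None if empty."""
    return max(severities, default=None)


open_alerts = [Severity.WARNING, Severity.OUTAGE, Severity.CRITICAL]
print(most_severe(open_alerts).name)
```

Using an integer-backed enum keeps comparisons such as `Severity.WARNING < Severity.CRITICAL` consistent with the documented order.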
As an example, consider a CPU Monitor configured so that an alert opens at Warning when CPU has been greater than or equal to 95% for 10 consecutive minutes. If the Monitor Metric values then remain above 98% for 15 minutes, the alert transitions to Critical severity. After 30 minutes it transitions to Outage. If, at any point, a Monitor Metric is received with a value below 95%, the alert closes.
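The interaction of the Value Threshold and Duration Threshold can be sketched as follows. This is a simplified model, not Retrace's implementation: `SeverityRule`, `trailing_run_minutes`, and `current_severity` are hypothetical names, samples are assumed to arrive once per minute, and every threshold is treated as "greater than or equal to". Only the Warning and Critical values from the example are used, because the Outage value threshold is not stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeverityRule:
    name: str
    value_threshold: float   # assumed: metric value must be >= this
    duration_minutes: int    # for at least this many consecutive minutes


def trailing_run_minutes(samples, threshold):
    """Minutes spanned by the latest unbroken run of samples >= threshold.

    samples is a time-ordered list of (minute, value) pairs. Returns None
    when the most recent sample does not meet the threshold.
    """
    run_start = None
    for minute, value in reversed(samples):
        if value < threshold:
            break
        run_start = minute
    if run_start is None:
        return None
    return samples[-1][0] - run_start


def current_severity(samples, rules):
    """Name of the most severe satisfied rule, or None (no open alert).

    rules must be listed in ascending order of severity.
    """
    result = None
    for rule in rules:
        run = trailing_run_minutes(samples, rule.value_threshold)
        if run is not None and run >= rule.duration_minutes:
            result = rule.name
    return result


RULES = [
    SeverityRule("Warning", 95.0, 10),
    SeverityRule("Critical", 98.0, 15),
]

# CPU at 99% for minutes 0 through 15 satisfies both rules.
samples = [(minute, 99.0) for minute in range(16)]
print(current_severity(samples, RULES))
```

In this model a single sample below 95% resets every run, which mirrors the rule that the alert closes as soon as a value falls below the lowest threshold.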
A Notification is a message that can be sent when some event occurs. Possible Notification targets include:
- Slack (this can also be configured with an @channel mention)
A Notification Group is a collection of contacts that have been associated to.
Notifiable Events / Types of Notifications
There are 4 types of notifications. All of these notification types are subject to the specific monitor severities associated with the Notification Group (i.e., we will not send an alert, escalation, or reminder if the current severity does not match the configuration for that monitor on the Notification Group).
Alerts: Sent whenever an alert transitions to a severity that is at or above the severity configured for the Notification Group.
Reminders: A reminder is a type of notification that is sent on a repeating interval measured from the last alert severity transition. Example: if a Reminder is set up for 20 minutes and an Alert that started at 9:00 AM has not been acknowledged, reminder notifications are sent at 9:20 AM, 9:40 AM, and so on. The reminder interval is configured on the Notification Group.
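The reminder schedule in the example above can be computed with a few lines of code. This sketch assumes the alert stays unacknowledged and undergoes no further severity transition; `reminder_times` is a hypothetical helper, not a Retrace API.

```python
from datetime import datetime, timedelta


def reminder_times(last_transition, interval_minutes, until):
    """List the reminder times after last_transition, up to and including until."""
    step = timedelta(minutes=interval_minutes)
    times = []
    next_time = last_transition + step
    while next_time <= until:
        times.append(next_time)
        next_time += step
    return times


start = datetime(2018, 6, 5, 9, 0)   # alert started at 9:00 AM
end = datetime(2018, 6, 5, 10, 0)    # arbitrary cut-off for the listing
for moment in reminder_times(start, 20, end):
    print(moment.strftime("%H:%M"))
```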
Escalations: An escalation is a type of notification that is sent when an alert remains above an acknowledged severity for the configured escalation interval. If an alert is never acknowledged, the escalation notification will be sent as soon as the escalation interval expires.
Clears: Sent whenever any alert is closed. Alerts are closed when a Monitor Metric is received that does not meet the required severity.
Types of Notification Suppression
Acknowledge: An acknowledgement represents an accepted Alert Severity for which no more notifications should be sent while the alert is open. Acknowledgements are not time-bound (i.e., they do not expire). Typically, an alert is acknowledged when someone is actively working to fix the issue but should still be notified if it gets worse. The suppression behavior obeys the following rules:
- If an alert transitions below the acknowledged severity, notifications will not be sent.
- If an alert transitions above the acknowledged severity, notifications will be sent.
- If the alert transitions back to the acknowledged severity, it will remain suppressed.
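The three acknowledgement rules above reduce to a single comparison. The sketch below is illustrative only; `Severity` and `suppressed_by_ack` are hypothetical names, and severities are modeled as ordered integers.

```python
from enum import IntEnum


class Severity(IntEnum):
    WARNING = 1
    CRITICAL = 2
    OUTAGE = 3


def suppressed_by_ack(current, acknowledged):
    """True when notifications for the current severity stay suppressed.

    acknowledged is None when nobody has acknowledged the alert.
    """
    if acknowledged is None:
        return False
    # At or below the acknowledged severity: suppressed. Above it: notify.
    return current <= acknowledged


# Acknowledged at Critical, alert worsens to Outage: notifications are sent.
print(suppressed_by_ack(Severity.OUTAGE, Severity.CRITICAL))
```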
Snooze: A snooze is a time-based form of suppression applied to a monitor. No notifications of any kind will be sent until the snooze period expires. When the snooze expires, notifications will resume according to their configuration.
A note on reminders and escalations: reminders and escalations are processed on an interval, and a snooze does not change the timing of this process. For example, consider a Notification Group that has been configured with a reminder interval of 10 minutes. The following timeline (in minutes) demonstrates how snoozes and reminders/escalations interact:
00: alert started
10: reminder sent
15: 10 minute snooze applied
20: no reminder sent because the monitor is snoozed
25: snooze expires (no notifications are sent when the snooze expires)
30: reminder sent
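The timeline above can be reproduced with a small simulation. This is a sketch under stated assumptions: `reminder_timeline` is a hypothetical function, times are plain minute offsets, and a snooze is assumed to cover the half-open range from its start up to, but not including, its expiry.

```python
def reminder_timeline(alert_start, interval, snoozes, end):
    """Return (minute, outcome) pairs for each reminder tick up to end.

    snoozes is a list of (start_minute, length_minutes) pairs. Ticks keep
    their original schedule; a snooze only suppresses ticks that fall
    inside it.
    """
    events = []
    tick = alert_start + interval
    while tick <= end:
        snoozed = any(start <= tick < start + length for start, length in snoozes)
        events.append((tick, "suppressed" if snoozed else "sent"))
        tick += interval
    return events


# Alert starts at minute 0; a 10 minute snooze is applied at minute 15.
for minute, outcome in reminder_timeline(0, 10, [(15, 10)], 30):
    print(f"{minute:02d}: reminder {outcome}")
```

Because the tick schedule is computed independently of the snoozes, the reminder at minute 30 still lands on the original 10 minute cadence rather than 10 minutes after the snooze expires.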
Maintenance Window: A Maintenance Window is a special form of snooze that is applied to ALL monitors. All of the behavioral rules that apply to snooze also apply to Maintenance Windows.