Issue Discovered - Service disruption in Europe
Incident Report for xMatters
Postmortem

What happened?

On Monday, August 21, 2017, at approximately 07:00 BST, the xMatters monitoring systems alerted the Client Assistance team to an issue with the xMatters On-Demand services for some clients located in the European region. Some users may have experienced intermittent access to the user interface, and delays or rejections when injecting events into xMatters.

Why did it happen?

The root cause of this issue was an operating system error on one of the data center servers that make up the xMatters private cloud infrastructure.

How did we respond?

As soon as the xMatters network monitoring tools detected unreliable connectivity, the Client Assistance and Operations teams initiated the internal Major Incident Management process. The incident response teams simultaneously investigated the underlying cause and worked to restore services for clients. The teams launched the automatic failover procedures and promoted the impacted clients to alternate services located in the same data center. This automated process allowed the impacted systems to recover quickly, and all services were back online within a few minutes.

What are we doing to prevent it from happening again?

Although the automatic failover processes and resource redundancy limited the impact to very few clients, the xMatters team continues to improve its failover capabilities to reduce the impact of any future issues and further shorten the time required to fail over services. Each potential improvement the team identifies will be evaluated individually and implemented during regularly scheduled maintenance windows.

Timeline (all times BST):

2017-08-21 07:02 - xMatters monitoring systems alert the Client Assistance team to an operating system software issue impacting a few clients in one of the data centers in Europe

2017-08-21 07:05 - Internal Major Incident Management process is initiated

2017-08-21 07:14 - Impacted client services are promoted to an alternate data center

2017-08-21 07:18 - All xMatters services are restored and operations are back to normal

If you have any questions, please visit http://support.xmatters.com

Posted Aug 25, 2017 - 09:59 PDT

Resolved
The issue has been addressed, and all services have been restored. Thank you for your patience while we resolved this matter.
Posted Aug 20, 2017 - 23:28 PDT
Monitoring
The xMatters Incident Response team has deployed a fix for the issue. We are currently monitoring the situation to ensure the implementation is stable and that all services are restored.
Posted Aug 20, 2017 - 23:21 PDT
Identified
The xMatters Incident Response team has identified the source of the issue and is working on a fix. We will update once a solution has been identified and implemented.
Posted Aug 20, 2017 - 23:18 PDT
Investigating
The xMatters monitoring tools have identified a potential issue with xMatters On-Demand for some clients located in Europe. We are currently investigating the issue, and will update as information becomes available.
Posted Aug 20, 2017 - 23:06 PDT
This incident affected: Europe, Middle East, and Africa (Web Interface, Email Notifications, SMS Notifications, Voice Notifications, Conferencing, Integration Platform, API, Mobile App).