On Monday, August 21, 2017, at approximately 07:00 BST, the xMatters monitoring systems alerted the Client Assistance team to an issue with the xMatters On-Demand services for some clients located in the European region. Some users may have experienced intermittent access to the user interface, as well as delays or rejections when injecting events into xMatters.
Why did it happen?
The root cause of this issue was an operating system error on one of the data center servers that comprise the xMatters private cloud infrastructure.
How did we respond?
As soon as the xMatters network monitoring tools detected unreliable connectivity, the Client Assistance and Operations teams initiated the internal Major Incident Management process. The incident response teams investigated the underlying cause while simultaneously working to restore services for clients. The teams launched the automatic failover procedures and promoted the impacted clients to alternate services located in the same data center. This automated process allowed for quick recovery of the impacted systems, and all services were back online within a few minutes of the failover.
What are we doing to prevent it from happening again?
The automatic failover processes and resource redundancy limited the impact to very few clients. The xMatters team continues to improve failover capabilities, both to reduce the impact of any future issues and to further shorten the time required to fail over services. Potential improvements that the team identifies will be evaluated individually and implemented during regularly scheduled maintenance windows.
Timeline (all times BST)
2017-08-21 07:02 - xMatters monitoring systems alert the Client Assistance team to an operating system software issue impacting a few clients in one of the data centers in Europe
2017-08-21 07:05 - Internal Major Incident Management process is initiated
2017-08-21 07:14 - Impacted client services are promoted to an alternate data center
2017-08-21 07:18 - All xMatters services are restored and operations are back to normal
If you have any questions, please visit http://support.xmatters.com