What happened?
On Monday, February 26, 2018, at approximately 5:50 AM PST, the xMatters monitoring systems alerted the Client Assistance team to an issue with the xMatters On-Demand service for some clients located in Europe. Some users may have experienced delays in notification delivery after injecting an event into xMatters via the Integration Builder. Some users may also have experienced a brief service outage while the issue was being resolved.
Why did it happen?
This issue occurred when multiple systems experienced issues with their networking components and failed within a short period of time. This caused significant performance degradation and other stability issues for some customers hosted in one of our European data centers while the cluster recovered.
How did we respond?
As soon as the xMatters network monitoring tools detected unreliable connectivity, the Client Assistance and Operations teams initiated the internal Severity-1 process. The incident response teams simultaneously began investigating the underlying cause and working to restore services for clients. The teams launched the failover procedures and moved the impacted clients to an alternate data center. Once the failover was complete, all services were restored and functioning as expected.
What are we doing to prevent it from happening again?
The Operations team is still investigating why several systems failed in short order, and is applying preventative software and firmware patches based on the observed behaviors.
Timeline:
2018-02-26 05:52 - xMatters monitoring tools alert the Client Assistance team to an issue with On-Demand services in the European region
2018-02-26 06:00 - Internal Severity-1 process initiated
2018-02-26 06:40 - Integration Builder issues detected for some EU customers
2018-02-26 07:04 - Bulletin posted to xMatters status page: http://status.xmatters.com/incidents/ftmng8h9j6h3
2018-02-26 07:37 - Notification delays detected for some EU customers
2018-02-26 07:37 - Failover process initiated for some customers
2018-02-26 08:05 - Solution implemented
2018-02-26 08:20 - Services confirmed restored
If you have any questions, please visit http://support.xmatters.com