Issue Discovered - Service Disruption in All Regions - Multiple Services
Incident Report for xMatters
Postmortem

What happened? 

On October 31, 2023, at approximately 10:22 AM Pacific, some customers reported that they were unable to log in to xMatters or, if they were already logged in, encountered HTTP "503" (Service Unavailable) errors in the web user interface. Customers may also have noticed that some flows failed to execute. 

Why did it happen? 

xMatters deployed a regularly scheduled update to one of the backend services that make up the platform. Because of recent hosting changes, including the physical relocation of a data center, the deployment caused a conflict that left the service without available processing capacity. 

How did we respond? 

As soon as customers reported that they could not access the web user interface, the Support team confirmed the issue and initiated the internal major incident process. The response teams quickly identified the root cause and rolled the service back to its previous version. This resolved the issue, and customers reported that all services were restored. The xMatters Engineering teams then investigated the failed deployment, reconfigured the update, and redeployed and restarted the service within 20 minutes of resolving the initial issue, with no further impact to customers. 

What are we doing to prevent it from happening again? 

The xMatters teams regularly deploy updates to backend services and aim for a seamless transition between versions that does not impact customers. To help prevent this type of issue from recurring, the teams are adding more process checks to ensure that updates meet backend service requirements and dependencies before customers are switched over to a new version of a service.

Posted Nov 22, 2023 - 15:16 PST

Resolved
The issue has been addressed, and all services have been restored. Thank you for your patience while we addressed this matter.
Posted Oct 31, 2023 - 10:58 PDT
Update
We are continuing to monitor for any further issues.
Posted Oct 31, 2023 - 10:38 PDT
Monitoring
The xMatters Incident Response team has deployed a fix for the issue. We are currently monitoring the situation to ensure the implementation is stable and that all services are restored.
Posted Oct 31, 2023 - 10:37 PDT
Identified
The xMatters Incident Response team has identified the source of the issue and is working on a fix. We will provide an update once a solution has been implemented.
Posted Oct 31, 2023 - 10:35 PDT
Investigating
xMatters monitoring tools have identified a potential issue with xMatters On-Demand for clients in All Regions. We are currently investigating the issue and will update as information becomes available.

Please see incident details for specific services impacted.

If you are also experiencing issues, or if you're not sure whether this issue impacts your service, please contact xMatters Client Assistance at https://support.xmatters.com/hc/en-us/requests/new. Our support team is waiting to help.
Posted Oct 31, 2023 - 10:34 PDT
This incident affected: North America (Web Interface, Email Notifications, SMS Notifications, Voice Notifications, Conferencing, Integration Platform, API, Mobile App), Europe, Middle East, and Africa (Web Interface, Email Notifications, SMS Notifications, Voice Notifications, Conferencing, Integration Platform, API, Mobile App), and Asia Pacific (Web Interface, Email Notifications, SMS Notifications, Voice Notifications, Conferencing, Integration Platform, API, Mobile App).