On October 31, 2023, at approximately 10:22 AM Pacific, some customers reported that they were unable to log in to xMatters or, if they were already logged in, encountered "503" errors in the web user interface. Customers may also have noticed some flows failing to execute.
Why did it happen?
xMatters deployed a regularly scheduled update to one of the backend services that make up the platform. Because of recent hosting changes, which included the physical relocation of a data center, the deployment caused a conflict that resulted in a lack of processing availability.
How did we respond?
As soon as customers reported an inability to access the web user interface, the Support team confirmed the issue and initiated the internal major incident process. The response teams quickly identified the root cause and rolled back the deployment to the previous version of the service. This resolved the issue, and customers reported that all services were restored. The xMatters Engineering teams then investigated the deployment, reconfigured the update, and redeployed the service. The redeployed service was restarted without further impact to customers within 20 minutes of resolving the initial issue.
What are we doing to prevent it from happening again?
The xMatters teams regularly deploy updates to backend services and aim for a seamless transition between versions that does not impact customers. To help prevent this type of issue from recurring, the teams are adding more process checks to verify that updates meet backend service requirements and dependencies before customers are switched over to a new version of a service.