Issue Discovered - Service disruption in Asia Pacific Region - Multiple Services
Incident Report for xMatters
Postmortem

What happened? 

On November 9, 2023, at approximately 5:35 AM AEDT, some customers in the APAC region reported to xMatters Customer Support that they were unable to add a new user. The Add User button was greyed out, and hovering over it displayed the message "You've reached the maximum number of user licenses for your account" even though additional licenses were available. Some users may also have experienced an intermittent inability to log into the web user interface. Throughout this issue and the subsequent mitigation procedures, the system continued to accept events and generate alerts, and all notifications and responses were processed correctly. 

Why did it happen? 

During a regularly scheduled update to the backend services in the APAC region, a timing issue caused the service responsible for instance configuration and license tracking to be directed to a version that hadn't received the latest configuration data. This mismatch caused the system to calculate allotted licenses incorrectly and led to intermittent login issues. 

How did we respond? 

The Engineering teams were monitoring the update and did not observe any warnings or errors that they considered outside acceptable levels for this specific operation. When customers reported the issue to xMatters Customer Support, however, the teams decided to roll back the deployment immediately to mitigate any potential problems. As soon as the rollback was completed, customers confirmed that all services had been restored. The Engineering teams then launched an internal review, identified several improvements, and successfully redeployed the update without incident. 

What are we doing to prevent it from happening again? 

In addition to adding automated checks that confirm configuration data is up to date across services before an update, the teams isolated the specific cause of the configuration data mismatch to a timeout issue and updated the timing settings to prevent a recurrence.
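
As an illustration only, a pre-update check of this kind could look something like the sketch below. It is not the xMatters implementation: the function names, the service names, and the idea of comparing integer configuration versions are all assumptions made for the example.

```python
from typing import Dict, List


def find_stale_services(reported_versions: Dict[str, int],
                        expected_version: int) -> List[str]:
    """Return the names of services whose configuration version is older
    than the expected one (hypothetical integer version scheme)."""
    return sorted(
        name
        for name, version in reported_versions.items()
        if version < expected_version
    )


def safe_to_deploy(reported_versions: Dict[str, int],
                   expected_version: int) -> bool:
    """Allow the update only when no service reports stale configuration."""
    return not find_stale_services(reported_versions, expected_version)


if __name__ == "__main__":
    # Hypothetical service names and versions, for illustration only.
    versions = {"license-tracking": 41, "web-ui": 42, "notifications": 42}
    stale = find_stale_services(versions, expected_version=42)
    if stale:
        print("Update blocked; stale configuration on:", ", ".join(stale))
    else:
        print("All services are up to date; update may proceed.")
```

In this sketch the update is blocked whenever any service lags behind, so a configuration mismatch is caught before traffic is redirected rather than after customers notice it.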

Posted Nov 28, 2023 - 15:23 PST

Resolved
The issue has been addressed, and all services have been restored. Thank you for your patience while we addressed this matter.
Posted Nov 08, 2023 - 11:45 PST
Monitoring
The xMatters Incident Response team has deployed a fix for the issue. We are currently monitoring the situation to ensure the implementation is stable and that all services are restored.
Posted Nov 08, 2023 - 11:38 PST
Update
We are continuing to work on a fix for this issue.
Posted Nov 08, 2023 - 11:38 PST
Identified
The xMatters Incident Response team has identified the source of the issue and is working on a fix. We will update once a solution has been identified and implemented.
Posted Nov 08, 2023 - 11:36 PST
Investigating
xMatters monitoring tools have identified a potential issue with xMatters On-Demand for some clients located in the Asia Pacific region. We are currently investigating the issue and will update as information becomes available.

Please see incident details for specific services impacted.

If you are also experiencing issues, or if you're not sure whether this issue impacts your service, please contact xMatters Client Assistance at https://support.xmatters.com/hc/en-us/requests/new - our support agents are waiting to help.
Posted Nov 08, 2023 - 11:26 PST
This incident affected: Asia Pacific (Web Interface, Email Notifications, SMS Notifications, Voice Notifications, Conferencing, Integration Platform, API, Mobile App).