On November 9, 2023, at approximately 5:35 AM AEDT, some customers in the APAC region reported to xMatters Customer Support that they were unable to add new users. The Add User button was greyed out, and hovering over it displayed the message "You've reached the maximum number of user licenses for your account" even though additional licenses were available. Some users may also have experienced an intermittent inability to log into the web user interface. Throughout this issue and the subsequent mitigation procedures, the system continued to accept events and generate alerts, and all notifications and responses were processed correctly.
Why did it happen?
During a regularly scheduled update to the backend services in the APAC region, a timing issue caused the service responsible for instance configuration and license tracking to be directed to a version that had not received the latest configuration data. This mismatch caused the system to calculate allotted licenses incorrectly and led to intermittent login issues.
How did we respond?
The Engineering teams were monitoring the update and did not encounter any warnings or errors that they considered outside acceptable levels for this specific operation. When customers reported the issue to xMatters Customer Support, however, the teams decided to roll back the deployment immediately to mitigate any potential problems. As soon as the rollback was completed, customers confirmed that all services had been restored. The Engineering teams then launched an internal review, identified some avenues for improvement, and successfully redeployed the update without incident.
What are we doing to prevent it from happening again?
In addition to adding further automated checks to ensure that configuration data is always up to date across services before an update, the teams isolated the specific cause of the configuration data mismatch to a timeout issue and have updated the timing settings to ensure that it will not happen again.