On April 14, 2023, at approximately 9:05 AM Pacific, some customers reported to xMatters Customer Support that users were encountering errors when attempting to log in to the xMatters web interface. During the incident, some customers in North America may have experienced 503 errors when attempting to access or use xMatters, or encountered errors with integrations that communicated with the xMatters API. These errors were intermittent and impacted only a subset of customers whose primary instance was based in the us-east data center. Customers in the EMEA and APAC regions, and in other North American data centers, were not impacted.
The issue was caused when a customer inadvertently initiated a denial-of-service attack by launching an excessive number of API requests. The incoming requests peaked at over 90,000 per minute and overwhelmed the capacity of edge systems to manage the volume, causing a cascade that eventually blocked access to API endpoints and triggered 503 errors for systems that rely on them.
xMatters monitoring systems alerted to the issue just before customers reported encountering errors. xMatters Customer Support confirmed the issue and initiated the major incident management process. The incident response teams determined that the best course of action was to promote impacted customers to unaffected regions and mitigate the inbound traffic by redirecting it away from critical systems. Once the traffic was mitigated, impacted systems were able to recover and customers were migrated back to their original data centers.
A status page notification was posted to status.xmatters.com, but because of the limited scope and intermittent impact, it was classified as a degraded service. By design, this classification does not send email to status page subscribers.
xMatters Engineering has determined that additional protections are needed at entry points to identify any excessive inbound volume and allow for quick mitigation. The teams are in the process of determining the best parameters and implementation of these protections to address both intentional and unintentional denial-of-service incidents.
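As an illustration only, one common form of entry-point protection is a per-client request limit over a rolling time window. The sketch below is not the xMatters implementation, whose parameters and design are still being determined; the class name `SlidingWindowLimiter`, the `client_id` key, and the limits used are all hypothetical, chosen to show how excessive volume from a single source could be identified and rejected before it reaches downstream systems.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Per-client request limiter over a rolling window (hypothetical sketch)."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests      # assumed limit, not an xMatters value
        self.window_seconds = window_seconds
        self.clock = clock                    # injectable so the logic can be tested
        self._hits = defaultdict(deque)       # client_id -> timestamps of accepted requests

    def allow(self, client_id):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        hits = self._hits[client_id]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False                      # caller would typically answer HTTP 429
        hits.append(now)
        return True


class FakeClock:
    """Manually advanced clock used only for the demonstration below."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


demo_clock = FakeClock()
demo_limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60.0, clock=demo_clock)
demo_results = [demo_limiter.allow("client-a") for _ in range(5)]
print(demo_results)  # [True, True, True, False, False]
```

Because each client is tracked separately, one source exceeding its limit does not affect requests from others. A production design would also need to bound memory for idle clients and share counts across edge nodes, which this sketch does not attempt.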
Friday, April 14, 2023
9:00 AM - Customers report 503 issues
9:06 AM - xMatters Customer Support initiates Severity-1 incident
9:15 AM - Investigation reveals high volume to us-east
9:25 AM - Source of volume identified; begin routing customers to other regions
9:45 AM - Routing changes complete
10:17 AM - Incident mitigated
If you have any questions, please visit http://support.xmatters.com