What happened?
On May 13, 2025 at approximately 10:45 AM Pacific, xMatters internal monitoring tools identified an issue where customers in the EU region experienced intermittent web UI and API timeouts.
Why did it happen?
The issue occurred because a backend queueing service experienced network timeouts during a rapid, unpredictable increase in usage. The surge in network usage increased resource consumption, which caused service timeouts and restarts. It also raised latency, which further delayed responses to backend requests.
How did we respond?
xMatters internal monitoring tools alerted the xMatters Incident Response Team to the issue, and the team launched the internal SEV-1 process. Because the issue was detected early, Engineering teams were able to scale up the queueing services and prevent further service degradation and availability issues. The network timeouts were resolved once resources had been scaled up to accommodate the increase in usage.
What are we doing to prevent it from happening again?
The Engineering teams have adjusted resources to better absorb sudden usage increases and to keep them from affecting backend services. The improved resource allocation and adaptability should prevent similar issues from occurring in the future.