Issue Discovered - Service disruption in North American Region - Multiple Services

Incident Report for xMatters

Postmortem

What happened?

On November 21, 2025, at 12:50 PM UTC, the xMatters internal monitoring tools detected irregular behavior in how internal traffic was being routed. Some customers in the APAC region communicating with services in North America (specifically US-East) may have encountered intermittent request failures or increased latency. Only traffic between these two regions was affected; all other systems and regions continued normal operations.

Why did it happen?

A temporary network disruption between Australia Southeast and US-East caused one internal routing node in Australia to lose accurate information about available backend systems in US-East. The node generated an incomplete routing configuration and temporarily stopped directing traffic to US-East. Under normal circumstances, routing updates refresh automatically when connectivity returns. In this case, the affected node did not recover cleanly and remained in a stale state until Engineering intervened.
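
The failure mode described above can be illustrated with a toy model. This is only a sketch and not the xMatters routing implementation: the `RoutingNode` class, its `refresh` method, and the backend names are all hypothetical, chosen to show how a node that trusts a partial discovery result drops a region and stays that way until something forces a refresh.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingNode:
    """Toy routing node (hypothetical) that rebuilds routes from discovery data."""
    routes: dict[str, list[str]] = field(default_factory=dict)

    def refresh(self, discovered: dict[str, list[str]]) -> None:
        # Naive behaviour: accept whatever discovery returns,
        # even when an entire region is missing from the result.
        self.routes = {region: list(backends) for region, backends in discovered.items()}

    def can_route_to(self, region: str) -> bool:
        return bool(self.routes.get(region))


def simulate_partition() -> list[bool]:
    """Return whether US-East is routable at each stage of the scenario."""
    node = RoutingNode()
    full = {"us-east": ["backend-1", "backend-2"], "au-southeast": ["backend-3"]}
    partial = {"au-southeast": ["backend-3"]}  # US-East unreachable during the disruption

    results = []
    node.refresh(full)
    results.append(node.can_route_to("us-east"))  # healthy
    node.refresh(partial)
    results.append(node.can_route_to("us-east"))  # incomplete config: US-East dropped
    # Connectivity returns, but no refresh runs: the node remains stale.
    results.append(node.can_route_to("us-east"))
    node.refresh(full)  # forced refresh, comparable to restarting the component
    results.append(node.can_route_to("us-east"))
    return results
```

In this model the node only recovers when a refresh is forced, which mirrors the restart described in the response below.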

How did we respond?

As soon as Engineering was alerted through internal monitoring, they engaged with the platform engineering team, service owners and Customer Support to launch an investigation. The teams reached out to impacted customers to validate issue symptoms and restarted routing components in both affected regions to force a configuration refresh. Once the restart completed, routing returned to normal levels while the teams continued to monitor and investigate the root cause. They were able to confirm that only one routing node and specific cross-region traffic were impacted.

What are we doing to prevent it from happening again?

While teams were mitigating this issue, they created new alerting rules to detect the routing patterns they observed during the incident. They also expanded internal monitoring to help identify when routing nodes fail to refresh their configuration or otherwise enter a ‘stale’ state. The teams have also planned and prepared infrastructure updates that will further reduce the risk of similar issues. These include improved configuration recovery behavior, enhanced stability for routing components, and additional logging and observability improvements for diagnosing routing anomalies. They will deploy these updates once the current code freeze window has elapsed.
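
As a rough illustration of what a stale-state check might look like, the sketch below flags a routing node when its last successful configuration refresh is too old or when it is missing a region it is expected to route to. Everything here is an assumption made for illustration: the `NodeStatus` fields, the `find_stale_nodes` function, and the 300-second threshold are hypothetical and do not describe the actual xMatters alerting rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStatus:
    """Hypothetical health snapshot reported by a routing node."""
    name: str
    last_refresh_ts: float            # Unix seconds of last successful config refresh
    routed_regions: frozenset[str]    # regions the node currently routes to


def find_stale_nodes(statuses, now, expected_regions, max_age_seconds=300.0):
    """Return names of nodes whose config is too old or missing an expected region.

    max_age_seconds is an assumed threshold, not a value from the incident report.
    """
    stale = []
    for status in statuses:
        too_old = now - status.last_refresh_ts > max_age_seconds
        missing_regions = expected_regions - status.routed_regions
        if too_old or missing_regions:
            stale.append(status.name)
    return stale
```

For example, with `expected_regions = frozenset({"us-east", "au-southeast"})`, a node that last refreshed 700 seconds ago and routes only to `au-southeast` would be reported, while a node that refreshed 100 seconds ago and routes to both regions would not.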

Posted Nov 27, 2025 - 10:47 PST

Resolved

The issue has been addressed, and all services have been restored. Thank you for your patience while we addressed this matter.
Posted Nov 21, 2025 - 06:01 PST

Update

We are continuing to investigate this issue.
Posted Nov 21, 2025 - 05:55 PST

Investigating

xMatters monitoring tools have identified a potential issue with xMatters On-Demand for some clients located in the North America region. This is affecting users located near the Asia Pacific Region attempting to access xMatters. We are currently investigating the issue and will update as information becomes available.

Please see incident details for specific services impacted.

If you are also experiencing issues, or if you're not sure whether this issue impacts your service, please contact xMatters Client Assistance at https://support.xmatters.com/hc/en-us/requests/new - our support agents are waiting to help.
Posted Nov 21, 2025 - 05:54 PST
This incident affected: North America (Web Interface, Email Notifications, SMS Notifications, Voice Notifications, Conferencing, Integration Platform, API, Mobile App).