Issue Discovered - Service disruption in North America – Multiple Services

Incident Report for xMatters

Postmortem

What happened?

On August 5th, 2025, at approximately 7:53 PM Pacific, the xMatters internal monitoring tools alerted the xMatters Support team to an issue in the North America region with notification delivery. Some xMatters customers may have experienced delays when receiving notifications or noticed failed alerts in the system.

Why did it happen?

The issue occurred because of a network problem that disrupted communication between parts of the queuing system. This caused some components to become temporarily out of sync, leading to timeouts, internal connectivity failures, and a small number of messages not being processed during the disruption. As the system recovered, performance remained partially degraded until normal operation was restored.

How did we respond?

As soon as the xMatters monitoring tools reported an issue, the xMatters Support Team initiated the internal Major Incident Management process and engaged the Engineering and incident response teams. The teams quickly mitigated the issue and restored performance by redirecting traffic from our message queuing system in the affected region (us-central) to our message queuing system in another local region (us-east). After the issue was resolved, the teams directed traffic back to the restored region.

What are we doing to prevent it from happening again?

The Engineering team has determined that the best approach to prevent this issue from recurring is to replace the current message queuing system. Work on the replacement system is well underway, and the teams will retire the current system as soon as the replacement is complete.

Posted Aug 13, 2025 - 18:19 PDT

Resolved

The issue has been addressed, and all services have been restored. Thank you for your patience while we addressed this matter.
Posted Aug 05, 2025 - 22:18 PDT

Monitoring

The xMatters Incident Response team has deployed a fix for the issue. We are currently monitoring the situation to ensure the implementation is stable and that all services are restored.
Posted Aug 05, 2025 - 20:59 PDT

Identified

The xMatters Incident Response team has identified an issue with our message routing system and is working on a fix. We will update once a solution has been identified and implemented.
Posted Aug 05, 2025 - 20:44 PDT

Investigating

xMatters monitoring tools have identified a potential issue with xMatters On-Demand for clients in North America that are hosted in the us-central1 region. We are currently investigating the issue and will update as information becomes available.

Please see incident details for specific services impacted.

If you are also experiencing issues, or if you're not sure whether this issue impacts your service, please contact xMatters Client Assistance at https://support.xmatters.com/hc/en-us/requests/new - our support team is waiting to help.
Posted Aug 05, 2025 - 20:27 PDT
This incident affected: North America (Email Notifications, SMS Notifications, Voice Notifications).