Issue Discovered - Service disruption in North America
Incident Report for xMatters
Postmortem

What happened?

On Friday, April 6, 2018, at 11:25am PDT, the xMatters network monitoring systems alerted the Operations team to an issue with the On-Demand services in one of the data centers located in North America. Some users may have briefly experienced intermittent access to the user interface, and a delay or rejection when injecting an event into xMatters.

Why did it happen?

The root cause of this issue was a brief service outage experienced by the primary Internet service provider (ISP) for one of the North American data centers.

How did we respond?

As soon as the xMatters network monitoring tools detected unreliable connectivity and notified the Client Assistance and Operations teams, the teams initiated the internal Major Incident Management process and posted a bulletin to the xMatters status page. The incident response teams simultaneously investigated the underlying cause and worked to restore services for clients. During the investigation, the teams determined that the impacted data center was not accessible over the public internet. While the ISP restored its services over the next few minutes, connectivity to the affected data center remained intermittent. Throughout the event, automatic network failovers to other providers were occurring. Once services were fully restored, the incident team decided to halt any failovers to the other data center. The team continued to monitor the situation closely over the next several hours, and no further issues occurred.

What are we doing to prevent it from happening again?

xMatters uses multiple network backbones and performs failover to other networks by routing traffic through other data centers in the event of an Internet failure. During this event, these systems were working as designed and connectivity was re-established within the expected period of re-convergence.

Timeline:

2018-04-06 11:25AM - xMatters monitoring tools alert the Operations and Client Assistance teams to accessibility issues with one of the data centers in North America

2018-04-06 11:30AM - Internal Major Incident process initiated

2018-04-06 11:30AM - Support bulletin posted: http://status.xmatters.com/incidents/gymstt3gz0s2

2018-04-06 11:30AM - xMatters monitoring tools report all systems are restored

2018-04-06 11:35AM - Issue is identified as an outage with the primary Internet service provider

2018-04-06 11:35AM - All services are confirmed restored

If you have any questions, please visit http://support.xmatters.com

Posted Apr 17, 2018 - 17:27 PDT

Resolved
The issue has been addressed, and all services have been restored. Thank you for your patience while we addressed this matter.
Posted Apr 06, 2018 - 11:42 PDT
Monitoring
The xMatters Incident Response team has deployed a fix for the issue. We are currently monitoring the situation to ensure the implementation is stable and that all services are restored.
Posted Apr 06, 2018 - 11:35 PDT
Investigating
The xMatters monitoring tools have identified a potential issue affecting some clients in North America who access the xMatters On-Demand user interface. We are currently investigating the issue and will provide updates as information becomes available.

If you are also experiencing issues, or if you're not sure whether this issue impacts your service, please contact xMatters Client Assistance at https://support.xmatters.com/hc/en-us/requests/new - our support agents are waiting to help.
Posted Apr 06, 2018 - 11:30 PDT
This incident affected: North America (Web Interface, Mobile Interface, Email Notifications, SMS Notifications, Voice Notifications, Mobile Push Notifications, Conferencing, Integration Platform, REST API, Email Initiation).