Issue Discovered - Service disruption in Asia Pacific Region
Incident Report for xMatters
Postmortem

What happened?

On July 19, 2018, at approximately 03:17 AM AEST, the xMatters monitoring tools alerted to an issue with our hosting service in the Australian region. During the incident, which lasted less than 30 minutes, some customers reported encountering errors when they attempted to access their xMatters instance, and events and notifications were not being accepted or processed.

Why did it happen?

The root cause of this issue was traced to a service failure with an upstream provider for one of our data centers in the Asia-Pacific region. Our automated failover process to another data center restored connectivity for some customers, but the problem was resolved within 30 minutes of the first report, before the failover had completed.

How did we respond?

As soon as the issue was detected, the Client Assistance team initiated the internal major incident management process and posted a notice to the xMatters Status page. Because all internal services were functioning normally, the incident response teams quickly determined that external issues were preventing access to xMatters instances. The teams immediately escalated the incident to the data center provider, who confirmed that they were experiencing issues. While the provider continued to investigate and attempt to restore their service, the incident response teams began implementing a workaround to bypass the problematic data center. While the workaround was being implemented, the provider restored service and all instances were reported as functional and healthy.

What are we doing to prevent it from happening again?

At xMatters, we understand that availability is at the core of our service, and we treat our customers' requirements as mission-critical. This disruption was caused by an unexpected network event that affected the entire hosting data center. The data center provider is currently conducting an internal investigation and providing more information as they discover it. The provider is also continuing their internal processes and working with their network vendors to identify any potential remediation actions, including replacing the impacted hardware.

While these kinds of issues are difficult to predict and prevent, the xMatters teams are continually reviewing the failover processes and seeking to identify any potential areas of improvement or ways to reduce the amount of time required to get clients back online. As part of this commitment, we are conducting hosting service improvements to our infrastructure-as-a-service, scheduled to occur in the Asia-Pacific region in October 2018. For more information, see the article on our support site: https://support.xmatters.com/hc/en-us/articles/115005269506

Timeline (all times AEST):

July 19, 2018 - 03:17 AM - Monitoring tools alert to an issue in the Australian region

03:22 AM - Client Assistance initiates major incident management process, launches investigation

03:24 AM - Status page updated: https://status.xmatters.com/incidents/qfjqy9pfpntd

03:24 AM - Issue identified as external

03:24 AM - Issue reported to data center provider

03:24 AM - Operations begins workaround to attempt to mitigate issue for xMatters customers

03:28 AM - Majority of customers reported back up

03:35 AM - All services restored
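
The timeline above is in AEST, while the status updates further down this page are stamped in PDT. The following is a minimal sketch, not an xMatters tool, of how the two can be lined up. It assumes fixed offsets (AEST as UTC+10, PDT as UTC-7) and assumes that every timeline entry falls on July 19, 2018 AEST; the helper name `to_pdt` and the selection of entries are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets, AEST = UTC+10 and PDT = UTC-7.
AEST = timezone(timedelta(hours=10), "AEST")
PDT = timezone(timedelta(hours=-7), "PDT")

# A few entries copied from the timeline (24-hour form of the AM times).
timeline_aest = [
    ("03:17", "Monitoring tools alert"),
    ("03:24", "Status page updated"),
    ("03:35", "All services restored"),
]


def to_pdt(hhmm):
    """Convert an HH:MM time on July 19, 2018 AEST to a PDT datetime."""
    hour, minute = map(int, hhmm.split(":"))
    local = datetime(2018, 7, 19, hour, minute, tzinfo=AEST)
    return local.astimezone(PDT)


start = to_pdt("03:17")
for hhmm, label in timeline_aest:
    pdt = to_pdt(hhmm)
    # Minutes elapsed since the first monitoring alert.
    elapsed = int((pdt - start).total_seconds() // 60)
    print(f"{hhmm} AEST = {pdt:%b %d %H:%M} PDT (+{elapsed} min): {label}")
```

Under these assumptions, the 03:24 AM AEST status page update corresponds to the "Investigating" notice stamped Jul 18, 2018 - 10:24 PDT.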

If you have any questions, please visit http://support.xmatters.com

Posted Jul 24, 2018 - 12:57 PDT

Resolved
The issue has been addressed, and all services have been restored. Thank you for your patience while we addressed this matter.
Posted Jul 18, 2018 - 10:50 PDT
Monitoring
The xMatters Incident Response team has deployed a fix for the issue. We are currently monitoring the situation to ensure the implementation is stable and that all services are restored.
Posted Jul 18, 2018 - 10:43 PDT
Identified
The xMatters Incident Response team has identified the source of the issue and is working on a fix. We will update once a solution has been identified and implemented.
Posted Jul 18, 2018 - 10:28 PDT
Investigating
The xMatters monitoring tools have identified a potential issue with xMatters On-Demand for some clients located in the Asia Pacific region. We are currently investigating the issue, and will update as information becomes available.

If you are also experiencing issues, or if you're not sure whether this issue impacts your service, please contact xMatters Client Assistance at https://support.xmatters.com/hc/en-us/requests/new - our support agents are waiting to help.
Posted Jul 18, 2018 - 10:24 PDT
This incident affected: Asia Pacific (Web Interface, Email Notifications, SMS Notifications, Voice Notifications, Conferencing, Integration Platform, API, Mobile App).