Issue Discovered - Service disruption in Asia Pacific Region
Incident Report for xMatters
Postmortem

What happened?

On August 17, 2018 at approximately 1:15 AM AEST, the xMatters monitoring tools alerted the Client Assistance team to an issue with our hosting service in the Asia-Pacific (APAC) region, specifically Australia. During the incident, xMatters was not accepting or processing events, and some customers may have encountered errors when attempting to access their xMatters instance.

Why did it happen?

This issue was due to a loss of connectivity at one of the xMatters data centers in the APAC region, caused by a failure at the data center's Internet service provider.

How did we respond?

As soon as the issue was detected, the Client Assistance team initiated the internal major incident management process and posted a notice to the xMatters Status page. The incident response teams quickly determined that all internal services were functioning normally, and that external factors were preventing access to xMatters instances. The teams immediately escalated the issue to the data center provider, and then implemented a workaround to route traffic through another data center in the APAC region, which restored service to impacted customers without the need for a complete system failover.

Meanwhile, the data center provider confirmed that they were experiencing connectivity issues and began working to restore their service. As soon as the provider implemented a fix for the issue, full service was restored and all instances reported functional and healthy.

What are we doing to prevent it from happening again?

xMatters uses multiple network backbones and performs failover to other networks by routing traffic through alternate data centers in the event of an Internet failure. During this event, these systems worked as designed. xMatters is working with its ISPs on the loss of network connectivity at the ISP level to ensure they monitor and address network issues in a timely manner. xMatters has requested a root cause analysis (RCA) from the ISP responsible for the APAC data center, and we will update customers as necessary with any relevant conclusions.

To help avoid these types of issues, which can be difficult to predict and prevent, we are making hosting service improvements to our infrastructure-as-a-service, scheduled for the Asia-Pacific region in October 2018. For more information, see the article on our support site: https://support.xmatters.com/hc/en-us/articles/115005269506

Timeline (all times AEST):

01:15 AM - Monitoring tools alert Client Assistance to an issue with instances in the Asia-Pacific region

01:22 AM - Client Assistance initiates Severity-1 incident management process, launches investigation

01:56 AM - Issue identified as impacting all clients hosted in one APAC data center

02:17 AM - Client Assistance posts status update: https://status.xmatters.com/incidents/b071x1h75nxj

02:18 AM - Engineering team implements a workaround to restore services to the data center

02:28 AM - Client Assistance verifies workaround has restored access

04:52 AM - ISP implements solution

05:19 AM - All services restored
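
The timeline above is in AEST, while the status updates further down carry PDT timestamps. As a rough aid for reconciling the two, the following sketch computes the elapsed time to the verified workaround and to full restoration, and converts timeline entries to PDT. The function and variable names are illustrative only, and the fixed offsets (UTC+10 for AEST, UTC-7 for PDT) are assumptions that hold for August 2018.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets for August 2018 (AEST = UTC+10, PDT = UTC-7).
AEST = timezone(timedelta(hours=10), "AEST")
PDT = timezone(timedelta(hours=-7), "PDT")


def at(hhmm: str) -> datetime:
    """Build an AEST timestamp on August 17, 2018 from an 'HH:MM' timeline entry."""
    hour, minute = map(int, hhmm.split(":"))
    return datetime(2018, 8, 17, hour, minute, tzinfo=AEST)


def to_pdt(moment: datetime) -> str:
    """Format a timestamp in the style used by the status updates."""
    return moment.astimezone(PDT).strftime("%b %d, %Y - %H:%M %Z")


# Entries taken from the timeline above.
detected = at("01:15")
workaround_verified = at("02:28")
all_restored = at("05:19")

minutes_to_workaround = int((workaround_verified - detected).total_seconds() // 60)
minutes_to_full_restore = int((all_restored - detected).total_seconds() // 60)

print(f"Detection to verified workaround: {minutes_to_workaround} min")
print(f"Detection to full restoration: {minutes_to_full_restore} min")
print(f"Workaround verified: {to_pdt(workaround_verified)}")
print(f"All services restored: {to_pdt(all_restored)}")
```

The converted times line up with the "Posted" timestamps on the Monitoring and Resolved updates below.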

If you have any questions, please visit http://support.xmatters.com

Posted Aug 27, 2018 - 12:27 PDT

Resolved
The issue has been addressed, and all services have been restored. Thank you for your patience while we addressed this matter.
Posted Aug 16, 2018 - 12:19 PDT
Monitoring
The xMatters Incident Response team has identified the source of the issue and has put a workaround in place. We will update once a solution has been identified and implemented.
Posted Aug 16, 2018 - 09:28 PDT
Investigating
The xMatters monitoring tools have identified a potential issue with xMatters On-Demand for some clients located in the Asia Pacific region. We are currently investigating the issue, and will update as information becomes available.

If you are also experiencing issues, or if you're not sure whether this issue impacts your service, please contact xMatters Client Assistance at https://support.xmatters.com/hc/en-us/requests/new - our support agents are waiting to help.
Posted Aug 16, 2018 - 09:17 PDT
This incident affected: Asia Pacific (Web Interface, Email Notifications, SMS Notifications, Voice Notifications, Conferencing, Integration Platform, API, Mobile App).