On August 17, 2018, at approximately 1:15 AM AEST, the xMatters monitoring tools alerted the Client Assistance team to an issue with our hosting service in the Asia-Pacific (APAC) region, specifically Australia. During the incident, xMatters was not accepting or processing events, and some customers may have encountered errors when attempting to access their xMatters instance.
This issue was caused by a loss of connectivity at one of the xMatters data centers in the APAC region, which occurred when the data center's Internet service provider (ISP) experienced a failure.
As soon as the issue was detected, the Client Assistance team initiated the internal major incident management process and posted a notice to the xMatters Status page. The incident response teams quickly determined that all internal services were functioning normally, and that external factors were preventing access to xMatters instances. The teams immediately escalated the issue to the data center provider, and then implemented a workaround to route traffic through another data center in the APAC region, which restored service to impacted customers without the need for a complete system failover.
Meanwhile, the data center provider confirmed that they were experiencing connectivity issues and began working to restore their service. As soon as the provider implemented a fix for the issue, full service was restored and all instances reported functional and healthy.
xMatters uses multiple network backbones and, in the event of an Internet failure, fails over to other networks by routing traffic through alternate data centers. During this event, these systems worked as designed. We are working with our ISPs to address the loss of network connectivity at the ISP level and to ensure that they monitor and resolve network issues in a timely manner. xMatters has requested a root cause analysis (RCA) from the ISP responsible for the APAC data center, and we will update customers as necessary with any relevant conclusions.
To help avoid these types of issues, which can be difficult to predict and prevent, we are making hosting service improvements to our infrastructure-as-a-service. These improvements are scheduled for the Asia-Pacific region in October 2018. For more information, see the article on our support site: https://support.xmatters.com/hc/en-us/articles/115005269506
Timeline of events (all times AEST):
01:15 AM - Monitoring tools alert Client Assistance to an issue with instances in the Asia-Pacific region
01:22 AM - Client Assistance initiates Severity-1 incident management process, launches investigation
01:56 AM - Issue identified as impacting all clients hosted in one APAC data center
02:17 AM - Client Assistance posts status update: https://status.xmatters.com/incidents/b071x1h75nxj
02:18 AM - Engineering team implements a workaround to restore services to the data center
02:28 AM - Client Assistance verifies workaround has restored access
04:52 AM - ISP implements a fix for the connectivity issue
05:19 AM - All services restored
If you have any questions, please visit http://support.xmatters.com