Issue Discovered - Service disruption in Asia Pacific Region - Mobile App
Incident Report for xMatters
Postmortem

What happened?

At 2:56pm (PST) on Monday, January 8, 2018, some clients in the Asia-Pacific region reported to xMatters Client Assistance that some of their users could not access the xMatters mobile app on either iOS or Android.

Why did it happen?

The mobile API services were unable to communicate with certain xMatters components because network addresses had been re-used without the associated configuration being updated. As a result, firewalls rejected traffic from the mobile API services.

How did we respond?

As soon as the xMatters Client Assistance team was alerted to a potential issue with the mobile app, they began to investigate the root cause. Once the team replicated the issue, they immediately initiated the internal Severity-1 process and engaged the incident response teams to begin simultaneously investigating the underlying cause and working to restore services for clients. The Client Assistance team posted a bulletin to the Support site informing clients about the issue and updated it throughout the incident. During the investigation, the Operations team identified the underlying issue as a networking problem caused by the re-use of certain network addresses that were not cleaned up after previous software decommissioning, and immediately applied a fix to the configuration. The Client Assistance team then confirmed that all services had been restored and were functioning as expected.

What are we doing to prevent it from happening again?

To prevent this issue from happening again, xMatters will conduct a complete audit of the configuration rules to ensure that they are accurate and up to date. The audit will also be added to the software decommissioning process so that it is repeated on a regular, ongoing basis. (Currently in progress; xMatters internal reference: COREL-5566 - Deprecated NAT Rules Still Exist)
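
As a rough illustration of what such an audit could check, the sketch below flags rules that still refer to addresses released by decommissioned software. It is not the xMatters tooling: the NatRule record, the find_stale_rules helper, and all rule names and addresses are hypothetical, and a real audit would read rules from the firewall's own export format.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class NatRule:
    """Hypothetical rule record; real firewall exports will differ."""
    name: str
    target: str  # single address or CIDR block the rule refers to


def find_stale_rules(rules, decommissioned):
    """Return the rules whose target covers any decommissioned address."""
    retired = [ip_address(addr) for addr in decommissioned]
    stale = []
    for rule in rules:
        # strict=False lets a bare host address parse as a /32 network.
        network = ip_network(rule.target, strict=False)
        if any(addr in network for addr in retired):
            stale.append(rule)
    return stale


# Illustrative data only; these names and addresses are made up.
rules = [
    NatRule("mobile-api-out", "10.0.1.15"),
    NatRule("legacy-service", "10.0.2.0/28"),
    NatRule("reporting", "10.0.3.7"),
]
decommissioned = ["10.0.2.4", "10.0.3.7"]

stale = find_stale_rules(rules, decommissioned)
for rule in stale:
    print(f"stale rule: {rule.name} -> {rule.target}")
```

Running a check of this kind as a step in each decommissioning, rather than as a one-off, is what would keep an address from being re-used while an old rule still refers to it.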

Timeline (all times PST):

2018-01-08 14:56 - xMatters alerted that some clients cannot access the mobile app on iOS and Android

2018-01-08 14:58 - Client Assistance begins troubleshooting the issue

2018-01-08 15:10 - Severity-1 process initiated

2018-01-08 15:12 - Support Bulletin posted: http://status.xmatters.com/incidents/d1hgv64fqbkw

2018-01-08 15:13 - Incident response teams formed; investigation continues with additional resources

2018-01-08 15:38 - Operations team identifies the issue and begins working on a solution

2018-01-08 15:58 - Operations team applies the fix

2018-01-08 16:02 - Services confirmed as restored

If you have any questions, please visit http://support.xmatters.com

Posted Jan 10, 2018 - 13:51 PST

Resolved
The issue has been addressed, and all services have been restored. Thank you for your patience while we addressed this matter.
Posted Jan 08, 2018 - 16:02 PST
Monitoring
The xMatters Incident Response team has deployed a fix for the issue. We are currently monitoring the situation to ensure the implementation is stable and that all services are restored.
Posted Jan 08, 2018 - 15:58 PST
Identified
The xMatters Incident Response team has identified the source of the issue and is working on a fix. We will update once a solution has been identified and implemented.
Posted Jan 08, 2018 - 15:38 PST
Investigating
The xMatters monitoring tools have identified a potential issue with xMatters On-Demand for some clients located in the Asia Pacific region. Some clients may be experiencing an issue with accessing the mobile app. We are currently investigating the issue, and will update as information becomes available.
Posted Jan 08, 2018 - 15:12 PST