Issue Discovered - Service disruption with Voice Notifications (All Regions)
Incident Report for xMatters
Postmortem

What happened?

On February 26, 2018, several clients reported to xMatters Client Assistance that they were unable to join conference bridges. Incoming events would initiate a conference properly, and xMatters would promptly deliver notifications via SMS or email inviting recipients to join the conference bridge. However, participants attempting to call in to a conference bridge would hear a message indicating that an application error had occurred, and they were unable to join the conference. During the investigation, the teams discovered that the issue was affecting all voice notifications, not just those related to conference bridges.

Why did it happen?

This issue was caused by human error: during a regularly scheduled product deployment, one of the xMatters technicians introduced an incorrect configuration change. The change affected the service responsible for voice interactions, causing voice notifications and conference calls to fail and presenting application errors to end users. The nature and extent of the issue were initially concealed by a separate and unrelated issue involving voice notification delivery. Resolving that other issue first allowed the Operations and Client Assistance teams to see and understand this issue more clearly.

How did we respond?

When Client Assistance received the initial reports of a conferencing issue, they engaged the Engineering team to investigate, believing that it was related to the existing voice-related issue. Once the teams identified that the issues were not related, they formed a separate incident response team and quickly identified the root cause affecting voice notifications and conferences. They devised a solution and immediately began deploying the fix to all clients. Once clients confirmed that they were receiving voice calls and could again call in to join a conference, the Client Assistance team notified all of the clients who had logged a support case that the service had been restored.

What are we doing to prevent it from happening again?

To resolve this issue, the xMatters Operations team completed a full audit of the affected voice interaction service and has confirmed that the correct configuration settings are in place across the board. To help prevent similar incidents, the Operations team is reviewing its change control process and updating the deployment playbook to ensure that the correct configurations are applied and tested prior to final deployment.

Timeline (PST):

2018-02-26 12:45PM - Clients report application errors on conferences

2018-02-26 1:30PM - Teams isolate separate issues; incident response team forms

2018-02-26 1:45PM - Issue identified as configuration problem; Operations team develops and deploys configuration update

2018-02-26 2:15PM - Clients confirm conferences are working

2018-02-26 3:50PM - Deployment complete; all services restored

If you have any questions, please visit http://support.xmatters.com

Posted Mar 01, 2018 - 16:10 PST

Resolved
The issue has been addressed, and all services have been restored. Thank you for your patience while we addressed this matter.
Posted Feb 26, 2018 - 14:31 PST
Monitoring
The xMatters Incident Response team has deployed a fix for the issue. We are currently monitoring the situation to ensure the implementation is stable and that all services are restored.
Posted Feb 26, 2018 - 14:27 PST
Update
The xMatters Incident Response team continues to work to restore service for voice notifications. We will provide another update as soon as one becomes available.
Posted Feb 26, 2018 - 14:14 PST
Identified
The xMatters Incident Response team has identified the source of the issue and is working on a fix. We will update once a solution has been identified and implemented.
Posted Feb 26, 2018 - 13:43 PST
Investigating
The xMatters monitoring tools have identified a potential issue with voice notifications in xMatters On-Demand for some clients globally. We are currently investigating the issue and will provide an update as information becomes available.
Posted Feb 26, 2018 - 13:37 PST
This incident affected: Europe, Middle East, and Africa (Voice Notifications), Asia Pacific (Voice Notifications), and North America (Voice Notifications).