What happened?
On February 26, 2018, several clients reported to xMatters Client Assistance that they were unable to join conference bridges. Incoming events initiated conferences correctly, and xMatters promptly delivered SMS or email notifications inviting recipients to join the conference bridge. However, participants who called in to a conference bridge heard a message indicating that an application error had occurred, and they could not join the conference. During the investigation, the teams discovered that the issue affected all voice notifications, not just those related to conference bridges.
Why did it happen?
This issue was caused by human error: during a regularly scheduled product deployment, an xMatters technician introduced an incorrect configuration change. The change affected the service responsible for voice interactions, causing voice notifications and conference calls to fail and presenting application errors to end users. The nature and extent of the issue were initially masked by a separate, unrelated issue involving voice notification delivery. Resolving that other issue first allowed the Operations and Client Assistance teams to isolate this one and understand its full scope.
How did we respond?
When Client Assistance received the initial reports of a conferencing issue, they engaged the Engineering team to investigate, believing that the reports were related to the existing voice delivery issue. Once the teams determined that the two issues were unrelated, they formed a separate incident response team, which quickly identified the root cause affecting voice notifications and conferences. The team devised a fix and immediately began deploying it to all clients. Once clients confirmed that they were receiving voice calls and could again call in to join conferences, the Client Assistance team notified every client who had logged a support case that service had been restored.
What are we doing to prevent it from happening again?
To address this issue, the xMatters Operations team completed a full audit of the affected voice interaction service and has confirmed that the correct configuration settings are in place throughout. To help prevent similar incidents, the Operations team is reviewing its change control process and updating the deployment playbook to ensure that correct configurations are applied and tested before final deployment.
Timeline (PST):
2018-02-26 12:45PM - Clients report application errors on conferences
2018-02-26 1:30PM - Teams isolate separate issues; incident response team forms
2018-02-26 1:45PM - Issue identified as configuration problem; Operations team develops and deploys configuration update
2018-02-26 2:15PM - Clients confirm conferences are working
2018-02-26 3:50PM - Deployment complete; all services restored
If you have any questions, please visit http://support.xmatters.com