Issue Discovered - Service disruption in All Regions – Web User Interface
Incident Report for xMatters
Postmortem

What happened?

On July 13, 2020, at approximately 16:26 PDT, some customers reported issues logging in to the xMatters web user interface (web UI). When attempting to log in via SSO/SAML, some customers received one of two error messages:

"We've run into a problem while retrieving your data. Refresh the page to try again or, if the problem persists, contact xMatters Client Assistance" or "Server Internal Error on URI /sp/SSO.saml2; error: RFC6265 Cookie values may not contain character...".

The issue affected only customers using SSO/SAML to log in to the web UI. There was no impact to event processing or notification delivery, and customers using the native xMatters login did not experience the issue.

Why did it happen?

The investigation discovered that an update to the service that supports the xMatters web UI affected authentication using SSO/SAML. The update enforced compliance with RFC 6265, which defines how web cookies are used and no longer supports version-1 cookies. Since many SSO systems rely on cookies, some cookies were read as invalid, which caused the login error.
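
To illustrate the kind of check that produces an "RFC6265 Cookie values may not contain character..." error, the following is a minimal sketch of the cookie-value grammar from RFC 6265, section 4.1.1. It is not the code used by the web UI service; the function names are hypothetical and chosen only for this example.

```python
import re

# cookie-octet per RFC 6265, section 4.1.1: printable US-ASCII excluding
# whitespace, double quote, comma, semicolon, and backslash.
_COOKIE_OCTET = r"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]"

# cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
_COOKIE_VALUE = re.compile(rf'(?:{_COOKIE_OCTET}*|"{_COOKIE_OCTET}*")')
_OCTET = re.compile(_COOKIE_OCTET)


def is_rfc6265_cookie_value(value: str) -> bool:
    """Return True if value matches the RFC 6265 cookie-value grammar."""
    return _COOKIE_VALUE.fullmatch(value) is not None


def first_invalid_char(value: str):
    """Return the first character RFC 6265 disallows, or None if all are allowed.

    A single pair of surrounding double quotes is permitted by the grammar,
    so it is stripped before the characters are checked.
    """
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        value = value[1:-1]
    for ch in value:
        if _OCTET.fullmatch(ch) is None:
            return ch
    return None


# Hypothetical cookie values: the quoted value containing a space is the
# kind of value an older, version-1-style cookie might carry.
for candidate in ["abc123", '"abc"', '"session data"', "a,b"]:
    print(repr(candidate), is_rfc6265_cookie_value(candidate),
          repr(first_invalid_char(candidate)))
```

Under this grammar, a value wrapped in double quotes is accepted only if everything between the quotes is also a valid cookie octet, so a quoted value that contains a space, comma, or semicolon is rejected.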

How did we respond?

As soon as Customer Support verified that there were login issues via SSO, they escalated the issue to a Severity-1 incident and initiated the internal major incident management process. When the incident team identified that the issue was related to the latest release of the web UI service, they rolled back to the previous version of the service. Once the rollback was complete, the incident team and customers verified that SSO login functioned as expected.

What are we doing to prevent it from happening again?

The xMatters Engineering team is working to ensure backwards compatibility with version-1 cookies before the next release of the web UI service. The team is also updating the internal QA processes to add verification of SSO/SAML login for all cookie versions.

Timeline:

July 13, 2020 (all times PDT)

15:50 Release of new web UI version
16:26 Customers begin to report errors when logging in with SAML/SSO
16:43 Issue linked to web UI release; Severity-1 incident initiated
16:50 web UI service rollback initiated
16:59 Rollback complete
17:01 Customers confirm ability to log in
17:05 Incident resolved

Posted Jul 16, 2020 - 08:25 PDT

Resolved
The issue has been addressed, and all services have been restored. Thank you for your patience while we addressed this matter.
Posted Jul 13, 2020 - 17:02 PDT
Monitoring
The xMatters Incident Response team has deployed a fix for the issue. We are currently monitoring the situation to ensure the implementation is stable and that all services are restored.
Posted Jul 13, 2020 - 16:59 PDT
Identified
xMatters monitoring tools have identified a potential issue with the xMatters Web User Interface for some clients in All Regions. We are currently investigating the issue and will update as information becomes available.

If you are also experiencing issues, or if you're not sure whether this issue impacts your service, please contact xMatters Client Assistance at https://support.xmatters.com/hc/en-us/requests/new - our support agents are waiting to help.
Posted Jul 13, 2020 - 16:55 PDT
This incident affected: Europe, Middle East, and Africa (Web Interface), North America (Web Interface), and Asia Pacific (Web Interface).