What happened?
On July 13, 2020, at approximately 16:26 PDT, some customers reported problems logging in to the xMatters web user interface (web UI). When attempting to log in via SSO/SAML, some customers received one of two error messages:
"We've run into a problem while retrieving your data. Refresh the page to try again or, if the problem persists, contact xMatters Client Assistance" or "Server Internal Error on URI /sp/SSO.saml2; error: RFC6265 Cookie values may not contain character...".
The issue affected only customers who use SSO/SAML to log in to the web UI. There was no impact on event processing or notification delivery, and customers using the native xMatters login were not affected.
Why did it happen?
The investigation found that an update to the service that supports the xMatters web UI broke authentication via SSO/SAML. The update enforced compliance with RFC 6265, the specification that defines how HTTP cookies are handled, and RFC 6265 does not support version-1 cookies. Because many SSO systems rely on cookies, some of the cookies they set were read as invalid, which caused the login errors.
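To illustrate why some cookies were rejected, the following minimal Python sketch checks a cookie value against the cookie-value grammar in RFC 6265, section 4.1.1. It is an illustration only, not the web UI service's code, and the function names are hypothetical. Values that were acceptable in version-1 cookies, such as quoted strings containing spaces or commas, fail this check, which is consistent with the "Cookie values may not contain character" error quoted above.

```python
# Sketch of the RFC 6265 (section 4.1.1) cookie-value grammar:
#   cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
#   cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
# i.e. printable US-ASCII excluding space, DQUOTE, comma, semicolon, backslash.
# Function names are hypothetical, for illustration only.


def is_cookie_octet(ch: str) -> bool:
    """Return True if ch is an allowed RFC 6265 cookie-octet."""
    code = ord(ch)
    return (
        code == 0x21
        or 0x23 <= code <= 0x2B
        or 0x2D <= code <= 0x3A
        or 0x3C <= code <= 0x5B
        or 0x5D <= code <= 0x7E
    )


def is_rfc6265_cookie_value(value: str) -> bool:
    """Check a cookie value against the strict RFC 6265 grammar."""
    # The value may be wrapped in exactly one pair of double quotes,
    # but the characters inside must still be cookie-octets.
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]
    return all(is_cookie_octet(ch) for ch in value)
```

For example, `is_rfc6265_cookie_value("abc123")` returns `True`, while `is_rfc6265_cookie_value('"hello world"')` returns `False` because the quoted value contains a space.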
How did we respond?
As soon as Customer Support verified the SSO login issues, they escalated the issue to a Severity-1 incident and initiated the internal major incident management process. Once the incident team determined that the issue was related to the latest release of the web UI service, they rolled the service back to its previous version. After the rollback was complete, the incident team and customers verified that SSO login worked as expected.
What are we doing to prevent it from happening again?
The xMatters Engineering team is working to ensure backward compatibility with version-1 cookies before the next release of the web UI service. The team is also updating the internal QA processes to verify SSO/SAML login with all cookie versions.
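One possible approach to backward compatibility, sketched here under our own assumptions and not a description of the actual fix, is to accept a value if it matches the strict RFC 6265 grammar and otherwise fall back to a simplified version-1 quoted-string grammar. The function names and the `allow_version1` flag are hypothetical. The classifier also gives QA a simple way to check sample cookies of each version.

```python
import re

# cookie-octet per RFC 6265 section 4.1.1.
_OCTET = r'[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]'
# cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
RFC6265_VALUE = re.compile(rf'{_OCTET}*|"{_OCTET}*"')
# Simplified approximation of a version-1 (RFC 2109) quoted-string value:
# printable ASCII or space inside double quotes, with backslash escapes.
# Unquoted version-1 tokens already satisfy the RFC 6265 grammar.
V1_QUOTED_VALUE = re.compile(r'"(?:[\x20\x21\x23-\x5B\x5D-\x7E]|\\[\x20-\x7E])*"')


def classify_cookie_value(value: str) -> str:
    """Return 'rfc6265', 'version1', or 'invalid' (hypothetical labels)."""
    if RFC6265_VALUE.fullmatch(value):
        return "rfc6265"
    if V1_QUOTED_VALUE.fullmatch(value):
        return "version1"
    return "invalid"


def accept_cookie_value(value: str, allow_version1: bool = True) -> bool:
    """Accept strict RFC 6265 values, plus version-1 values when allowed."""
    kind = classify_cookie_value(value)
    return kind == "rfc6265" or (allow_version1 and kind == "version1")
```

With the fallback enabled, `accept_cookie_value('"hello world"')` returns `True`. With `allow_version1=False` it returns `False`, which reproduces the strict behavior.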
Timeline:
July 12, 2020
15:50 Release of new web UI version
16:26 Customers begin reporting errors when logging in with SSO/SAML
16:43 Issue linked to web UI release; Severity-1 incident initiated
16:50 Web UI service rollback initiated
16:59 Rollback complete
17:01 Customers confirm ability to log in
17:05 Incident resolved