On September 9, 2019, at approximately 10:20 AM Pacific, xMatters monitoring alerted xMatters Customer Support to an issue in which some customer integrations became unresponsive and stopped processing events. Some customers may have seen integration logs showing a number of errors related to script failures, and some events may not have been properly processed.
When the internal monitoring tools flagged errors in integration scripts, xMatters Customer Support began their investigation. As soon as customers reported issues with their integrations, Customer Support escalated the issue to a Severity-1 Incident and launched the internal major incident management process. They were able to quickly determine that the issue was related to the release of the JDK 11 upgrade, and initiated an immediate code rollback. Once the rollback was complete, the teams confirmed the issue was no longer occurring and that all services had been restored.
The xMatters Engineering teams have examined the errors from the Integration Builder logs, and isolated some differences between the two versions of the JDK that were causing the issue. Specifically, they were able to identify three Java String methods that the previous iteration of the scripting engine could process that were not being handled by GraalJS. The teams added handling to the Integration Builder service that will allow the new scripting engine to process the Java String methods without breaking script functionality. They tested the changes on internal systems and confirmed that while the Integration Builder logs will mark the errors for easy identification, the scripts will continue to execute without any developer or integrator intervention.
Customer Support posted a notice about the upcoming availability of the new scripting engine on the support site at https://support.xmatters.com/hc/en-us/articles/360033568811 and rescheduled the deployment of the JDK upgrade for Tuesday, September 17. In addition, they updated the xMatters Status page (status.xmatters.com) with a scheduled maintenance notice about the change.
September 9, 2019 (all times Pacific)
10:20 AM Internal monitoring flags integration errors; customers begin reporting integration errors
10:30 AM Customer Support launches Severity-1 Incident
10:35 AM Issue discovered - rollback to previous version initiated
10:45 AM Verification of resolution and return to normal operation
10:55 AM SEV-1 Issue closed