Post-Mortem: 11/21/2023 (Desktop Calling Interruption)
Written by NaKeisha Barry
Updated over a week ago

Impact:

  • Region: All Regions

  • Impact: Users were unable to make or accept calls on the desktop app. All other calling methods (mobile, SIP, IVR) and channels (SMS, Email, Web Meetings, Transcription) were uninterrupted.

Timeline:

  • 7:53 CST - first support report

  • 8:30 CST - issue identified

  • 9:00 CST - new SSL certificates issued

  • 10:40 CST - new SSL certificates validated and redeployment begins

  • 11:10 CST - issue resolved

Root Cause Analysis:

  • Invalid SSL certificates caused WebSocket connections from the app to our cloud infrastructure to fail. Because we route all traffic through a single global address, which makes it easy for customers to integrate with our infrastructure, every region that was online at the time was impacted.

  • Almost all SSL certificates in our core infrastructure are managed by AWS or are internally used and self-signed. The one exception is our WebRTC infrastructure, which requires a public SSL certificate deployed directly onto our SIP gateways; this is not possible with AWS-managed certificates.

  • During a cleanup of our SSL certificates earlier this year, all certificates and services were mapped to ensure proper and prompt replacement. During the incident we discovered that there were two sets of certificates with overlapping domain coverage and different expiry dates, and that the one selected (the more specific one) was the wrong one: it did not match the private key file in our keystore. As a result we had to generate new certificates, which accounted for the majority of the resolution time.

  • Once the new certificates were deployed, service returned to normal immediately.
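
The overlapping domain coverage described above is the kind of condition a small script can flag while certificates are being mapped. The following is a minimal sketch, not our actual tooling: the function names and the example.com hostnames are hypothetical, and it assumes the simple wildcard rule in which "*." stands for exactly one leftmost label.

```python
def covers(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name (possibly a wildcard) covers hostname."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern == hostname:
        return True
    if pattern.startswith("*."):
        # A wildcard stands for exactly one leftmost label: *.example.com
        # covers rtc.example.com, but not a.rtc.example.com or example.com.
        head, _, rest = hostname.partition(".")
        return bool(head) and rest == pattern[2:]
    return False


def overlapping_names(names_a, names_b):
    """Return sorted (name_a, name_b) pairs where one name covers the other."""
    return sorted(
        (a, b)
        for a in names_a
        for b in names_b
        if covers(a, b) or covers(b, a)
    )


# Hypothetical inventory: one wildcard certificate and one more specific one.
wildcard_cert = ["*.example.com"]
specific_cert = ["rtc.example.com", "example.org"]
print(overlapping_names(wildcard_cert, specific_cert))
```

Any pair reported by a check like this marks two certificates that could be deployed for the same hostname, so each one would need an explicit note of which private key it belongs to and when it expires.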

Short Term Mitigation:

  • N/A - all of our SSL certificates except this one are issued, monitored and used in AWS core infrastructure. The reas

Long Term Mitigation:

  • We are conducting an audit to ensure all public SSL keys in our inventory are matched to the right private keys.

  • We will be generating new long-term SSL certificates during the holidays so that a) they are again on a 5-year window and b) any future expiry will always fall in the 'dead' season, the 12/24-1/1 window.
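
Both mitigation items can be expressed as simple checks. The sketch below is illustrative only and uses nothing beyond the Python standard library; the function names are hypothetical. It assumes PEM-encoded files, an unencrypted private key, and that the expiry timestamp is supplied in the "Mon DD HH:MM:SS YYYY GMT" form that certificate tools print. The holiday window is evaluated in UTC, which is also an assumption.

```python
import ssl
from datetime import datetime, timezone


def key_matches_cert(cert_path: str, key_path: str) -> bool:
    """Return True if the PEM private key belongs to the PEM certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # load_cert_chain raises ssl.SSLError when a file cannot be parsed
        # or when the private key does not match the certificate.
        # Missing files raise OSError, which is deliberately not caught here.
        context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except ssl.SSLError:
        return False
    return True


def expires_in_holiday_window(not_after: str) -> bool:
    """Return True if the expiry falls between 12/24 and 1/1 inclusive (UTC)."""
    # ssl.cert_time_to_seconds parses the certificate time string into
    # seconds since the epoch.
    seconds = ssl.cert_time_to_seconds(not_after)
    expiry = datetime.fromtimestamp(seconds, tz=timezone.utc)
    in_december = expiry.month == 12 and expiry.day >= 24
    on_new_years_day = expiry.month == 1 and expiry.day == 1
    return in_december or on_new_years_day


# The incident date is outside the window; a late-December expiry is inside.
print(expires_in_holiday_window("Nov 21 12:00:00 2023 GMT"))
print(expires_in_holiday_window("Dec 28 12:00:00 2028 GMT"))
```

Running key_matches_cert against every certificate and key pair in the inventory before a deployment would surface a mismatch like the one in this incident ahead of time rather than during an outage.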
