Outlook Calendars for Confluence app inactive and cannot be activated
Incident Report for yasoon
Postmortem

🔮 Executive summary

On Sunday, the 5th of November 2023, a customer notified us that our Outlook Calendars for Confluence app could not be enabled in Confluence. We were able to reproduce the issue in our own environment and immediately started investigating. We found that a path change in the location of the translation files, originally made for our Jira apps, prevented the app from being enabled in Confluence. We released a fixed update on the 6th, hoping to re-enable the app for all customers. Unfortunately, once the faulty update had been rolled out to all customer instances, the app would not re-enable on its own, even after the fixed update was deployed. On Tuesday, the 7th, we sent out an email to all affected customers, explaining the need to perform a manual update of the app. In parallel, we worked with Atlassian Marketplace support to enable the app for all customers automatically. On the 16th of November, about a week later, we were able to resolve the issue fully for all customers.

⛑ Postmortem report

Instructions Report
⚠️ Leadup We switched our apps' build process to use translation files from a different path. Unfortunately, while making and validating the change for our Jira apps, we did not discover a cross-dependency with our Confluence app, which allowed the new translation paths to go live for the Confluence app as well.
🙅‍♀️ Fault Once the Atlassian Marketplace picked up the app update and started rolling it out to all customer instances automatically, the app became disabled for all customers. After we rolled out a second update on Monday, manually enabling the app fixed the issue. Unfortunately, the second update did not re-enable the app automatically for affected customers, so a manual action was necessary at first.
🥏 Impact The app was disabled for all customers, removing all UI entry points from Confluence and preventing users from accessing the app.
👁 Detection We learned about the issue a few hours after the update rolled out on the Marketplace on Nov. 5th.
🙋‍♂️ Response After we noticed the issue, we immediately began troubleshooting and located the problem in the translation paths section of our app's Atlassian Connect manifest file. Apps pointing to non-existent translations pass schema validation and also install, but fail to reach the “Enabled” state. Once an app is in the “Disabled” state, there is no way for a Marketplace vendor to get it back to enabled.
🙆‍♀️ Recovery Once we had sent out responses to the support tickets and an email to all affected customers, we saw apps being manually re-enabled. After about a week, Atlassian confirmed that a script had been run which re-enabled the app for all affected customers, with the exception of a few instances that churned during that period.
🔎 Root cause identification The root cause was already identified during the development of the fix. A change to a build process for our Jira apps caused a missing translation file path to be introduced for our Confluence app.
🤔 Lessons learned We will use this incident to improve our release process: it should validate a full install of the resulting Connect app manifest file to prevent erroneous updates from being delivered to customer instances. This is especially important since the Atlassian Marketplace does not seem to validate all edge cases before installing the app in cloud instances.
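
To illustrate the kind of release check described above, here is a minimal sketch and not the actual pipeline code. It assumes the Connect manifest lists translation files under a `translations.paths` object that maps locale keys to paths relative to the app's base URL, and that the build output is available as a local directory. The function names and the directory layout are hypothetical.

```python
import json
from pathlib import Path


def missing_translation_files(manifest: dict, build_root: Path) -> list[str]:
    """Return the translation paths named in the manifest that are absent under build_root."""
    # Assumed manifest layout (an assumption, check against the real descriptor):
    # {"translations": {"paths": {"en-US": "/i18n/en-US.json", ...}}}
    paths = manifest.get("translations", {}).get("paths", {})
    missing = []
    for locale, rel_path in sorted(paths.items()):
        # Treat manifest paths as relative to the build output, so drop the leading slash.
        candidate = build_root / rel_path.lstrip("/")
        if not candidate.is_file():
            missing.append(f"{locale}: {rel_path}")
    return missing


def check_manifest(manifest_file: Path, build_root: Path) -> None:
    """Stop the build with a readable message if any translation file is missing."""
    manifest = json.loads(manifest_file.read_text(encoding="utf-8"))
    missing = missing_translation_files(manifest, build_root)
    if missing:
        raise SystemExit("Missing translation files:\n  " + "\n  ".join(missing))
```

A check like this only covers the specific failure seen in this incident. It does not replace a full test install of the app.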

⏱ Incident timeline

Time What
2023-11-05 9:48 PM CET First email received from customer notifying us about the issue
2023-11-06 10:22 PM CET Raised ticket with Atlassian, letting them know we cannot fix the issue on our own, due to a quirk in how the Marketplace installs updates in cloud instances
2023-11-06 10:42 PM CET PR merged with the fix and update released on the Atlassian Marketplace
2023-11-07 3:00 PM CET Sent out email communication to all affected customers, letting them know about the issue
2023-11-16 3:30 PM CET Atlassian notifies us that a script has been executed manually to enable the app again for all customers

✅ Follow-up tasks


Problem: Relying on CI/CD alone was not sufficient to catch all issues with the Connect manifest. Schema validation and the Atlassian Marketplace do not catch all error cases, which allowed an erroneous app update to ship to all customer instances.
Action items: Introduce new pipeline checks that validate the Connect manifest install in a production environment before go-live.
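
One possible shape for such a pipeline check is sketched below. It does not perform a real install. As a cheaper proxy, it requests every translation URL derived from the manifest's `baseUrl` and `translations.paths` entries and reports any that do not return HTTP 200. The manifest layout is an assumption, and the function names are hypothetical.

```python
from typing import Callable
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_status(url: str) -> int:
    """Return the HTTP status for url, or 0 if the request fails at the network level."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.status
    except HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError and carry the status code.
        return err.code
    except URLError:
        return 0


def unreachable_translations(
    manifest: dict, fetch_status: Callable[[str], int] = http_status
) -> list[str]:
    """List translation URLs from the manifest that do not answer with HTTP 200."""
    base_url = manifest["baseUrl"].rstrip("/")
    # Assumed layout: {"translations": {"paths": {"en-US": "/i18n/en-US.json"}}}
    paths = manifest.get("translations", {}).get("paths", {})
    failures = []
    for locale, rel_path in sorted(paths.items()):
        url = base_url + "/" + rel_path.lstrip("/")
        status = fetch_status(url)
        if status != 200:
            failures.append(f"{locale}: {url} -> {status}")
    return failures
```

Passing `fetch_status` as a parameter lets the check run against a stub in unit tests and against the deployed app in the release pipeline.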
Posted Dec 04, 2023 - 17:07 CET

Resolved
We have successfully worked with Atlassian to re-enable Outlook Calendars for Confluence on all instances, so manual activation should no longer be necessary. Sorry for the disruption. We'll make sure to post a postmortem in a timely manner.
Posted Nov 16, 2023 - 16:26 CET
Update
We are still working with Atlassian to bulk-enable the app for all instances again. It's still possible to manually enable the app again, so we are lowering the impact.
Posted Nov 14, 2023 - 11:06 CET
Update
After monitoring the situation, it appears that the app will not re-enable itself automatically, but this can be done manually via "Manage apps". We are working with Atlassian to re-enable the app for all customers.
Posted Nov 07, 2023 - 09:44 CET
Update
A fix has been implemented. Currently, a manual update of the app via "Manage apps" is required, but the update should be picked up automatically after 24h.
Posted Nov 06, 2023 - 16:33 CET
Monitoring
A fix has been implemented. Currently, a manual update of the app via "Manage apps" is required, but the update should be picked up automatically after 24h.
Posted Nov 06, 2023 - 16:32 CET
Identified
We have identified a deployment issue with our Outlook Calendars for Confluence app. The app currently cannot be used because it cannot be activated since the last update. We are working to provide an update today that will fix the issue.
Posted Nov 06, 2023 - 10:15 CET
This incident affected: Outlook Calendars for Confluence.