Post-mortem: Downtime on November 11, 2018

After today’s deployment, we experienced downtime on our reference server. We want to give you some insight into what happened.

What Happened?

At 09:41 (GMT+1), we deployed a new version to our reference server build.opensuse.org. Right after the deployment, the application failed to boot and showed our error page instead.

From the Apache log file, we immediately recognized a dependency conflict in Passenger (our web server). After resolving the conflict, we were back online at 09:47 (GMT+1).

Why Did It Happen?

Last week, we updated the rack gem due to a security issue. However, we were not aware that Passenger does not use the bundled gems but the Ruby gems installed on the system.

After updating the obs-api package via zypper up, Passenger started to complain about the rack version. Updating the system package for rack (via zypper up rubygem-rack) fixed the issue.

How Are We Going to Do Better in the Future?

We updated our spec file to make sure the system gems for rack and rake match the gems installed by the bundle gems service. The necessary changes and more information can be found in PR #6234.
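To illustrate the kind of mismatch involved, here is a minimal Ruby sketch of a check that compares the versions pinned in a Gemfile.lock with the versions installed as system gems. It is not the change made in the PR. The helper names (locked_version, mismatches), the sample lockfile, and all version numbers are hypothetical and only serve the example.

```ruby
# Hypothetical check, not taken from the OBS code base: report gems whose
# system version differs from the version pinned in Gemfile.lock.

# Illustrative lockfile excerpt. The version numbers are made up.
SAMPLE_LOCKFILE = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      rack (2.0.6)
      rack-test (1.1.0)
        rack (>= 1.0, < 3)
      rake (12.3.1)
LOCK

# Return the version pinned for gem_name, or nil if the gem is not listed.
# Resolved gems are indented by four spaces in a Gemfile.lock, while their
# dependency constraints are indented by six, so only the former match.
def locked_version(lockfile_text, gem_name)
  pattern = /^ {4}#{Regexp.escape(gem_name)} \(([^)]+)\)$/
  match = lockfile_text.match(pattern)
  match && match[1]
end

# Compare a hash of { gem name => system version } against the lockfile and
# return only the entries that differ.
def mismatches(lockfile_text, system_versions)
  system_versions.each_with_object({}) do |(name, system_version), result|
    locked = locked_version(lockfile_text, name)
    next if locked == system_version

    result[name] = { locked: locked, system: system_version }
  end
end

# Assumed system state for the example: rack is one release behind.
assumed_system_versions = { 'rack' => '2.0.5', 'rake' => '12.3.1' }

mismatches(SAMPLE_LOCKFILE, assumed_system_versions).each do |name, versions|
  puts "#{name}: Gemfile.lock has #{versions[:locked]}, system has #{versions[:system]}"
end
```

In a real setup, the system versions would have to be read from the installed gems rather than hard-coded, and a check like this would run before Passenger is restarted.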