Lost Server Access — Software Postmortem
Feb 21, 2022
Issue Summary:
- SSH access to a web server was blocked after installing MySQL.
- The issue lasted 22 hours, from 2/17/2022 10:05 UTC -5 to 2/19/2022 07:58 UTC -5.
- Due to the inaccessibility of the server, the update fixes planned during said 22 hours were postponed. We also couldn’t continue with the MySQL database linking.
- The end user was not affected. However, the client was negatively impacted, since we couldn’t complete the database linking by the planned time.
- The root cause of this issue was the installation of an incompatible MySQL version (v5.0.12), which blocked the ports used to connect to the server via SSH.
Timeline:
- The issue was detected at 2/17/2022 10:22 UTC -5.
- The issue was detected when the SSH connection was closed and the developer, Christian Martinez, tried to reconnect to the server.
- Christian Martinez soft rebooted and then hard rebooted the server.
- Christian’s lead (David Arias) rebooted the server and reconfigured the private and public keys, to no avail.
- The issue was escalated to tier 3, and two additional technical leads, Andres Ramirez and Luis Hernandez, worked on it alongside Christian and David.
- Hardware issues were never suspected, since the server was still responding to HTTP requests.
- The firewall was known to be blocking the SSH connection. However, this couldn’t be fixed by reconfiguring the firewall, since no one could reconnect to the server.
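The reasoning in the timeline (HTTP still answering while SSH does not, so hardware is unlikely and the firewall is the suspect) can be sketched as a small port probe. This is only an illustration, not the tooling the team used: the function names `is_port_open` and `classify` are hypothetical, and ports 22 and 80 are assumed to be the SSH and HTTP ports.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def classify(ssh_open: bool, http_open: bool) -> str:
    """Map the two probe results to a rough first diagnosis (hypothetical helper)."""
    if ssh_open and http_open:
        return "both services reachable"
    if http_open:
        return "host is up but SSH is blocked: suspect firewall or sshd"
    if ssh_open:
        return "SSH reachable but HTTP is down: suspect the web server"
    return "host unreachable: suspect network or hardware"
```

For a server address `host`, calling `classify(is_port_open(host, 22), is_port_open(host, 80))` would return the "SSH is blocked" message in a situation like the one described here. A TCP probe cannot tell a firewall drop from a stopped SSH daemon, so it narrows the search rather than confirming the cause.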
Root cause:
- Installing an incompatible MySQL version misconfigured the server’s Firewall, blocking all SSH access to the server.
Resolution:
- The incident was resolved by destroying the server (software based), initializing a new one, configuring it (Apache, firewall, load balancer), and finally installing the compatible MySQL packages (v8.0.28).
- Linking was effective once the new server was created and configured.
Preventive Measures:
- The correct commands were added to the server documentation to avoid further issues. They were also added to this GitHub repo for further reference.
- In case the server access is blocked, contact your lead for troubleshooting steps immediately.
- Only leads can follow the troubleshooting steps mentioned.
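As a complement to the measures above, a before/after reachability comparison could flag a lost SSH port while the current session is still open. The sketch below is an assumption-laden illustration and not part of the documented troubleshooting steps: `lost_ports` is a hypothetical helper, and the snapshots are plain dictionaries mapping a port number to whether it was reachable, produced by whatever probe is available.

```python
from typing import Dict, List


def lost_ports(before: Dict[int, bool], after: Dict[int, bool]) -> List[int]:
    """Return ports that were reachable before a change but are not afterwards.

    A port missing from the `after` snapshot is treated as unreachable.
    """
    return sorted(
        port
        for port, was_open in before.items()
        if was_open and not after.get(port, False)
    )


# Example snapshots (made-up values): taken before and after installing a package.
before_install = {22: True, 80: True, 3306: False}
after_install = {22: False, 80: True, 3306: True}

print(lost_ports(before_install, after_install))  # [22]
```

If port 22 shows up in the result, the change can be reported to a lead before the existing SSH session is closed.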
Software Postmortem inspired by FreeCodeCamp.org