Lost Server Access — Software Postmortem

Christian Martinez
Feb 21, 2022

Issue Summary:

  • SSH access to a web server was blocked after MySQL was installed.
  • The issue lasted 22 hours, from 2/17/2022 10:05 UTC-5 to 2/19/2022 07:58 UTC-5.
  • Because the server was inaccessible, the update fixes planned for those 22 hours were postponed. We also couldn’t continue with the MySQL database linking.
  • The end user was not affected. However, the client was negatively impacted, since we couldn’t complete the database linking by the planned time.
  • The root cause was the installation of an incompatible MySQL version (v5.0.12), which blocked the ports used to connect to the server via SSH.

Timeline:

  • The issue was detected at 2/17/2022 10:22 UTC-5.
  • The issue was detected when the SSH connection was closed and the developer, Christian Martinez, tried to reconnect to the server.
  • Christian Martinez first tried a soft reboot and then a hard reboot of the server.
  • Christian’s lead (David Arias) rebooted the server and reconfigured the private and public keys, to no avail.
  • The issue was escalated to tier 3, and two additional technical leads, Andres Ramirez and Luis Hernandez, worked on it alongside Christian and David.
  • Hardware issues were never suspected, since the server was still responding to HTTP requests.
  • The firewall was known to be blocking the SSH connection. However, the firewall couldn’t be reconfigured, since no one could reconnect to the server.

Root cause:

  • Installing an incompatible MySQL version misconfigured the server’s Firewall, blocking all SSH access to the server.

Resolution:

  • The incident was resolved by destroying the server (software based), initializing a new one, configuring it (Apache, firewall, load balancer), and finally installing the compatible MySQL packages (v8.0.28).
  • Linking was effective once the new server was created and configured.

Preventive Measures:

  • The correct commands were added to the server documentation to avoid further issues. They were also added to this GitHub repo for further reference.
  • If server access is blocked, contact your lead immediately for troubleshooting steps.
  • Only leads can follow the troubleshooting steps mentioned.
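
One further guard, offered only as a sketch and not as part of the documented commands, is to compare the version about to be installed against the version that worked in this incident. The names `parse_version`, `MIN_MYSQL_VERSION` and `is_supported` are hypothetical. Treating v8.0.28 as a minimum is an assumption, since the post only confirms that v8.0.28 worked and v5.0.12 did not.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '8.0.28' or 'v8.0.28' into (8, 0, 28)."""
    return tuple(int(part) for part in text.lstrip("vV").split("."))


# Assumption: the version reported as working is used as the minimum allowed.
MIN_MYSQL_VERSION = parse_version("8.0.28")


def is_supported(candidate: str) -> bool:
    """Return True if candidate is at least the known-good version."""
    return parse_version(candidate) >= MIN_MYSQL_VERSION
```

With this check, `is_supported("5.0.12")` is false and `is_supported("8.0.28")` is true, so an install script could stop before touching the server.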

Software Postmortem inspired by FreeCodeCamp.org
