Incidents/20150625-LabsNFSOutage
Appearance
Summary
There was a 12 minute Labs NFS outage (and subsequently Tools outage) caused by human error (reboot entered on the wrong terminal accidentally). The system came back up, the arrays were assembled and NFS was back up in 12 minutes.
Timeline
16:35: Yuvi accidentally types reboot into labstore1002 instead of a labs instance awaiting NFS removal.
16:38: Machine comes back up, Coren starts assembling arrays and bringing the NFS server back up
16:47: Instances all recovered from NFS stall, everything working ok
Actionables
- Feed Yuvi to a Moose
Not done Yuvi, we love you! - Install the molly-guard package on production hosts https://phabricator.wikimedia.org/T103873
Done