
Incidents/20151024-LabsNFS-Lag

From Wikitech

Summary

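NFS lagged for all Labs instances when I (milimetric) ran

tail -n +2

on a 35 GB file in /data/project/milimetric/. Because tail -n +2 outputs everything from the second line onward, it had to stream the entire file over NFS, which used up all the available NFS bandwidth.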
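A minimal sketch of why this operation is expensive, using a tiny hypothetical sample file in place of the 35 GB one: tail -n +2 skips only the first line, so every other byte is read and, when redirected, written back out. The file names and contents below are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: show that `tail -n +2` rewrites everything except the first line.
# The sample file is hypothetical; it stands in for the 35 GB data file.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT   # clean up the scratch directory on exit

# Three lines: a 9-byte header ("id<TAB>value\n") plus two 4-byte rows.
printf 'id\tvalue\n1\ta\n2\tb\n' > "$dir/sample.tsv"

# Emit the file starting at line 2, i.e. drop only the header line.
tail -n +2 "$dir/sample.tsv" > "$dir/sample.noheader.tsv"

# Bytes read vs. bytes written: only the header is skipped.
wc -c < "$dir/sample.tsv"
wc -c < "$dir/sample.noheader.tsv"
```

Scaled up, the same ratio means almost the whole 35 GB file crosses the network at least once.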


Timeline

  • 2015-10-24 00:52:44 UTC Started the operation
  • 2015-10-24 01:12:42 UTC Coren alerted me on IRC
  • 2015-10-24 01:33:16 UTC I killed the operation


Conclusions

  • I could have asked someone from Ops to run the file operation locally on labstore1002
  • I could have rate-limited the read with pv -L
  • I probably should have just fixed the source file on stat1002 (where it originally came from) and re-copied it to Labs
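A minimal sketch of the rate-limited alternative, assuming pv is installed on the instance: pv -L caps the bytes per second read from the input, and because tail can only write what pv passes to it, the write side is bounded by the same limit. The function name, argument order, and the 10M default are hypothetical choices for illustration, not values taken from this incident.

```shell
#!/bin/sh
# Sketch only: strip the first (header) line of a large file while capping
# read throughput with pv, so one job cannot saturate shared NFS bandwidth.
# Assumptions: strip_header_throttled and its 10M default are hypothetical.

strip_header_throttled() {
    src=$1            # input file (e.g. on NFS)
    dst=$2            # output file
    rate=${3:-10M}    # pv rate limit in bytes/s; K/M/G/T suffixes are accepted

    # pv -L limits how fast bytes are read from $src; -q hides the progress bar.
    # tail -n +2 then emits everything from line 2 onward, dropping the header.
    pv -q -L "$rate" "$src" | tail -n +2 > "$dst"
}

# Example (hypothetical file names):
#   strip_header_throttled input.tsv output.tsv 10M
```

The trade-off is time: at 10 MiB/s a 35 GB file takes roughly an hour, in exchange for leaving most of the NFS bandwidth to other instances.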