User:Razzi/Plan to drain hadoop cluster


https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html

For production, draining the cluster means shutting down input by disabling the Camus timers on an-launcher.

Once they are disabled, no new data flows in.
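Below is a minimal sketch of stopping these units by name pattern. It assumes bash on an-launcher1002 and the unit prefixes used in the plan. The function stop_units and the DRY_RUN variable are hypothetical names for illustration. With DRY_RUN=1 it only prints the commands it would run. systemctl matches each quoted glob against currently loaded unit names, so matching .timer units (and any loaded .service units) are stopped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop every loaded systemd unit matching the given
# glob patterns. systemctl expands the globs against loaded unit names.
stop_units() {
  local pattern
  for pattern in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Only show what would be run; nothing is stopped.
      echo "sudo systemctl stop '$pattern'"
    else
      sudo systemctl stop "$pattern"
    fi
  done
}
```

For example, run DRY_RUN=1 stop_units 'camus-*' 'refine_*' first, then rerun without DRY_RUN. Check the result with systemctl list-timers --all 'camus-*'.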

Some jobs, such as refine, run on a schedule, so their timers need to be stopped too.

The cluster should drain in less than an hour.
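To see when the drain has finished, one option is to poll YARN until nothing is running. This minimal sketch assumes yarn is on the PATH and the shell has valid credentials for the cluster. count_running_apps and wait_for_drain are hypothetical helpers, and the default interval and check count are arbitrary. The sketch uses the -appStates RUNNING filter of yarn application -list and counts lines whose first field starts with "application_". That skips the summary line and the column header.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many YARN applications are RUNNING.
# Fails (instead of printing 0) if the yarn command itself fails.
count_running_apps() {
  local out
  out="$(yarn application -list -appStates RUNNING 2>/dev/null)" || return 1
  printf '%s\n' "$out" | awk '$1 ~ /^application_/ { n++ } END { print n + 0 }'
}

# Hypothetical helper: poll every $1 seconds (default 60), at most $2 times
# (default 60), until no applications are running.
wait_for_drain() {
  local interval="${1:-60}" max_checks="${2:-60}" i running
  for ((i = 1; i <= max_checks; i++)); do
    running="$(count_running_apps)" || { echo "could not query YARN" >&2; return 1; }
    if [ "$running" -eq 0 ]; then
      echo "drained"
      return 0
    fi
    echo "check $i: $running application(s) still running"
    sleep "$interval"
  done
  echo "still not drained after $max_checks checks" >&2
  return 1
}
```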

Kafka has 7-day retention, so it acts as a buffer while ingestion is stopped.

Now that we have the Capacity Scheduler, queues can be disabled.
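In the Apache Hadoop Capacity Scheduler, you stop a queue by setting yarn.scheduler.capacity.<queue-path>.state to STOPPED in capacity-scheduler.xml. You then reload the configuration with yarn rmadmin -refreshQueues. After that, the queue rejects new applications, while running ones can finish. This minimal sketch only prints the property block. queue_state_property is a hypothetical helper, and root.default is just an example queue path, not necessarily one of this cluster's queues. If capacity-scheduler.xml is managed by Puppet, the next Puppet run would revert a hand edit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the capacity-scheduler.xml property that sets
# a queue's state. The Capacity Scheduler accepts RUNNING or STOPPED.
queue_state_property() {
  local queue_path="$1" state="${2:-STOPPED}"
  case "$state" in
    RUNNING|STOPPED) ;;
    *) echo "invalid state: $state" >&2; return 1 ;;
  esac
  cat <<EOF
<property>
  <name>yarn.scheduler.capacity.${queue_path}.state</name>
  <value>${state}</value>
</property>
EOF
}
```

For example, queue_state_property root.default STOPPED prints the block to add to capacity-scheduler.xml. Then run yarn rmadmin -refreshQueues as a YARN admin.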

Plan:

Disable Puppet on an-master1002
- sudo puppet agent --disable 'razzi: upgrade hadoop masters to debian buster'
Disable jobs on an-launcher1002
- sudo systemctl stop 'camus-*'
- sudo systemctl stop 'drop-*'
- sudo systemctl stop 'hdfs-*'
- sudo systemctl stop 'mediawiki-*'
- sudo systemctl stop 'refine_*'
- sudo systemctl stop 'refinery-*'
- sudo systemctl stop 'reportupdater-*'
Kill YARN applications (while the ResourceManager is still up, since the yarn application command talks to it)
- for jobId in $(yarn application -list | awk 'NR > 2 { print $1 }'); do yarn application -kill "$jobId"; done
Disable the queue
- sudo systemctl stop hadoop-yarn-resourcemanager[1]
Enable safe mode
- sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode enter
Checkpoint (save the namespace; this requires safe mode)
- sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -saveNamespace
Create a snapshot tarball (on an-master1001)
- sudo su
- cd /srv/hadoop/namenode
- tar -czf /home/razzi/hdfs-namenode-snapshot-buster-reimage-$(date --iso-8601).tar.gz current
Copy the snapshot elsewhere
- (from my personal computer)
- scp -3 an-master1001.eqiad.wmnet:/home/razzi/hdfs-namenode-snapshot-buster-reimage-$(gdate --iso-8601).tar.gz thorium.eqiad.wmnet:/home/razzi/hdfs-namenode-snapshot-buster-reimage-$(gdate --iso-8601).tar.gz
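  - Based on a test transfer, this should take about 30 minutes. That's acceptable, but if there's a faster way (distcp?), it would be good to know. Note that scp -3 relays all the data through the local machine rather than copying directly between the two hosts.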
Change UIDs
Reimage
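The -saveNamespace step only works while the NameNode is in safe mode, so it may be worth confirming the state first with hdfs dfsadmin -safemode get. This minimal sketch assumes that command's output format: one "Safe mode is ON" or "Safe mode is OFF" line per NameNode in an HA setup. all_safemode_on is a hypothetical helper. It reads the output on stdin and succeeds only if at least one NameNode reported ON and none reported OFF.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hdfs dfsadmin -safemode get` output on stdin.
# Exit 0 only if at least one NameNode is ON and none is OFF.
all_safemode_on() {
  awk '
    /Safe mode is ON/  { on++ }
    /Safe mode is OFF/ { off++ }
    END { if (on > 0 && off == 0) exit 0; exit 1 }
  '
}
```

For example: sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode get | all_safemode_on && sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -saveNamespace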

In summary:

- stop the cluster
- make a backup
- change uids
- reimage
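For the backup, it may be worth checking that the tarball is readable and arrived intact on thorium. This minimal sketch assumes GNU tar and sha256sum on both hosts. verify_snapshot is a hypothetical helper. It lists the archive, which fails if the archive is corrupt, and then prints its SHA-256 digest. Run it against the file on an-master1001 and on thorium, then compare the two digests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm a .tar.gz can be read, then print its SHA-256.
# Run on both the source and destination hosts and compare the digests.
verify_snapshot() {
  local archive="$1"
  if ! tar -tzf "$archive" > /dev/null; then
    echo "unreadable archive: $archive" >&2
    return 1
  fi
  sha256sum "$archive" | awk '{ print $1 }'
}
```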

[1] https://docs.cloudera.com/runtime/7.2.1/yarn-allocate-resources/topics/yarn-start-and-stop-queues.html (this one is for the Cloudera UI); see also https://stackoverflow.com/questions/42589764/how-to-delete-a-queue-in-yarn
