Decom script

From Wikitech

A real-life example of how to decommission a host using the latest method: the Spicerack cookbook sre.hosts.decommission, which replaced "wmf-decommission-host".

1) SSH to one of the cumin masters: cumin1002.eqiad.wmnet or cumin2002.codfw.wmnet.

2) Run the cookbook as a dry run first (the "-d" flag), for example:

sudo cookbook -d sre.hosts.decommission analytics1032.eqiad.wmnet -t T233080

3) To actually run it, replace the host name and the ticket ID with your own and remove the "-d".
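
The steps above can be sketched as a small shell helper that builds the command line and keeps the dry run as the default. The function name decom_cmd and its --real switch are hypothetical and not part of the cookbook; only "sudo cookbook", "-d", "sre.hosts.decommission" and "-t" come from the example command.

```shell
# Hypothetical helper (not part of Spicerack): print the decommission
# command for a host and a Phabricator task. Dry-run (-d) is the default;
# pass --real as the third argument to print the command without -d.
# The body runs in a subshell, so its variables do not leak to the caller.
decom_cmd() (
    host="${1:-}"
    task="${2:-}"
    mode="${3:-}"
    if [ -z "$host" ] || [ -z "$task" ]; then
        echo "usage: decom_cmd <fqdn> <task-id> [--real]" >&2
        exit 1
    fi
    dry="-d "
    if [ "$mode" = "--real" ]; then
        dry=""
    fi
    printf 'sudo cookbook %ssre.hosts.decommission %s -t %s\n' "$dry" "$host" "$task"
)

# Print the dry-run command first, then the real one.
decom_cmd analytics1032.eqiad.wmnet T233080
decom_cmd analytics1032.eqiad.wmnet T233080 --real
```

The helper only prints the command rather than executing it, so the destructive run stays a deliberate, separate step.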

Example output of the dry run:

elukey@cumin1001:~$ sudo cookbook -d sre.hosts.decommission analytics1032.eqiad.wmnet -t T233080
DRY-RUN: Executing cookbook sre.hosts.decommission with args: ['analytics1032.eqiad.wmnet', '-t', 'T233080']
DRY-RUN: START - Cookbook sre.hosts.decommission
ATTENTION: destructive action for 1 hosts: analytics1032.eqiad.wmnet
Are you sure to proceed?
Type "done" to proceed
> done
DRY-RUN: Resolved CNAME record for icinga.wikimedia.org: icinga.wikimedia.org. 300 IN CNAME icinga1001.wikimedia.org.
DRY-RUN: MGMT_PASSWORD environment variable not found
Management Password:
DRY-RUN: Scheduling downtime on Icinga server icinga1001.wikimedia.org for hosts: ['analytics1032.eqiad.wmnet']
DRY-RUN: Executing commands ['icinga-downtime -h "analytics1032" -d 14400 -r "Host decommission - elukey@cumin1001 - T233080"'] on 1 hosts: icinga1001.wikimedia.org
DRY-RUN: Downtimed host on Icinga
DRY-RUN: Resolved A record for analytics1032.mgmt.eqiad.wmnet: analytics1032.mgmt.eqiad.wmnet. 3600 IN A 10.65.3.215
DRY-RUN: Management FQDN for analytics1032.eqiad.wmnet is analytics1032.mgmt.eqiad.wmnet
DRY-RUN: Scheduling downtime on Icinga server icinga1001.wikimedia.org for hosts: ['analytics1032.mgmt.eqiad.wmnet']
DRY-RUN: Executing commands ['icinga-downtime -h "analytics1032" -d 14400 -r "Host decommission - elukey@cumin1001 - T233080"'] on 1 hosts: icinga1001.wikimedia.org
DRY-RUN: Downtimed management interface on Icinga
DRY-RUN: Executing commands ['true'] on 1 hosts: analytics1032.eqiad.wmnet
DRY-RUN: Executing commands ["lsblk --all --output 'NAME,TYPE' --paths | awk '/^\\/.* disk$/{ print $1 }' | xargs -I % bash -c '/sbin/wipefs --all --force %*'"] on 1 hosts: analytics1032.eqiad.wmnet
DRY-RUN: Wiped bootloaders
DRY-RUN: Running IPMI command: ipmitool -I lanplus -H analytics1032.mgmt.eqiad.wmnet -U root -E chassis power off
DRY-RUN: Powered off
DRY-RUN: skipping host status write due to dry-run mode for analytics1032 Active -> Decommissioning
DRY-RUN: Set Netbox status to Decommissioning
DRY-RUN: Skip removing host analytics1032.eqiad.wmnet from Debmonitor in DRY-RUN
DRY-RUN: Removed from DebMonitor
DRY-RUN: Executing commands ['puppet node clean analytics1032.eqiad.wmnet', 'puppet node deactivate analytics1032.eqiad.wmnet'] on 1 hosts: puppetmaster1001.eqiad.wmnet
DRY-RUN: Removed from Puppet master and PuppetDB
DRY-RUN: Skip updating Phabricator task T233080 in DRY-RUN with comment: cookbooks.sre.hosts.decommission executed by elukey@cumin1001 for hosts: `analytics1032.eqiad.wmnet`
-  analytics1032.eqiad.wmnet (**PASS**)
  - Downtimed host on Icinga
  - Downtimed management interface on Icinga
  - Wiped bootloaders
  - Powered off
  - Set Netbox status to Decommissioning
  - Removed from DebMonitor
  - Removed from Puppet master and PuppetDB
DRY-RUN: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
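
The last line of the output reports the overall result. As a minimal sketch, assuming the output was saved to a file or is piped in, and assuming the closing line keeps the format shown in these examples, the result can be checked like this. The function name decom_passed is hypothetical; what a failed run prints is not shown on this page, so the check only recognises the PASS form.

```shell
# Hypothetical helper: succeed only if the text on stdin contains the
# closing "END (PASS)" line of the decommission cookbook with exit_code=0.
# The optional "DRY-RUN: " prefix covers both dry runs and actual runs.
decom_passed() {
    grep -Eq '^(DRY-RUN: )?END \(PASS\) - Cookbook sre\.hosts\.decommission \(exit_code=0\)$'
}

# Usage with a line taken from the dry-run output above.
if printf '%s\n' 'DRY-RUN: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)' | decom_passed; then
    echo "cookbook reported PASS"
fi
```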

Example output of the actual run:

[cumin1001:~] $ sudo cookbook sre.hosts.decommission labtestcontrol2001.wikimedia.org -t T218021
START - Cookbook sre.hosts.decommission
Scheduling downtime on Icinga server icinga1001.wikimedia.org for hosts: ['labtestcontrol2001.wikimedia.org']
Scheduling downtime on Icinga server icinga1001.wikimedia.org for hosts: ['labtestcontrol2001.mgmt.codfw.wmnet']
Removed host labtestcontrol2001.wikimedia.org from Debmonitor
Updated Phabricator task T218021
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)

You should see logmsgbot and stashbot report the run on #wikimedia-operations, and your Phabricator ticket should be updated automatically.

For an example of what the result looks like on a Phabricator ticket, see https://phabricator.wikimedia.org/T218021#5107910
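
The actual run logs a line such as "Updated Phabricator task T218021" once the ticket comment has been posted. As a rough sketch, assuming that line format stays as shown in the example output, the task ID can be pulled out of a saved log to confirm the update happened. The function name updated_task is hypothetical.

```shell
# Hypothetical helper: print the task ID from the "Updated Phabricator task"
# line on stdin. Prints nothing when no such line is present, for example in
# a dry run, where the update is skipped.
updated_task() {
    sed -n 's/^Updated Phabricator task \(T[0-9][0-9]*\)$/\1/p'
}

# Usage with two lines taken from the actual-run output above.
printf '%s\n' \
    'Removed host labtestcontrol2001.wikimedia.org from Debmonitor' \
    'Updated Phabricator task T218021' | updated_task
```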

Also see: https://doc.wikimedia.org/spicerack/master/cookbook.html