Server Admin Log/Archive 69

From Wikitech

2023-08-15

  • 23:26 hmonroy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Set wikidiff2 maxSplitSize = 10 on group0 wikis T341754 (duration: 07m 39s)
  • 22:27 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1012.eqiad.wmnet with OS bullseye
  • 22:22 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1013.eqiad.wmnet with OS bullseye
  • 21:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1013.eqiad.wmnet with reason: host reimage
  • 21:50 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1012.eqiad.wmnet with reason: host reimage
  • 21:47 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1013.eqiad.wmnet with reason: host reimage
  • 21:47 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1012.eqiad.wmnet with reason: host reimage
  • 21:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1013.eqiad.wmnet with OS bullseye
  • 21:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1012.eqiad.wmnet with OS bullseye
  • 21:16 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:16 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pdus - robh@cumin1001"
  • 21:15 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pdus - robh@cumin1001"
  • 21:09 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 21:07 robh@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 21:07 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 20:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs3010.esams.wmnet with OS bullseye
  • 20:55 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 20:54 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 20:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3010.esams.wmnet with reason: host reimage
  • 20:36 ebernhardson: T342444 start cirrussearch reindex of all wikis to enable new text analysis components from mwmaint1002
  • 20:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3010.esams.wmnet with reason: host reimage
  • 20:20 ryankemper@deploy1002: Finished scap: Backport for elastic: allow only 1 enwiki_content per host (T343820) (duration: 09m 25s)
  • 20:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs3008.esams.wmnet with OS bullseye
  • 20:20 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 20:19 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 20:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3080.esams.wmnet with OS bullseye
  • 20:19 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 20:18 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 20:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3072.esams.wmnet with OS bullseye
  • 20:17 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 20:16 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 20:14 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3010.esams.wmnet with OS bullseye
  • 20:13 ryankemper@deploy1002: ryankemper: Continuing with sync
  • 20:12 ryankemper@deploy1002: ryankemper: Backport for elastic: allow only 1 enwiki_content per host (T343820) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:11 ryankemper@deploy1002: Started scap: Backport for elastic: allow only 1 enwiki_content per host (T343820)
  • 20:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3008.esams.wmnet with reason: host reimage
  • 20:01 sukhe: running dummy authdns-update
  • 19:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3008.esams.wmnet with reason: host reimage
  • 19:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3080.esams.wmnet with reason: host reimage
  • 19:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3072.esams.wmnet with reason: host reimage
  • 19:51 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3080.esams.wmnet with reason: host reimage
  • 19:49 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3072.esams.wmnet with reason: host reimage
  • 19:45 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns3004.wikimedia.org with OS bullseye
  • 19:45 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 19:44 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 19:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3008.esams.wmnet with OS bullseye
  • 19:32 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "manual trigger - cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002 - brett@cumin2002"
  • 19:32 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "manual trigger - cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002 - brett@cumin2002"
  • 19:31 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:31 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge flink-zk2002 DNS changes - sukhe@cumin2002"
  • 19:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3078.esams.wmnet with OS bullseye
  • 19:31 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 19:30 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3070.esams.wmnet with OS bullseye
  • 19:30 brett@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 19:30 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge flink-zk2002 DNS changes - sukhe@cumin2002"
  • 19:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3080.esams.wmnet with OS bullseye
  • 19:28 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3072.esams.wmnet with OS bullseye
  • 19:28 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 19:26 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 19:26 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3078.esams.wmnet with reason: host reimage
  • 19:03 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns3004.wikimedia.org with reason: host reimage
  • 19:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3070.esams.wmnet with reason: host reimage
  • 18:58 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dns3004.wikimedia.org with reason: host reimage
  • 18:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3078.esams.wmnet with reason: host reimage
  • 18:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3070.esams.wmnet with reason: host reimage
  • 18:37 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host dns3004.wikimedia.org with OS bullseye
  • 18:36 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns3004.wikimedia.org with OS bullseye
  • 18:36 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3078.esams.wmnet with OS bullseye
  • 18:36 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3070.esams.wmnet with OS bullseye
  • 18:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3076.esams.wmnet with OS bullseye
  • 18:32 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 18:30 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3068.esams.wmnet with OS bullseye
  • 18:30 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 18:30 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 18:28 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 18:26 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host dns3004.wikimedia.org with OS bullseye
  • 18:17 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns3003.wikimedia.org with OS bullseye
  • 18:17 fabfur@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 18:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3074.esams.wmnet with OS bullseye
  • 18:16 sukhe@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 18:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3066.esams.wmnet with OS bullseye
  • 18:16 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 18:11 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 18:10 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 18:09 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 18:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3076.esams.wmnet with reason: host reimage
  • 18:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3068.esams.wmnet with reason: host reimage
  • 18:05 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3076.esams.wmnet with reason: host reimage
  • 18:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3068.esams.wmnet with reason: host reimage
  • 17:54 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:53 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3074.esams.wmnet with reason: host reimage
  • 17:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3066.esams.wmnet with reason: host reimage
  • 17:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3074.esams.wmnet with reason: host reimage
  • 17:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3066.esams.wmnet with reason: host reimage
  • 17:42 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns3003.wikimedia.org with reason: host reimage
  • 17:42 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3076.esams.wmnet with OS bullseye
  • 17:40 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3068.esams.wmnet with OS bullseye
  • 17:39 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dns3003.wikimedia.org with reason: host reimage
  • 17:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3074.esams.wmnet with OS bullseye
  • 17:21 brett: Upload libvmod-netmapper 1.9-4 (bookworm) to archive - T342154
  • 17:20 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3066.esams.wmnet with OS bullseye
  • 17:15 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host dns3003.wikimedia.org with OS bullseye
  • 17:02 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3070']
  • 17:01 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3066']
  • 17:01 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3068']
  • 17:00 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3072']
  • 16:57 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk2001.codfw.wmnet
  • 16:57 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk2001.codfw.wmnet with OS bookworm
  • 16:56 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3074']
  • 16:55 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3066']
  • 16:55 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3068']
  • 16:55 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3070']
  • 16:54 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3072']
  • 16:54 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3076']
  • 16:53 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs3010']
  • 16:52 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3078']
  • 16:51 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3080']
  • 16:50 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3074']
  • 16:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3066.mgmt.esams.wmnet with reboot policy FORCED
  • 16:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1098.eqiad.wmnet with OS bullseye
  • 16:48 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp3074']
  • 16:48 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3074']
  • 16:47 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk2002.codfw.wmnet
  • 16:47 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:46 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3076']
  • 16:46 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3078']
  • 16:45 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3080']
  • 16:45 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 16:45 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2002.codfw.wmnet
  • 16:45 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs3010']
  • 16:44 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3006']
  • 16:43 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns3004']
  • 16:43 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs3010']
  • 16:42 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3008']
  • 16:42 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs3008']
  • 16:37 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns3004']
  • 16:37 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3006']
  • 16:37 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3008']
  • 16:36 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs3008']
  • 16:33 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs3010']
  • 16:32 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3066.mgmt.esams.wmnet with reboot policy FORCED
  • 16:30 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3070.mgmt.esams.wmnet with reboot policy FORCED
  • 16:29 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3068.mgmt.esams.wmnet with reboot policy FORCED
  • 16:29 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3072.mgmt.esams.wmnet with reboot policy FORCED
  • 16:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3074.mgmt.esams.wmnet with reboot policy FORCED
  • 16:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3078.mgmt.esams.wmnet with reboot policy FORCED
  • 16:27 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3076.mgmt.esams.wmnet with reboot policy FORCED
  • 16:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1098.eqiad.wmnet with reason: host reimage
  • 16:24 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:21 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1098.eqiad.wmnet with reason: host reimage
  • 16:20 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:20 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:18 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:11 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3068.mgmt.esams.wmnet with reboot policy FORCED
  • 16:11 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3070.mgmt.esams.wmnet with reboot policy FORCED
  • 16:11 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3072.mgmt.esams.wmnet with reboot policy FORCED
  • 16:10 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3074.mgmt.esams.wmnet with reboot policy FORCED
  • 16:10 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3076.mgmt.esams.wmnet with reboot policy FORCED
  • 16:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3080.mgmt.esams.wmnet with reboot policy FORCED
  • 16:09 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3078.mgmt.esams.wmnet with reboot policy FORCED
  • 16:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs3010.mgmt.esams.wmnet with reboot policy FORCED
  • 16:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns3004.mgmt.esams.wmnet with reboot policy FORCED
  • 16:08 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3008.mgmt.esams.wmnet with reboot policy FORCED
  • 16:05 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1098.eqiad.wmnet with OS bullseye
  • 16:02 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3006.mgmt.esams.wmnet with reboot policy FORCED
  • 16:02 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs3008.mgmt.esams.wmnet with reboot policy FORCED
  • 16:00 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:59 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:58 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 15s)
  • 15:58 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 15:56 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 14s)
  • 15:56 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 15:51 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3080.mgmt.esams.wmnet with reboot policy FORCED
  • 15:51 robh@cumin1001: START - Cookbook sre.hosts.provision for host dns3004.mgmt.esams.wmnet with reboot policy FORCED
  • 15:50 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3006.mgmt.esams.wmnet with reboot policy FORCED
  • 15:50 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3008.mgmt.esams.wmnet with reboot policy FORCED
  • 15:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs3009.esams.wmnet with OS bullseye
  • 15:50 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:50 robh@cumin1001: START - Cookbook sre.hosts.provision for host lvs3008.mgmt.esams.wmnet with reboot policy FORCED
  • 15:49 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:49 robh@cumin1001: START - Cookbook sre.hosts.provision for host lvs3010.mgmt.esams.wmnet with reboot policy FORCED
  • 15:44 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:44 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns3004 - robh@cumin1001"
  • 15:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1015.eqiad.wmnet with OS bullseye
  • 15:44 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns3004 - robh@cumin1001"
  • 15:42 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 15:41 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2001.codfw.wmnet with OS bookworm
  • 15:41 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 15:40 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 15:40 bking@cumin1001: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) flink-zk2001.codfw.wmnet on all recursors
  • 15:40 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2001.codfw.wmnet on all recursors
  • 15:40 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:40 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 15:40 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3066
  • 15:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3007.esams.wmnet
  • 15:39 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3066
  • 15:39 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3068
  • 15:39 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3068
  • 15:39 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 15:39 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3070
  • 15:39 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3070
  • 15:38 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3072
  • 15:38 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3072
  • 15:38 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3074
  • 15:38 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3074
  • 15:38 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3076
  • 15:38 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3076
  • 15:38 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3078
  • 15:37 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3078
  • 15:37 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns3004
  • 15:37 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dns3004
  • 15:37 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3080
  • 15:37 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3080
  • 15:36 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 15:36 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2001.codfw.wmnet
  • 15:35 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs3008
  • 15:35 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lvs3008
  • 15:35 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs3010
  • 15:35 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lvs3010
  • 15:34 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk2001.codfw.wmnet
  • 15:33 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:33 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rack bw27 hosts - robh@cumin1001"
  • 15:32 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rack bw27 hosts - robh@cumin1001"
  • 15:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3009.esams.wmnet with reason: host reimage
  • 15:30 bking@cumin1001: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) flink-zk2001.codfw.wmnet on all recursors
  • 15:30 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2001.codfw.wmnet on all recursors
  • 15:30 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:30 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 15:29 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 15:29 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 15:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3007.esams.wmnet
  • 15:27 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1014.eqiad.wmnet with OS bullseye
  • 15:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3009.esams.wmnet with reason: host reimage
  • 15:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1097.eqiad.wmnet with OS bullseye
  • 15:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3009.esams.wmnet with OS bullseye
  • 15:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs3009.mgmt.esams.wmnet with reboot policy FORCED
  • 14:56 robh@cumin1001: START - Cookbook sre.hosts.provision for host lvs3009.mgmt.esams.wmnet with reboot policy FORCED
  • 14:55 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs3009.mgmt.esams.wmnet with reboot policy FORCED
  • 14:54 robh@cumin1001: START - Cookbook sre.hosts.provision for host lvs3009.mgmt.esams.wmnet with reboot policy FORCED
  • 14:54 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage
  • 14:54 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 14:52 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage
  • 14:52 bking@cumin1001: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) flink-zk2001.codfw.wmnet on all recursors
  • 14:52 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2001.codfw.wmnet on all recursors
  • 14:52 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:52 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 14:51 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs3009.esams.wmnet with OS bullseye
  • 14:51 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 14:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3005.mgmt.esams.wmnet with reboot policy GRACEFUL
  • 14:49 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage
  • 14:49 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage
  • 14:45 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 14:45 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2001.codfw.wmnet
  • 14:43 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ganeti3005.esams.wmnet
  • 14:39 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3005.mgmt.esams.wmnet with reboot policy GRACEFUL
  • 14:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1097.eqiad.wmnet with reason: host reimage
  • 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti3007.esams.wmnet with OS bullseye
  • 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 14:34 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1097.eqiad.wmnet with reason: host reimage
  • 14:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 14:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
  • 14:26 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1014.eqiad.wmnet with OS bullseye
  • 14:26 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1015.eqiad.wmnet with OS bullseye
  • 14:26 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3081.esams.wmnet with OS bullseye
  • 14:23 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti3005.esams.wmnet
  • 14:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3009.esams.wmnet with OS bullseye
  • 14:17 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1097.eqiad.wmnet with OS bullseye
  • 14:14 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns3003.wikimedia.org with OS bullseye
  • 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
  • 14:07 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ganeti3005.esams.wmnet
  • 14:03 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3081.esams.wmnet with reason: host reimage
  • 14:00 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3081.esams.wmnet with reason: host reimage
  • 13:51 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host dns3003.wikimedia.org with OS bullseye
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti3007.esams.wmnet with reason: host reimage
  • 13:47 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 13:46 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 13:45 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 13:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti3007.esams.wmnet with reason: host reimage
  • 13:45 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 13:45 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 13:44 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns3003.wikimedia.org with OS bullseye
  • 13:44 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 13:38 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS bullseye
  • 13:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
  • 13:29 urbanecm@deploy1002: Finished scap: Backport for Remove knwiktionary tagline (T343662) (duration: 10m 20s)
  • 13:25 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3007.esams.wmnet with OS bullseye
  • 13:24 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:23 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3007']
  • 13:23 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host dns3003.wikimedia.org with OS bullseye
  • 13:23 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:22 urbanecm@deploy1002: urbanecm and anzx: Continuing with sync
  • 13:20 urbanecm@deploy1002: urbanecm and anzx: Backport for Remove knwiktionary tagline (T343662) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:19 urbanecm@deploy1002: Started scap: Backport for Remove knwiktionary tagline (T343662)
  • 13:18 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3007']
  • 13:17 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3007']
  • 13:17 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: enable AddLink backend 13th round of wikis (T308138) (duration: 10m 47s)
  • 13:10 urbanecm@deploy1002: sgimeno and urbanecm: Continuing with sync
  • 13:07 urbanecm@deploy1002: sgimeno and urbanecm: Backport for GrowthExperiments: enable AddLink backend 13th round of wikis (T308138) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:06 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3007']
  • 13:06 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: enable AddLink backend 13th round of wikis (T308138)
  • 13:05 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3005']
  • 12:56 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3005']
  • 12:47 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:36 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sw - ayounsi@cumin1001"
  • 12:12 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sw - ayounsi@cumin1001"
  • 12:04 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti3007.esams.wmnet with OS bullseye
  • 12:02 sukhe: sukhe@contint2002:~$ sudo systemctl restart zuul: T344238
  • 12:02 sukhe: sukhe@contint2002:~$ sudo systemctl restart zuul
  • 12:02 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3077.esams.wmnet with OS bullseye
  • 12:02 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 11:54 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 11:54 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3069.esams.wmnet with OS bullseye
  • 11:54 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 11:53 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3067.esams.wmnet with OS bullseye
  • 11:53 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 11:52 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 11:51 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 11:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3075.esams.wmnet with OS bullseye
  • 11:50 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 11:49 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 11:31 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3077.esams.wmnet with reason: host reimage
  • 11:29 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3069.esams.wmnet with reason: host reimage
  • 11:27 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3075.esams.wmnet with reason: host reimage
  • 11:26 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3077.esams.wmnet with reason: host reimage
  • 11:26 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3067.esams.wmnet with reason: host reimage
  • 11:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3069.esams.wmnet with reason: host reimage
  • 11:24 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti3005.esams.wmnet
  • 11:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
  • 11:23 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3075.esams.wmnet with reason: host reimage
  • 11:22 sukhe: sukhe@contint2002:~$ sudo systemctl restart zuul
  • 11:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti3005.esams.wmnet
  • 11:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
  • 11:22 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3067.esams.wmnet with reason: host reimage
  • 11:07 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host an-db1001.eqiad.wmnet
  • 11:04 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3077.esams.wmnet with OS bullseye
  • 11:03 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3069.esams.wmnet with OS bullseye
  • 11:01 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3075.esams.wmnet with OS bullseye
  • 11:00 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3067.esams.wmnet with OS bullseye
  • 10:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1001.eqiad.wmnet
  • 10:56 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1096.eqiad.wmnet with OS bullseye
  • 10:54 sukhe: zuul@contint1002:/srv/zuul/git/operations/puppet$ git fetch --force --tags -v origin
  • 10:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3007.esams.wmnet with OS bullseye
  • 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti3005.esams.wmnet with OS bullseye
  • 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 10:43 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 10:42 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3079.esams.wmnet with OS bullseye
  • 10:42 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 10:41 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 10:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3071.esams.wmnet with OS bullseye
  • 10:37 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 10:36 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 10:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3073.esams.wmnet with OS bullseye
  • 10:32 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 10:31 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 10:25 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti3005.esams.wmnet with reason: host reimage
  • 10:21 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti3005.esams.wmnet with reason: host reimage
  • 10:20 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3079.esams.wmnet with reason: host reimage
  • 10:16 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3079.esams.wmnet with reason: host reimage
  • 10:11 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3071.esams.wmnet with reason: host reimage
  • 10:09 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3073.esams.wmnet with reason: host reimage
  • 10:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3071.esams.wmnet with reason: host reimage
  • 10:04 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3073.esams.wmnet with reason: host reimage
  • 10:01 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3005.esams.wmnet with OS bullseye
  • 09:54 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3079.esams.wmnet with OS bullseye
  • 09:52 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp3079.esams.wmnet with OS bullseye
  • 09:52 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3079.esams.wmnet with OS bullseye
  • 09:51 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti3005.esams.wmnet with OS bullseye
  • 09:50 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp3079.esams.wmnet with OS bullseye
  • 09:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3071.esams.wmnet with OS bullseye
  • 09:41 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3079.esams.wmnet with reason: host reimage
  • 09:37 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3079.esams.wmnet with reason: host reimage
  • 09:34 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1096.eqiad.wmnet with OS bullseye
  • 09:30 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3073.esams.wmnet with OS bullseye
  • 09:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3005.esams.wmnet with OS bullseye
  • 09:17 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:15 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3079.esams.wmnet with OS bullseye
  • 09:15 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3079.esams.wmnet with OS bullseye
  • 09:11 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti3005.esams.wmnet with OS bullseye
  • 09:11 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3005.esams.wmnet with OS bullseye
  • 09:08 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3079.esams.wmnet with OS bullseye
  • 09:05 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:55 klausman: Draining ml2003 for kubelet partition resize
  • 08:46 klausman: Draining ml2002 for kubelet partition resize
  • 08:42 zabe@deploy1002: Finished scap: Backport for Add messages for Pa'O Wiktionary (blkwiktionary) (T343540), Add messages for Sundanese Wikisource (suwikisource) (T343539) (duration: 33m 26s)
  • 08:37 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 08:36 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 08:31 zabe@deploy1002: zabe: Continuing with sync
  • 08:30 zabe@deploy1002: zabe: Backport for Add messages for Pa'O Wiktionary (blkwiktionary) (T343540), Add messages for Sundanese Wikisource (suwikisource) (T343539) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:28 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:16 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:09 zabe@deploy1002: Started scap: Backport for Add messages for Pa'O Wiktionary (blkwiktionary) (T343540), Add messages for Sundanese Wikisource (suwikisource) (T343539)
  • 07:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cr2-esams mgmt - ayounsi@cumin1001"
  • 07:55 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cr2-esams mgmt - ayounsi@cumin1001"
  • 07:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:49 ayounsi@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 07:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:33 taavi@deploy1002: Finished scap: Backport for Enable EditInSequence on all wikisources (T308098) (duration: 13m 29s)
  • 07:29 gehel: restarting wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph on wdqs2012
  • 07:27 taavi@deploy1002: soda and taavi: Continuing with sync
  • 07:21 taavi@deploy1002: soda and taavi: Backport for Enable EditInSequence on all wikisources (T308098) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2002.codfw.wmnet
  • 07:20 taavi@deploy1002: Started scap: Backport for Enable EditInSequence on all wikisources (T308098)
  • 07:18 taavi@deploy1002: Finished scap: Backport for jawiki: reassign the changetags user right (T344150) (duration: 11m 05s)
  • 07:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host titan2002.codfw.wmnet
  • 07:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cp3081 - ayounsi@cumin1001"
  • 07:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2001.codfw.wmnet
  • 07:15 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cp3081 - ayounsi@cumin1001"
  • 07:12 taavi@deploy1002: anzx and taavi: Continuing with sync
  • 07:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host titan2001.codfw.wmnet
  • 07:08 taavi@deploy1002: anzx and taavi: Backport for jawiki: reassign the changetags user right (T344150) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:07 taavi@deploy1002: Started scap: Backport for jawiki: reassign the changetags user right (T344150)
  • 07:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists2001.codfw.wmnet
  • 07:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:05 taavi@deploy1002: Finished scap: Backport for clienthints: Collect Client Hints data on group0 wikis (T341110) (duration: 15m 23s)
  • 07:04 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:03 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 07:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists2001.codfw.wmnet
  • 06:59 taavi@deploy1002: taavi and dreamyjazz: Continuing with sync
  • 06:57 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 06:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
  • 06:52 taavi@deploy1002: taavi and dreamyjazz: Backport for clienthints: Collect Client Hints data on group0 wikis (T341110) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 06:50 taavi@deploy1002: Started scap: Backport for clienthints: Collect Client Hints data on group0 wikis (T341110)
  • 04:34 taavi@deploy1002: Finished scap: Backport for Add a comment why PdfHandler does not use Shellbox (duration: 08m 24s)
  • 04:28 taavi@deploy1002: taavi: Continuing with sync
  • 04:28 taavi@deploy1002: taavi: Backport for Add a comment why PdfHandler does not use Shellbox synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 04:26 taavi@deploy1002: Started scap: Backport for Add a comment why PdfHandler does not use Shellbox
  • 03:58 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.19 (duration: 02m 13s)
  • 03:56 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.22 refs T343724 (duration: 53m 42s)
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.22 refs T343724
  • 01:54 eileen: config revision changed from a61171bc to a05a2a82
  • 01:51 eileen: civicrm upgraded from 16c2e58a to 5e631101
  • 01:39 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3081.esams.wmnet with OS bullseye
  • 01:39 eileen: config revision changed from 2d598716 to a61171bc
  • 01:18 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS bullseye
  • 01:01 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3081.esams.wmnet with OS bullseye
  • 00:26 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS bullseye

2023-08-14

  • 23:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1030.eqiad.wmnet
  • 23:31 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1030.eqiad.wmnet
  • 22:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3081.esams.wmnet with OS bullseye
  • 22:41 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs3009']
  • 22:35 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs3009']
  • 22:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs3009.mgmt.esams.wmnet with reboot policy FORCED
  • 22:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS bullseye
  • 22:23 robh@cumin1001: START - Cookbook sre.hosts.provision for host lvs3009.mgmt.esams.wmnet with reboot policy FORCED
  • 22:19 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:19 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs3009 - robh@cumin1001"
  • 22:18 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs3009 - robh@cumin1001"
  • 22:09 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 22:02 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3005']
  • 22:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3007.mgmt.esams.wmnet with reboot policy FORCED
  • 21:59 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3075']
  • 21:56 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3005']
  • 21:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3081.esams.wmnet with OS bullseye
  • 21:55 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti3005']
  • 21:55 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3005']
  • 21:54 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3067']
  • 21:54 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3073']
  • 21:54 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3077']
  • 21:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3005.mgmt.esams.wmnet with reboot policy FORCED
  • 21:54 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3069']
  • 21:54 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns3003']
  • 21:53 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3071']
  • 21:48 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns3003']
  • 21:47 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3067']
  • 21:47 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3069']
  • 21:47 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3071']
  • 21:46 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3073']
  • 21:46 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3075']
  • 21:46 urandom: upgrading Cassandra to 4.1.1, restbase10[18,25-27,30,33]-{a,b,c} (eqiad/row D) — T339298
  • 21:46 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3077']
  • 21:43 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3007.mgmt.esams.wmnet with reboot policy FORCED
  • 21:43 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3005.mgmt.esams.wmnet with reboot policy FORCED
  • 21:42 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3067.mgmt.esams.wmnet with reboot policy FORCED
  • 21:40 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns3003.mgmt.esams.wmnet with reboot policy FORCED
  • 21:39 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3069.mgmt.esams.wmnet with reboot policy FORCED
  • 21:39 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3071.mgmt.esams.wmnet with reboot policy FORCED
  • 21:38 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3073.mgmt.esams.wmnet with reboot policy FORCED
  • 21:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3075.mgmt.esams.wmnet with reboot policy FORCED
  • 21:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3077.mgmt.esams.wmnet with reboot policy FORCED
  • 21:35 maryum: security deploy for T341529
  • 21:27 urandom: upgrading Cassandra to 4.1.1, restbase10[17,22-24,29,32]-{a,b,c} (eqiad/row B) — T339298
  • 21:22 robh@cumin1001: START - Cookbook sre.hosts.provision for host dns3003.mgmt.esams.wmnet with reboot policy FORCED
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3067.mgmt.esams.wmnet with reboot policy FORCED
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3069.mgmt.esams.wmnet with reboot policy FORCED
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3071.mgmt.esams.wmnet with reboot policy FORCED
  • 21:19 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3073.mgmt.esams.wmnet with reboot policy FORCED
  • 21:19 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3075.mgmt.esams.wmnet with reboot policy FORCED
  • 21:18 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3077.mgmt.esams.wmnet with reboot policy FORCED
  • 21:11 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:11 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new hosts in by27 - robh@cumin1001"
  • 21:10 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new hosts in by27 - robh@cumin1001"
  • 21:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS bullseye
  • 21:08 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3081.esams.wmnet with OS bullseye
  • 21:06 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 20:42 urbanecm@deploy1002: Finished scap: Backport for Config changes for new Android schema (duration: 13m 36s)
  • 20:35 urbanecm@deploy1002: urbanecm and sharvaniharan: Continuing with sync
  • 20:33 urandom: upgrading Cassandra to 4.1.1, restbase10[19-21,28,31]-{a,b,c} (eqiad/row A) — T339298
  • 20:30 urbanecm@deploy1002: urbanecm and sharvaniharan: Backport for Config changes for new Android schema synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:28 urbanecm@deploy1002: Started scap: Backport for Config changes for new Android schema
  • 20:25 urbanecm@deploy1002: Finished scap: Backport for NewcomerTasksLogFactory: Use getName(), not getDbKey() (T344163) (duration: 09m 08s)
  • 20:18 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 20:18 urbanecm@deploy1002: urbanecm: Backport for NewcomerTasksLogFactory: Use getName(), not getDbKey() (T344163) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:17 urandom: upgrading Cassandra to 4.1.1, restbase20[12,17-18,23,26-27]-{a,b,c} (codfw/row C) — T339298
  • 20:16 urbanecm@deploy1002: Started scap: Backport for NewcomerTasksLogFactory: Use getName(), not getDbKey() (T344163)
  • 19:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS bullseye
  • 19:57 urandom: upgrading Cassandra to 4.1.1, restbase20[15,16,20,22,25]-{a,b,c} (codfw/row C) — T339298
  • 19:52 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3079']
  • 19:45 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3079']
  • 19:45 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3079.mgmt.esams.wmnet with reboot policy FORCED
  • 19:44 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3081']
  • 19:43 urandom: upgrading Cassandra to 4.1.1, restbase2024-{a,b,c} — T339298
  • 19:38 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3081']
  • 19:38 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp3081']
  • 19:37 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3081']
  • 19:34 urandom: upgrading Cassandra to 4.1.1, restbase2021-{a,b,c} — T339298
  • 19:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3081.mgmt.esams.wmnet with reboot policy FORCED
  • 19:31 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3079.mgmt.esams.wmnet with reboot policy FORCED
  • 19:24 urandom: upgrading Cassandra to 4.1.1, restbase2019-{a,b,c} — T339298
  • 19:16 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3081.mgmt.esams.wmnet with reboot policy FORCED
  • 19:11 urandom: upgrading Cassandra to 4.1.1, restbase2014-{a,b,c} — T339298
  • 18:45 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp3081.mgmt.esams.wmnet with reboot policy FORCED
  • 18:45 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3081.mgmt.esams.wmnet with reboot policy FORCED
  • 18:43 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp3081.mgmt.esams.wmnet with reboot policy FORCED
  • 18:43 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3081.mgmt.esams.wmnet with reboot policy FORCED
  • 18:38 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:38 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge cp3081 and cp3079 - sukhe@cumin2002"
  • 18:37 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge cp3081 and cp3079 - sukhe@cumin2002"
  • 18:23 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 17:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1095.eqiad.wmnet with OS bullseye
  • 17:41 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:39 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1095.eqiad.wmnet with reason: host reimage
  • 17:15 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1095.eqiad.wmnet with reason: host reimage
  • 17:02 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1095.eqiad.wmnet with OS bullseye
  • 16:58 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:57 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 100%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50580 and previous config saved to /var/cache/conftool/dbconfig/20230814-164727-root.json
  • 16:42 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1094.eqiad.wmnet with OS bullseye
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 75%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50579 and previous config saved to /var/cache/conftool/dbconfig/20230814-163222-root.json
  • 16:28 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:28 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:28 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename cr3-knams to cr2-esams - cmooney@cumin1001"
  • 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 50%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50578 and previous config saved to /var/cache/conftool/dbconfig/20230814-161718-root.json
  • 16:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1094.eqiad.wmnet with reason: host reimage
  • 16:11 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1094.eqiad.wmnet with reason: host reimage
  • 16:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 25%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50577 and previous config saved to /var/cache/conftool/dbconfig/20230814-160213-root.json
  • 16:01 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cr2-esams.wikimedia.org on all recursors
  • 16:00 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache cr2-esams.wikimedia.org on all recursors
  • 15:58 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1094.eqiad.wmnet with OS bullseye
  • 15:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1093.eqiad.wmnet with OS bullseye
  • 15:53 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename cr3-knams to cr2-esams - cmooney@cumin1001"
  • 15:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 10%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50576 and previous config saved to /var/cache/conftool/dbconfig/20230814-154708-root.json
  • 15:47 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:46 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 15s)
  • 15:45 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 15:38 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:36 urandom: upgrading Cassandra to 4.1.1, restbase1016-{a,b,c} — T339298
  • 15:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1093.eqiad.wmnet with reason: host reimage
  • 15:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 5%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50575 and previous config saved to /var/cache/conftool/dbconfig/20230814-153203-root.json
  • 15:30 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 43s)
  • 15:29 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 15:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1093.eqiad.wmnet with reason: host reimage
  • 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 3%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50574 and previous config saved to /var/cache/conftool/dbconfig/20230814-151659-root.json
  • 15:16 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1093.eqiad.wmnet with OS bullseye
  • 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 1%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50572 and previous config saved to /var/cache/conftool/dbconfig/20230814-150154-root.json
  • 14:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2012.codfw.wmnet with OS bullseye
  • 14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum1002.eqiad.wmnet with OS bookworm
  • 14:42 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1016.eqiad.wmnet with OS bullseye
  • 14:34 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@ee544cb] (eqiad): (no justification provided) (duration: 00m 00s)
  • 14:34 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@ee544cb] (eqiad): (no justification provided)
  • 14:33 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@ee544cb] (eqiad): (no justification provided) (duration: 00m 03s)
  • 14:33 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@ee544cb] (eqiad): (no justification provided)
  • 14:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
  • 14:30 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@ee544cb] (eqiad): (no justification provided) (duration: 00m 00s)
  • 14:30 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@ee544cb] (eqiad): (no justification provided)
  • 14:27 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@ee544cb]: (no justification provided) (duration: 00m 01s)
  • 14:27 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@ee544cb]: (no justification provided)
  • 14:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
  • 14:26 sukhe: running authdns-update for CR 948195: T344073
  • 14:26 sukhe: running authdns-update for CR 948195
  • 14:25 jgiannelos@deploy1002: deploy aborted: (no justification provided) (duration: 00m 10s)
  • 14:25 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@ee544cb]: (no justification provided)
  • 14:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
  • 14:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
  • 14:13 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum1002.eqiad.wmnet with OS bookworm
  • 14:05 urandom: upgrading Cassandra to 4.1.1, restbase2013-{a,b,c} — T339298
  • 14:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2012.codfw.wmnet with reason: host reimage
  • 14:01 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2012.codfw.wmnet with reason: host reimage
  • 13:53 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS bullseye
  • 13:40 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2012.codfw.wmnet with OS bullseye
  • 13:27 derick@deploy1002: Finished scap: Backport for wmf-config: Remove wgContentTranslationDefaultParsoidClient cleanup (duration: 16m 56s)
  • 13:20 derick@deploy1002: d3r1ck01 and derick: Continuing with sync
  • 13:19 derick@deploy1002: d3r1ck01 and derick: Backport for wmf-config: Remove wgContentTranslationDefaultParsoidClient cleanup synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:10 derick@deploy1002: Started scap: Backport for wmf-config: Remove wgContentTranslationDefaultParsoidClient cleanup
  • 13:08 derick@deploy1002: Backport cancelled.
  • 11:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mr1-esams oob - ayounsi@cumin1001"
  • 11:23 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mr1-esams oob - ayounsi@cumin1001"
  • 11:21 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mr1-esams oob - ayounsi@cumin1001"
  • 11:16 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mr1-esams oob - ayounsi@cumin1001"
  • 11:13 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:09 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1007.eqiad.wmnet
  • 11:09 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-airflow1007.eqiad.wmnet with OS buster
  • 10:54 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-airflow1007.eqiad.wmnet with reason: host reimage
  • 10:51 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-airflow1007.eqiad.wmnet with reason: host reimage
  • 10:40 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-airflow1007.eqiad.wmnet with OS buster
  • 10:39 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM an-airflow1007.eqiad.wmnet - stevemunene@cumin1001"
  • 10:39 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:39 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams-new to mr1-esams in dns. - cmooney@cumin1001"
  • 10:38 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM an-airflow1007.eqiad.wmnet - stevemunene@cumin1001"
  • 10:38 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams-new to mr1-esams in dns. - cmooney@cumin1001"
  • 10:38 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-airflow1007.eqiad.wmnet on all recursors
  • 10:38 stevemunene@cumin1001: START - Cookbook sre.dns.wipe-cache an-airflow1007.eqiad.wmnet on all recursors
  • 10:36 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 10:34 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:34 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams-new to mr1-esams in dns. - cmooney@cumin1001"
  • 10:33 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams-new to mr1-esams in dns. - cmooney@cumin1001"
  • 10:30 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 10:26 stevemunene@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:25 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:25 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams-new to mr1-esams in dns. - cmooney@cumin1001"
  • 10:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1008.eqiad.wmnet
  • 10:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams-new to mr1-esams in dns. - cmooney@cumin1001"
  • 10:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1009.eqiad.wmnet
  • 10:13 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 10:13 stevemunene@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1007.eqiad.wmnet
  • 10:12 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1009.eqiad.wmnet
  • 09:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1124.eqiad.wmnet
  • 09:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1008.eqiad.wmnet
  • 09:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1007.eqiad.wmnet
  • 09:41 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:41 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 09:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1007.eqiad.wmnet
  • 09:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1006.eqiad.wmnet
  • 09:32 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:32 cmooney@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 09:32 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:28 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1006.eqiad.wmnet
  • 09:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1005.eqiad.wmnet
  • 09:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1124.eqiad.wmnet
  • 09:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1005.eqiad.wmnet
  • 09:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1004.eqiad.wmnet
  • 09:11 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:11 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams dns to mr1-eams-old. - cmooney@cumin1001"
  • 09:10 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams dns to mr1-eams-old. - cmooney@cumin1001"
  • 09:08 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:02 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1004.eqiad.wmnet

2023-08-13

  • 16:07 topranks: powering down cr3-esams
  • 16:05 topranks: powering down cr2-esams
  • 15:54 topranks: Disabling esams peering at AMS-IX prior to removing router
  • 15:45 topranks: Disable transport cct cr2-esams to cr2-eqiad prior to disconnect T329219
  • 15:26 topranks: disable transit and peering links on cr2-esams & cr3-esams before decom T329219

2023-08-12

  • 08:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 08:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T342617)', diff saved to https://phabricator.wikimedia.org/P50569 and previous config saved to /var/cache/conftool/dbconfig/20230812-082511-ladsgroup.json
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P50568 and previous config saved to /var/cache/conftool/dbconfig/20230812-081005-ladsgroup.json
  • 07:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P50567 and previous config saved to /var/cache/conftool/dbconfig/20230812-075459-ladsgroup.json
  • 07:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T342617)', diff saved to https://phabricator.wikimedia.org/P50566 and previous config saved to /var/cache/conftool/dbconfig/20230812-073953-ladsgroup.json
  • 05:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1219 (T342617)', diff saved to https://phabricator.wikimedia.org/P50565 and previous config saved to /var/cache/conftool/dbconfig/20230812-055651-ladsgroup.json
  • 05:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 05:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 05:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T342617)', diff saved to https://phabricator.wikimedia.org/P50564 and previous config saved to /var/cache/conftool/dbconfig/20230812-050127-ladsgroup.json
  • 04:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P50563 and previous config saved to /var/cache/conftool/dbconfig/20230812-044621-ladsgroup.json
  • 04:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 04:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T342617)', diff saved to https://phabricator.wikimedia.org/P50562 and previous config saved to /var/cache/conftool/dbconfig/20230812-043724-ladsgroup.json
  • 04:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P50561 and previous config saved to /var/cache/conftool/dbconfig/20230812-043115-ladsgroup.json
  • 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P50560 and previous config saved to /var/cache/conftool/dbconfig/20230812-042217-ladsgroup.json
  • 04:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T342617)', diff saved to https://phabricator.wikimedia.org/P50559 and previous config saved to /var/cache/conftool/dbconfig/20230812-041608-ladsgroup.json
  • 04:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P50558 and previous config saved to /var/cache/conftool/dbconfig/20230812-040711-ladsgroup.json
  • 03:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T342617)', diff saved to https://phabricator.wikimedia.org/P50557 and previous config saved to /var/cache/conftool/dbconfig/20230812-035205-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T342617)', diff saved to https://phabricator.wikimedia.org/P50556 and previous config saved to /var/cache/conftool/dbconfig/20230812-023441-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T342617)', diff saved to https://phabricator.wikimedia.org/P50555 and previous config saved to /var/cache/conftool/dbconfig/20230812-023419-ladsgroup.json
  • 02:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P50554 and previous config saved to /var/cache/conftool/dbconfig/20230812-021913-ladsgroup.json
  • 02:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P50553 and previous config saved to /var/cache/conftool/dbconfig/20230812-020407-ladsgroup.json
  • 01:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T342617)', diff saved to https://phabricator.wikimedia.org/P50552 and previous config saved to /var/cache/conftool/dbconfig/20230812-015910-ladsgroup.json
  • 01:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 01:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 01:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T342617)', diff saved to https://phabricator.wikimedia.org/P50551 and previous config saved to /var/cache/conftool/dbconfig/20230812-015849-ladsgroup.json
  • 01:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T342617)', diff saved to https://phabricator.wikimedia.org/P50550 and previous config saved to /var/cache/conftool/dbconfig/20230812-014901-ladsgroup.json
  • 01:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P50549 and previous config saved to /var/cache/conftool/dbconfig/20230812-014342-ladsgroup.json
  • 01:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P50548 and previous config saved to /var/cache/conftool/dbconfig/20230812-012836-ladsgroup.json
  • 01:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T342617)', diff saved to https://phabricator.wikimedia.org/P50547 and previous config saved to /var/cache/conftool/dbconfig/20230812-011330-ladsgroup.json
  • 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T342617)', diff saved to https://phabricator.wikimedia.org/P50546 and previous config saved to /var/cache/conftool/dbconfig/20230812-000623-ladsgroup.json
  • 00:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T342617)', diff saved to https://phabricator.wikimedia.org/P50545 and previous config saved to /var/cache/conftool/dbconfig/20230812-000602-ladsgroup.json

2023-08-11

  • 23:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P50544 and previous config saved to /var/cache/conftool/dbconfig/20230811-235056-ladsgroup.json
  • 23:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P50543 and previous config saved to /var/cache/conftool/dbconfig/20230811-233549-ladsgroup.json
  • 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T342617)', diff saved to https://phabricator.wikimedia.org/P50542 and previous config saved to /var/cache/conftool/dbconfig/20230811-233320-ladsgroup.json
  • 23:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 23:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T342617)', diff saved to https://phabricator.wikimedia.org/P50541 and previous config saved to /var/cache/conftool/dbconfig/20230811-233259-ladsgroup.json
  • 23:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T342617)', diff saved to https://phabricator.wikimedia.org/P50540 and previous config saved to /var/cache/conftool/dbconfig/20230811-232043-ladsgroup.json
  • 23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P50539 and previous config saved to /var/cache/conftool/dbconfig/20230811-231753-ladsgroup.json
  • 23:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P50538 and previous config saved to /var/cache/conftool/dbconfig/20230811-230247-ladsgroup.json
  • 22:49 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T342617)', diff saved to https://phabricator.wikimedia.org/P50537 and previous config saved to /var/cache/conftool/dbconfig/20230811-224741-ladsgroup.json
  • 22:06 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 22:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:04 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 22:03 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 22:02 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 22:00 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:57 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:49 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:48 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T342617)', diff saved to https://phabricator.wikimedia.org/P50536 and previous config saved to /var/cache/conftool/dbconfig/20230811-214142-ladsgroup.json
  • 21:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 21:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 21:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 21:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T342617)', diff saved to https://phabricator.wikimedia.org/P50535 and previous config saved to /var/cache/conftool/dbconfig/20230811-214105-ladsgroup.json
  • 21:28 andrewbogott: rebooting wikitech-static-ord via rackspace UI
  • 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P50534 and previous config saved to /var/cache/conftool/dbconfig/20230811-212559-ladsgroup.json
  • 21:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:15 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P50533 and previous config saved to /var/cache/conftool/dbconfig/20230811-211053-ladsgroup.json
  • 21:10 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:10 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 21:08 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 21:06 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:06 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T342617)', diff saved to https://phabricator.wikimedia.org/P50532 and previous config saved to /var/cache/conftool/dbconfig/20230811-210102-ladsgroup.json
  • 21:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 21:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 21:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 21:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 21:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T342617)', diff saved to https://phabricator.wikimedia.org/P50531 and previous config saved to /var/cache/conftool/dbconfig/20230811-210024-ladsgroup.json
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T342617)', diff saved to https://phabricator.wikimedia.org/P50530 and previous config saved to /var/cache/conftool/dbconfig/20230811-205546-ladsgroup.json
  • 20:48 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:46 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 12s)
  • 20:46 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 20:46 bking@deploy1002: deploy aborted: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 02m 44s)
  • 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P50529 and previous config saved to /var/cache/conftool/dbconfig/20230811-204517-ladsgroup.json
  • 20:43 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 20:31 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2011.codfw.wmnet with OS bullseye
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P50528 and previous config saved to /var/cache/conftool/dbconfig/20230811-203011-ladsgroup.json
  • 20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T342617)', diff saved to https://phabricator.wikimedia.org/P50527 and previous config saved to /var/cache/conftool/dbconfig/20230811-201505-ladsgroup.json
  • 20:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2011.codfw.wmnet with reason: host reimage
  • 20:05 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2011.codfw.wmnet with reason: host reimage
  • 20:02 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:02 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:02 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 41s)
  • 20:02 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:01 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 19:44 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2011.codfw.wmnet with OS bullseye
  • 19:38 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:38 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 19:37 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 19:34 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2010.codfw.wmnet with OS bullseye
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T342617)', diff saved to https://phabricator.wikimedia.org/P50526 and previous config saved to /var/cache/conftool/dbconfig/20230811-191548-ladsgroup.json
  • 19:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 19:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T342617)', diff saved to https://phabricator.wikimedia.org/P50525 and previous config saved to /var/cache/conftool/dbconfig/20230811-191527-ladsgroup.json
  • 19:06 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2010.codfw.wmnet with reason: host reimage
  • 19:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum1002.eqiad.wmnet with OS bookworm
  • 19:03 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2010.codfw.wmnet with reason: host reimage
  • 19:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 19:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T342617)', diff saved to https://phabricator.wikimedia.org/P50524 and previous config saved to /var/cache/conftool/dbconfig/20230811-190208-ladsgroup.json
  • 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P50523 and previous config saved to /var/cache/conftool/dbconfig/20230811-190021-ladsgroup.json
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P50522 and previous config saved to /var/cache/conftool/dbconfig/20230811-184701-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P50521 and previous config saved to /var/cache/conftool/dbconfig/20230811-184514-ladsgroup.json
  • 18:42 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2010.codfw.wmnet with OS bullseye
  • 18:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T342617)', diff saved to https://phabricator.wikimedia.org/P50520 and previous config saved to /var/cache/conftool/dbconfig/20230811-183431-ladsgroup.json
  • 18:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 18:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T342617)', diff saved to https://phabricator.wikimedia.org/P50519 and previous config saved to /var/cache/conftool/dbconfig/20230811-183410-ladsgroup.json
  • 18:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
  • 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P50518 and previous config saved to /var/cache/conftool/dbconfig/20230811-183155-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T342617)', diff saved to https://phabricator.wikimedia.org/P50517 and previous config saved to /var/cache/conftool/dbconfig/20230811-183008-ladsgroup.json
  • 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P50516 and previous config saved to /var/cache/conftool/dbconfig/20230811-181904-ladsgroup.json
  • 18:17 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum1002.eqiad.wmnet with OS bookworm
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T342617)', diff saved to https://phabricator.wikimedia.org/P50515 and previous config saved to /var/cache/conftool/dbconfig/20230811-181649-ladsgroup.json
  • 18:14 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:14 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 18:12 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 18:09 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:08 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:05 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P50514 and previous config saved to /var/cache/conftool/dbconfig/20230811-180358-ladsgroup.json
  • 18:02 sukhe: reload icinga on alert1001
  • 17:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T342617)', diff saved to https://phabricator.wikimedia.org/P50513 and previous config saved to /var/cache/conftool/dbconfig/20230811-174851-ladsgroup.json
  • 17:43 topranks: removing routing for former ns2.wikimedia.org IP 91.198.174.239 from esams CRs T343942
  • 17:33 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 44s)
  • 17:32 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 17:20 sukhe@cumin2002: END (ERROR) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=97) rolling restart_daemons on A:wikidough and A:wikidough
  • 17:17 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:13 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
  • 17:07 sukhe: running agent on dns-rec to remove old ns2 IP
  • 16:52 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T342617)', diff saved to https://phabricator.wikimedia.org/P50512 and previous config saved to /var/cache/conftool/dbconfig/20230811-165033-ladsgroup.json
  • 16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T342617)', diff saved to https://phabricator.wikimedia.org/P50511 and previous config saved to /var/cache/conftool/dbconfig/20230811-165013-ladsgroup.json
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P50510 and previous config saved to /var/cache/conftool/dbconfig/20230811-163506-ladsgroup.json
  • 16:32 sukhe: running dummy authdns-update
  • 16:27 sukhe: running agent on A:dns-rec to remove ns2-v4 IP: T329219
  • 16:23 sukhe: running dummy authdns-update
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P50508 and previous config saved to /var/cache/conftool/dbconfig/20230811-161959-ladsgroup.json
  • 16:17 sukhe: running agent on A:cumin or A:dns-rec or A:netbox to remove dns300x from authdns_servers: T329219
  • 16:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum1001.eqiad.wmnet with OS bookworm
  • 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T342617)', diff saved to https://phabricator.wikimedia.org/P50507 and previous config saved to /var/cache/conftool/dbconfig/20230811-161025-ladsgroup.json
  • 16:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 16:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T342617)', diff saved to https://phabricator.wikimedia.org/P50506 and previous config saved to /var/cache/conftool/dbconfig/20230811-160953-ladsgroup.json
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T342617)', diff saved to https://phabricator.wikimedia.org/P50505 and previous config saved to /var/cache/conftool/dbconfig/20230811-160453-ladsgroup.json
  • 15:54 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P50504 and previous config saved to /var/cache/conftool/dbconfig/20230811-155447-ladsgroup.json
  • 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P50503 and previous config saved to /var/cache/conftool/dbconfig/20230811-153941-ladsgroup.json
  • 15:37 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 22s)
  • 15:37 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 15:27 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:27 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 15:26 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T342617)', diff saved to https://phabricator.wikimedia.org/P50502 and previous config saved to /var/cache/conftool/dbconfig/20230811-152433-ladsgroup.json
  • 15:24 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:23 inflatador: bking@deploy1002 'deploying WDQS on newly-reimaged Bullseye hosts T343124'
  • 15:18 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:18 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: f1a6177 (duration: 00m 42s)
  • 15:17 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: f1a6177
  • 15:09 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:08 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 15:08 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 15:07 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2009.codfw.wmnet with OS bullseye
  • 15:05 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:05 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:03 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2008.codfw.wmnet with OS bullseye
  • 15:02 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: f1a6177 (duration: 00m 50s)
  • 15:01 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: f1a6177
  • 15:01 bking@deploy1002: deploy aborted: f1a6177 (duration: 00m 05s)
  • 15:01 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: f1a6177
  • 14:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs[2008-2009].codfw.wmnet with reason: T343124
  • 14:53 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs[2008-2009].codfw.wmnet with reason: T343124
  • 14:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2009.codfw.wmnet with reason: host reimage
  • 14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
  • 14:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2008.codfw.wmnet with reason: host reimage
  • 14:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
  • 14:41 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2009.codfw.wmnet with reason: host reimage
  • 14:40 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2008.codfw.wmnet with reason: host reimage
  • 14:31 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:28 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum1001.eqiad.wmnet with OS bookworm
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T342617)', diff saved to https://phabricator.wikimedia.org/P50501 and previous config saved to /var/cache/conftool/dbconfig/20230811-142611-ladsgroup.json
  • 14:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 14:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T342617)', diff saved to https://phabricator.wikimedia.org/P50500 and previous config saved to /var/cache/conftool/dbconfig/20230811-142550-ladsgroup.json
  • 14:21 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2008.codfw.wmnet with OS bullseye
  • 14:21 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2009.codfw.wmnet with OS bullseye
  • 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P50496 and previous config saved to /var/cache/conftool/dbconfig/20230811-141043-ladsgroup.json
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P50494 and previous config saved to /var/cache/conftool/dbconfig/20230811-135537-ladsgroup.json
  • 13:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T342617)', diff saved to https://phabricator.wikimedia.org/P50493 and previous config saved to /var/cache/conftool/dbconfig/20230811-134804-ladsgroup.json
  • 13:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 13:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 13:42 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T342617)', diff saved to https://phabricator.wikimedia.org/P50492 and previous config saved to /var/cache/conftool/dbconfig/20230811-134030-ladsgroup.json
  • 13:22 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:01 fabfur@cumin1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet
  • 12:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 12:05 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4045.ulsfo.wmnet with OS bullseye
  • 12:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T342617)', diff saved to https://phabricator.wikimedia.org/P50490 and previous config saved to /var/cache/conftool/dbconfig/20230811-120211-ladsgroup.json
  • 12:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 12:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T342617)', diff saved to https://phabricator.wikimedia.org/P50489 and previous config saved to /var/cache/conftool/dbconfig/20230811-120150-ladsgroup.json
  • 11:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P50486 and previous config saved to /var/cache/conftool/dbconfig/20230811-114644-ladsgroup.json
  • 11:44 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
  • 11:41 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
  • 11:36 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on 29 hosts with reason: Downtime esams hosts prior to migration week.
  • 11:35 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on 29 hosts with reason: Downtime esams hosts prior to migration week.
  • 11:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P50485 and previous config saved to /var/cache/conftool/dbconfig/20230811-113138-ladsgroup.json
  • 11:26 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on 16 hosts with reason: Downtime esams network kit prior to migration week.
  • 11:26 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on 16 hosts with reason: Downtime esams network kit prior to migration week.
  • 11:21 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
  • 11:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T342617)', diff saved to https://phabricator.wikimedia.org/P50484 and previous config saved to /var/cache/conftool/dbconfig/20230811-111631-ladsgroup.json
  • 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
  • 10:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
  • 10:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T342617)', diff saved to https://phabricator.wikimedia.org/P50482 and previous config saved to /var/cache/conftool/dbconfig/20230811-104210-ladsgroup.json
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P50481 and previous config saved to /var/cache/conftool/dbconfig/20230811-102704-ladsgroup.json
  • 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1221 (T342617)', diff saved to https://phabricator.wikimedia.org/P50480 and previous config saved to /var/cache/conftool/dbconfig/20230811-102009-ladsgroup.json
  • 10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T342617)', diff saved to https://phabricator.wikimedia.org/P50479 and previous config saved to /var/cache/conftool/dbconfig/20230811-101930-ladsgroup.json
  • 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P50478 and previous config saved to /var/cache/conftool/dbconfig/20230811-101157-ladsgroup.json
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P50477 and previous config saved to /var/cache/conftool/dbconfig/20230811-100424-ladsgroup.json
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T342617)', diff saved to https://phabricator.wikimedia.org/P50476 and previous config saved to /var/cache/conftool/dbconfig/20230811-095651-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P50475 and previous config saved to /var/cache/conftool/dbconfig/20230811-094918-ladsgroup.json
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T342617)', diff saved to https://phabricator.wikimedia.org/P50474 and previous config saved to /var/cache/conftool/dbconfig/20230811-094118-ladsgroup.json
  • 09:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T342617)', diff saved to https://phabricator.wikimedia.org/P50473 and previous config saved to /var/cache/conftool/dbconfig/20230811-093412-ladsgroup.json
  • 09:31 topranks: Withdrawing anycast prefixes 198.35.27.0/24 (authdns), 185.71.138.0/24 & 2001:67c:930::/48 (wikidough) from esams/knams in BGP
  • 09:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:00 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 09:00 topranks: depool esams site until next week for knams POP migration / rebuild
  • 09:00 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:59 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:59 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:34 moritzm: installing intel-microcode security updates on bookworm/bullseye
  • 08:32 elukey: expand kubelet partition on ml-serve2001 - T339231
  • 08:31 elukey: restart kubelet on ml-serve1001 - T343900
  • 08:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 08:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 08:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T342617)', diff saved to https://phabricator.wikimedia.org/P50472 and previous config saved to /var/cache/conftool/dbconfig/20230811-081815-ladsgroup.json
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T342617)', diff saved to https://phabricator.wikimedia.org/P50471 and previous config saved to /var/cache/conftool/dbconfig/20230811-081139-ladsgroup.json
  • 08:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T342617)', diff saved to https://phabricator.wikimedia.org/P50470 and previous config saved to /var/cache/conftool/dbconfig/20230811-081118-ladsgroup.json
  • 08:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve2001.codfw.wmnet with reason: Expand the kubelet disk partition
  • 08:04 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve2001.codfw.wmnet with reason: Expand the kubelet disk partition
  • 08:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P50469 and previous config saved to /var/cache/conftool/dbconfig/20230811-080309-ladsgroup.json
  • 07:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM rpki1001.eqiad.wmnet
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P50468 and previous config saved to /var/cache/conftool/dbconfig/20230811-075612-ladsgroup.json
  • 07:54 ayounsi@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM rpki1001.eqiad.wmnet
  • 07:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM rpki2002.codfw.wmnet
  • 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P50467 and previous config saved to /var/cache/conftool/dbconfig/20230811-074803-ladsgroup.json
  • 07:47 ayounsi@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM rpki2002.codfw.wmnet
  • 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P50466 and previous config saved to /var/cache/conftool/dbconfig/20230811-074105-ladsgroup.json
  • 07:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T342617)', diff saved to https://phabricator.wikimedia.org/P50465 and previous config saved to /var/cache/conftool/dbconfig/20230811-073257-ladsgroup.json
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T342617)', diff saved to https://phabricator.wikimedia.org/P50464 and previous config saved to /var/cache/conftool/dbconfig/20230811-072559-ladsgroup.json
  • 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T342617)', diff saved to https://phabricator.wikimedia.org/P50463 and previous config saved to /var/cache/conftool/dbconfig/20230811-061250-ladsgroup.json
  • 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P50462 and previous config saved to /var/cache/conftool/dbconfig/20230811-055744-ladsgroup.json
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T342617)', diff saved to https://phabricator.wikimedia.org/P50461 and previous config saved to /var/cache/conftool/dbconfig/20230811-054649-ladsgroup.json
  • 05:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T342617)', diff saved to https://phabricator.wikimedia.org/P50460 and previous config saved to /var/cache/conftool/dbconfig/20230811-054628-ladsgroup.json
  • 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P50459 and previous config saved to /var/cache/conftool/dbconfig/20230811-054238-ladsgroup.json
  • 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T342617)', diff saved to https://phabricator.wikimedia.org/P50458 and previous config saved to /var/cache/conftool/dbconfig/20230811-053847-ladsgroup.json
  • 05:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 05:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T342617)', diff saved to https://phabricator.wikimedia.org/P50457 and previous config saved to /var/cache/conftool/dbconfig/20230811-053826-ladsgroup.json
  • 05:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P50456 and previous config saved to /var/cache/conftool/dbconfig/20230811-053122-ladsgroup.json
  • 05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T342617)', diff saved to https://phabricator.wikimedia.org/P50455 and previous config saved to /var/cache/conftool/dbconfig/20230811-052731-ladsgroup.json
  • 05:23 oblivian@deploy1002: Synchronized private/PrivateSettings.php: Adding proxy vendors (duration: 07m 33s)
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P50454 and previous config saved to /var/cache/conftool/dbconfig/20230811-052320-ladsgroup.json
  • 05:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P50453 and previous config saved to /var/cache/conftool/dbconfig/20230811-051616-ladsgroup.json
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P50452 and previous config saved to /var/cache/conftool/dbconfig/20230811-050814-ladsgroup.json
  • 05:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T342617)', diff saved to https://phabricator.wikimedia.org/P50451 and previous config saved to /var/cache/conftool/dbconfig/20230811-050110-ladsgroup.json
  • 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T342617)', diff saved to https://phabricator.wikimedia.org/P50450 and previous config saved to /var/cache/conftool/dbconfig/20230811-045307-ladsgroup.json
  • 03:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T342617)', diff saved to https://phabricator.wikimedia.org/P50449 and previous config saved to /var/cache/conftool/dbconfig/20230811-031400-ladsgroup.json
  • 03:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 03:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 03:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T342617)', diff saved to https://phabricator.wikimedia.org/P50448 and previous config saved to /var/cache/conftool/dbconfig/20230811-031339-ladsgroup.json
  • 03:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T342617)', diff saved to https://phabricator.wikimedia.org/P50447 and previous config saved to /var/cache/conftool/dbconfig/20230811-030454-ladsgroup.json
  • 03:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 03:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 03:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T342617)', diff saved to https://phabricator.wikimedia.org/P50446 and previous config saved to /var/cache/conftool/dbconfig/20230811-030433-ladsgroup.json
  • 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P50445 and previous config saved to /var/cache/conftool/dbconfig/20230811-025833-ladsgroup.json
  • 02:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P50444 and previous config saved to /var/cache/conftool/dbconfig/20230811-024927-ladsgroup.json
  • 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P50443 and previous config saved to /var/cache/conftool/dbconfig/20230811-024327-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P50442 and previous config saved to /var/cache/conftool/dbconfig/20230811-023420-ladsgroup.json
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T342617)', diff saved to https://phabricator.wikimedia.org/P50441 and previous config saved to /var/cache/conftool/dbconfig/20230811-022820-ladsgroup.json
  • 02:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T342617)', diff saved to https://phabricator.wikimedia.org/P50440 and previous config saved to /var/cache/conftool/dbconfig/20230811-021914-ladsgroup.json
  • 02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T342617)', diff saved to https://phabricator.wikimedia.org/P50439 and previous config saved to /var/cache/conftool/dbconfig/20230811-020724-ladsgroup.json
  • 02:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 02:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T342617)', diff saved to https://phabricator.wikimedia.org/P50438 and previous config saved to /var/cache/conftool/dbconfig/20230811-020703-ladsgroup.json
  • 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P50437 and previous config saved to /var/cache/conftool/dbconfig/20230811-015156-ladsgroup.json
  • 01:43 ryankemper: [WDQS] `ryankemper@wdqs2007:~$ sudo pool` (Caught up on lag)
  • 01:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P50436 and previous config saved to /var/cache/conftool/dbconfig/20230811-013650-ladsgroup.json
  • 01:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T342617)', diff saved to https://phabricator.wikimedia.org/P50435 and previous config saved to /var/cache/conftool/dbconfig/20230811-012144-ladsgroup.json
  • 00:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2112 (T342617)', diff saved to https://phabricator.wikimedia.org/P50434 and previous config saved to /var/cache/conftool/dbconfig/20230811-004036-ladsgroup.json
  • 00:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 00:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T342617)', diff saved to https://phabricator.wikimedia.org/P50433 and previous config saved to /var/cache/conftool/dbconfig/20230811-003243-ladsgroup.json
  • 00:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 00:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance

2023-08-10

  • 22:55 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@ff0a21b]: (no justification provided) (duration: 00m 20s)
  • 22:55 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@ff0a21b]: (no justification provided)
  • 22:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 22:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 22:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 22:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 22:12 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:10 urbanecm@deploy1002: Finished scap: Backport for GlobalRenameUser: Ensure old username is in canonical form (T343958) (duration: 09m 48s)
  • 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T342617)', diff saved to https://phabricator.wikimedia.org/P50432 and previous config saved to /var/cache/conftool/dbconfig/20230810-220820-ladsgroup.json
  • 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50431 and previous config saved to /var/cache/conftool/dbconfig/20230810-220759-ladsgroup.json
  • 22:03 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 22:02 urbanecm@deploy1002: urbanecm: Backport for GlobalRenameUser: Ensure old username is in canonical form (T343958) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 22:00 urbanecm@deploy1002: Started scap: Backport for GlobalRenameUser: Ensure old username is in canonical form (T343958)
  • 21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P50430 and previous config saved to /var/cache/conftool/dbconfig/20230810-215253-ladsgroup.json
  • 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P50429 and previous config saved to /var/cache/conftool/dbconfig/20230810-213747-ladsgroup.json
  • 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50428 and previous config saved to /var/cache/conftool/dbconfig/20230810-212241-ladsgroup.json
  • 21:21 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2007.codfw.wmnet with OS bullseye
  • 20:40 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:38 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: f1a6177 (duration: 00m 42s)
  • 20:37 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: f1a6177
  • 20:34 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: f1a6177 (duration: 00m 16s)
  • 20:34 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: f1a6177
  • 19:24 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@b5a1d04]: (no justification provided) (duration: 00m 09s)
  • 19:24 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@b5a1d04]: (no justification provided)
  • 19:18 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:18 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge ganeti changes - sukhe@cumin2002"
  • 19:16 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge ganeti changes - sukhe@cumin2002"
  • 19:14 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:55 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@4312d99]: (no justification provided) (duration: 00m 20s)
  • 18:55 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@4312d99]: (no justification provided)
  • 18:43 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: host reimage
  • 18:40 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: host reimage
  • 18:25 urbanecm@deploy1002: Finished scap: Backport for ltwiki: Disable Growth features (duration: 10m 05s)
  • 18:21 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2007.codfw.wmnet with OS bullseye
  • 18:18 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 18:17 urbanecm@deploy1002: urbanecm: Backport for ltwiki: Disable Growth features synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 18:15 urbanecm@deploy1002: Started scap: Backport for ltwiki: Disable Growth features
  • 18:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6002.drmrs.wmnet with OS bookworm
  • 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T342617)', diff saved to https://phabricator.wikimedia.org/P50426 and previous config saved to /var/cache/conftool/dbconfig/20230810-180656-ladsgroup.json
  • 18:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 18:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 17:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 17:43 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 17:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6002.drmrs.wmnet with OS bookworm
  • 17:06 cstone: payments-wiki upgraded from 5b250aed to e094ea1f
  • 16:15 sukhe: running authdns-update to update ns2 and point it to nsa.wikimedia.org
  • 15:30 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe
  • 15:20 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe
  • 14:35 jforrester@deploy1002: Finished scap: Backport for wikifunctions: Allow transwiki import from Wikidata (T343365) (duration: 09m 22s)
  • 14:28 jforrester@deploy1002: stang and jforrester: Continuing with sync
  • 14:27 jforrester@deploy1002: stang and jforrester: Backport for wikifunctions: Allow transwiki import from Wikidata (T343365) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:25 jforrester@deploy1002: Started scap: Backport for wikifunctions: Allow transwiki import from Wikidata (T343365)
  • 14:22 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Tell WikiLambda to stash results in our bespoke cache (T342753) (duration: 08m 15s)
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50423 and previous config saved to /var/cache/conftool/dbconfig/20230810-142117-ladsgroup.json
  • 14:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 14:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T342617)', diff saved to https://phabricator.wikimedia.org/P50422 and previous config saved to /var/cache/conftool/dbconfig/20230810-142053-ladsgroup.json
  • 14:16 jforrester@deploy1002: jforrester: Continuing with sync
  • 14:16 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Tell WikiLambda to stash results in our bespoke cache (T342753) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:14 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Tell WikiLambda to stash results in our bespoke cache (T342753)
  • 14:12 jforrester@deploy1002: Finished scap: Backport for Add wikifunctions-staff to wmgPrivilegedGroups (T342868) (duration: 08m 35s)
  • 14:06 jforrester@deploy1002: jforrester: Continuing with sync
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P50421 and previous config saved to /var/cache/conftool/dbconfig/20230810-140546-ladsgroup.json
  • 14:05 jforrester@deploy1002: jforrester: Backport for Add wikifunctions-staff to wmgPrivilegedGroups (T342868) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:04 jforrester@deploy1002: Started scap: Backport for Add wikifunctions-staff to wmgPrivilegedGroups (T342868)
  • 14:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-coord1001.eqiad.wmnet with OS bullseye
  • 13:57 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:52 Emperor: restart puppet and repool ms-fe2009 after testing T211661
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P50420 and previous config saved to /var/cache/conftool/dbconfig/20230810-135040-ladsgroup.json
  • 13:47 Emperor: depool and stop puppet on ms-fe2009 to test updated rewrite.py T211661
  • 13:45 oblivian@deploy1002: Finished scap: Backport for Add wikifunctions object cache (T297815) (duration: 09m 09s)
  • 13:38 oblivian@deploy1002: oblivian: Continuing with sync
  • 13:37 oblivian@deploy1002: oblivian: Backport for Add wikifunctions object cache (T297815) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:36 oblivian@deploy1002: Started scap: Backport for Add wikifunctions object cache (T297815)
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T342617)', diff saved to https://phabricator.wikimedia.org/P50419 and previous config saved to /var/cache/conftool/dbconfig/20230810-133534-ladsgroup.json
  • 13:33 samtar@deploy1002: Finished scap: Backport for IS: Enable Phonos on medium projects (T336763) (duration: 10m 58s)
  • 13:26 samtar@deploy1002: samtar: Continuing with sync
  • 13:24 samtar@deploy1002: samtar: Backport for IS: Enable Phonos on medium projects (T336763) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:22 samtar@deploy1002: Started scap: Backport for IS: Enable Phonos on medium projects (T336763)
  • 13:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1092.eqiad.wmnet with OS bullseye
  • 13:14 TheresNoTime: `[samtar@mwmaint1002 ~]$ foreachwiki sql.php /srv/mediawiki-staging/php-1.41.0-wmf.20/extensions/CheckUser/schema/mysql/cu_useragent_clienthints_map.sql` for T258105
  • 13:09 TheresNoTime: `[samtar@mwmaint1002 ~]$ foreachwiki sql.php /srv/mediawiki-staging/php-1.41.0-wmf.20/extensions/CheckUser/schema/mysql/cu_useragent_clienthints.sql` for T258105
  • 12:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1092.eqiad.wmnet with reason: host reimage
  • 12:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1092.eqiad.wmnet with reason: host reimage
  • 12:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 12:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T342617)', diff saved to https://phabricator.wikimedia.org/P50418 and previous config saved to /var/cache/conftool/dbconfig/20230810-122626-ladsgroup.json
  • 12:22 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1092.eqiad.wmnet with OS bullseye
  • 12:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 12:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P50417 and previous config saved to /var/cache/conftool/dbconfig/20230810-121120-ladsgroup.json
  • 12:08 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1091.eqiad.wmnet with OS bullseye
  • 11:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast3007.wikimedia.org
  • 11:58 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:58 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3007.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 11:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P50416 and previous config saved to /var/cache/conftool/dbconfig/20230810-115614-ladsgroup.json
  • 11:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 11:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 11:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: host reimage
  • 11:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1091.eqiad.wmnet with reason: host reimage
  • 11:45 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: host reimage
  • 11:45 taavi@deploy1002: Finished scap: Backport for GlobalRename: Ensure status database rows use the normalized name (T343956) (duration: 10m 17s)
  • 11:44 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3007.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:42 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1091.eqiad.wmnet with reason: host reimage
  • 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T342617)', diff saved to https://phabricator.wikimedia.org/P50415 and previous config saved to /var/cache/conftool/dbconfig/20230810-114108-ladsgroup.json
  • 11:40 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add manufacture to network devices - jbond@cumin1001 - T329669"
  • 11:39 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add manufacture to network devices - jbond@cumin1001 - T329669"
  • 11:39 taavi@deploy1002: taavi and urbanecm: Continuing with sync
  • 11:36 taavi@deploy1002: taavi and urbanecm: Backport for GlobalRename: Ensure status database rows use the normalized name (T343956) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 11:35 taavi@deploy1002: Started scap: Backport for GlobalRename: Ensure status database rows use the normalized name (T343956)
  • 11:34 taavi@deploy1002: Finished scap: Backport for throttle: remove expired rules, throttle: add rules for Wikimania 2023 (T343595) (duration: 11m 30s)
  • 11:32 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-coord1001.eqiad.wmnet with OS bullseye
  • 11:27 taavi@deploy1002: taavi: Continuing with sync
  • 11:24 taavi@deploy1002: taavi: Backport for throttle: remove expired rules, throttle: add rules for Wikimania 2023 (T343595) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 11:23 taavi@deploy1002: Started scap: Backport for throttle: remove expired rules, throttle: add rules for Wikimania 2023 (T343595)
  • 11:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6002.drmrs.wmnet with OS bookworm
  • 11:14 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1091.eqiad.wmnet with OS bullseye
  • 10:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1090.eqiad.wmnet with OS bullseye
  • 10:46 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
  • 10:45 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
  • 10:36 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
  • 10:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 10:34 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
  • 10:33 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 10:32 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 10:32 jiji@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 10:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1090.eqiad.wmnet with reason: host reimage
  • 10:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1090.eqiad.wmnet with reason: host reimage
  • 10:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1010.eqiad.wmnet
  • 10:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1010.eqiad.wmnet
  • 10:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1007.eqiad.wmnet
  • 10:16 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1090.eqiad.wmnet with OS bullseye
  • 10:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1007.eqiad.wmnet
  • 10:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6002.drmrs.wmnet with OS bookworm
  • 09:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host karapace1001.eqiad.wmnet
  • 09:09 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1002.eqiad.wmnet
  • 09:09 urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=arwiki --logwiki=metawiki 'Qwertyoruiop' '3h6 1'
  • 09:08 urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'Mittzy' 'Mittzy (usurped)'
  • 09:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
  • 09:07 urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=amwiki --logwiki=metawiki 'Jean-Mahmood' 'User92259453'
  • 09:07 urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Garciajaysonpinolkwani98' 'Ne_Shokot_Pinolkwane'
  • 09:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1004.eqiad.wmnet
  • 09:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1002.eqiad.wmnet
  • 09:06 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1004.eqiad.wmnet
  • 09:05 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1002.eqiad.wmnet
  • 09:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-client1002.eqiad.wmnet
  • 09:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1001.eqiad.wmnet
  • 09:03 urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'CHUniZH' 'Musik CH' # T343867
  • 08:57 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1001.eqiad.wmnet
  • 08:46 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:42 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast3007.wikimedia.org
  • 08:36 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 08:26 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 08:21 godog: put back business hours americas for sre business hours escalation - T343812
  • 08:21 godog: put back business hours americas for sre business hours escalation
  • 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast5004.wikimedia.org
  • 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:52 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast5004.wikimedia.org
  • 07:19 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host bast5004.wikimedia.org
  • 07:19 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host bast5004.wikimedia.org with OS bookworm
  • 06:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T342617)', diff saved to https://phabricator.wikimedia.org/P50414 and previous config saved to /var/cache/conftool/dbconfig/20230810-063611-ladsgroup.json
  • 06:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T342617)', diff saved to https://phabricator.wikimedia.org/P50413 and previous config saved to /var/cache/conftool/dbconfig/20230810-063523-ladsgroup.json
  • 06:23 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
  • 06:20 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-eqiad
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P50412 and previous config saved to /var/cache/conftool/dbconfig/20230810-062017-ladsgroup.json
  • 06:08 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
  • 06:05 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-codfw
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P50411 and previous config saved to /var/cache/conftool/dbconfig/20230810-060511-ladsgroup.json
  • 05:59 moritzm: installing tiff security updates
  • 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T342617)', diff saved to https://phabricator.wikimedia.org/P50410 and previous config saved to /var/cache/conftool/dbconfig/20230810-055005-ladsgroup.json
  • 05:32 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast5004.wikimedia.org with OS bookworm
  • 05:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast5004.wikimedia.org - jmm@cumin2002"
  • 05:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast5004.wikimedia.org - jmm@cumin2002"
  • 05:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast5004.wikimedia.org on all recursors
  • 05:30 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast5004.wikimedia.org on all recursors
  • 05:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5004.wikimedia.org - jmm@cumin2002"
  • 05:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5004.wikimedia.org - jmm@cumin2002"
  • 05:27 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 05:27 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast5004.wikimedia.org
  • 05:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1015.eqiad.wmnet
  • 04:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T342617)', diff saved to https://phabricator.wikimedia.org/P50409 and previous config saved to /var/cache/conftool/dbconfig/20230810-044643-ladsgroup.json
  • 04:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 04:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 04:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T342617)', diff saved to https://phabricator.wikimedia.org/P50408 and previous config saved to /var/cache/conftool/dbconfig/20230810-044622-ladsgroup.json
  • 04:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P50407 and previous config saved to /var/cache/conftool/dbconfig/20230810-043116-ladsgroup.json
  • 04:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P50406 and previous config saved to /var/cache/conftool/dbconfig/20230810-041610-ladsgroup.json
  • 04:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T342617)', diff saved to https://phabricator.wikimedia.org/P50405 and previous config saved to /var/cache/conftool/dbconfig/20230810-040104-ladsgroup.json
  • 03:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 03:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 02:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 02:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 02:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T342617)', diff saved to https://phabricator.wikimedia.org/P50404 and previous config saved to /var/cache/conftool/dbconfig/20230810-024531-ladsgroup.json
  • 02:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P50403 and previous config saved to /var/cache/conftool/dbconfig/20230810-023025-ladsgroup.json
  • 02:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P50402 and previous config saved to /var/cache/conftool/dbconfig/20230810-021518-ladsgroup.json
  • 02:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T342617)', diff saved to https://phabricator.wikimedia.org/P50401 and previous config saved to /var/cache/conftool/dbconfig/20230810-020012-ladsgroup.json
  • 01:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T342617)', diff saved to https://phabricator.wikimedia.org/P50400 and previous config saved to /var/cache/conftool/dbconfig/20230810-014731-ladsgroup.json
  • 01:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P50399 and previous config saved to /var/cache/conftool/dbconfig/20230810-013225-ladsgroup.json
  • 01:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P50398 and previous config saved to /var/cache/conftool/dbconfig/20230810-011718-ladsgroup.json
  • 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1214 (T342617)', diff saved to https://phabricator.wikimedia.org/P50397 and previous config saved to /var/cache/conftool/dbconfig/20230810-011228-ladsgroup.json
  • 01:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 01:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T342617)', diff saved to https://phabricator.wikimedia.org/P50396 and previous config saved to /var/cache/conftool/dbconfig/20230810-011207-ladsgroup.json
  • 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T342617)', diff saved to https://phabricator.wikimedia.org/P50395 and previous config saved to /var/cache/conftool/dbconfig/20230810-010212-ladsgroup.json
  • 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P50394 and previous config saved to /var/cache/conftool/dbconfig/20230810-005701-ladsgroup.json
  • 00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P50393 and previous config saved to /var/cache/conftool/dbconfig/20230810-004154-ladsgroup.json
  • 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T342617)', diff saved to https://phabricator.wikimedia.org/P50392 and previous config saved to /var/cache/conftool/dbconfig/20230810-002648-ladsgroup.json
  • 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T342617)', diff saved to https://phabricator.wikimedia.org/P50391 and previous config saved to /var/cache/conftool/dbconfig/20230810-001437-ladsgroup.json
  • 00:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 00:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T342617)', diff saved to https://phabricator.wikimedia.org/P50390 and previous config saved to /var/cache/conftool/dbconfig/20230810-001414-ladsgroup.json

2023-08-09

  • 23:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P50389 and previous config saved to /var/cache/conftool/dbconfig/20230809-235908-ladsgroup.json
  • 23:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P50388 and previous config saved to /var/cache/conftool/dbconfig/20230809-234402-ladsgroup.json
  • 23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1211 (T342617)', diff saved to https://phabricator.wikimedia.org/P50387 and previous config saved to /var/cache/conftool/dbconfig/20230809-234146-ladsgroup.json
  • 23:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 23:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T342617)', diff saved to https://phabricator.wikimedia.org/P50386 and previous config saved to /var/cache/conftool/dbconfig/20230809-234125-ladsgroup.json
  • 23:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T342617)', diff saved to https://phabricator.wikimedia.org/P50385 and previous config saved to /var/cache/conftool/dbconfig/20230809-232855-ladsgroup.json
  • 23:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P50384 and previous config saved to /var/cache/conftool/dbconfig/20230809-232619-ladsgroup.json
  • 23:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P50383 and previous config saved to /var/cache/conftool/dbconfig/20230809-231112-ladsgroup.json
  • 23:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T342617)', diff saved to https://phabricator.wikimedia.org/P50382 and previous config saved to /var/cache/conftool/dbconfig/20230809-230339-ladsgroup.json
  • 23:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 23:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 22:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T342617)', diff saved to https://phabricator.wikimedia.org/P50381 and previous config saved to /var/cache/conftool/dbconfig/20230809-225605-ladsgroup.json
  • 22:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T342617)', diff saved to https://phabricator.wikimedia.org/P50380 and previous config saved to /var/cache/conftool/dbconfig/20230809-224114-ladsgroup.json
  • 22:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 22:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T342617)', diff saved to https://phabricator.wikimedia.org/P50379 and previous config saved to /var/cache/conftool/dbconfig/20230809-224053-ladsgroup.json
  • 22:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P50378 and previous config saved to /var/cache/conftool/dbconfig/20230809-222547-ladsgroup.json
  • 22:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P50377 and previous config saved to /var/cache/conftool/dbconfig/20230809-221041-ladsgroup.json
  • 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1209 (T342617)', diff saved to https://phabricator.wikimedia.org/P50376 and previous config saved to /var/cache/conftool/dbconfig/20230809-220433-ladsgroup.json
  • 22:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 22:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T342617)', diff saved to https://phabricator.wikimedia.org/P50375 and previous config saved to /var/cache/conftool/dbconfig/20230809-220412-ladsgroup.json
  • 21:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T342617)', diff saved to https://phabricator.wikimedia.org/P50373 and previous config saved to /var/cache/conftool/dbconfig/20230809-215535-ladsgroup.json
  • 21:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P50372 and previous config saved to /var/cache/conftool/dbconfig/20230809-214905-ladsgroup.json
  • 21:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P50371 and previous config saved to /var/cache/conftool/dbconfig/20230809-213359-ladsgroup.json
  • 21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T342617)', diff saved to https://phabricator.wikimedia.org/P50369 and previous config saved to /var/cache/conftool/dbconfig/20230809-212042-ladsgroup.json
  • 21:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T342617)', diff saved to https://phabricator.wikimedia.org/P50368 and previous config saved to /var/cache/conftool/dbconfig/20230809-212021-ladsgroup.json
  • 21:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T342617)', diff saved to https://phabricator.wikimedia.org/P50367 and previous config saved to /var/cache/conftool/dbconfig/20230809-211853-ladsgroup.json
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T342617)', diff saved to https://phabricator.wikimedia.org/P50366 and previous config saved to /var/cache/conftool/dbconfig/20230809-210856-ladsgroup.json
  • 21:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 21:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T342617)', diff saved to https://phabricator.wikimedia.org/P50365 and previous config saved to /var/cache/conftool/dbconfig/20230809-210835-ladsgroup.json
  • 21:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P50364 and previous config saved to /var/cache/conftool/dbconfig/20230809-210514-ladsgroup.json
  • 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P50363 and previous config saved to /var/cache/conftool/dbconfig/20230809-205329-ladsgroup.json
  • 20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P50362 and previous config saved to /var/cache/conftool/dbconfig/20230809-205008-ladsgroup.json
  • 20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P50361 and previous config saved to /var/cache/conftool/dbconfig/20230809-203822-ladsgroup.json
  • 20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T343718)', diff saved to https://phabricator.wikimedia.org/P50360 and previous config saved to /var/cache/conftool/dbconfig/20230809-203731-ladsgroup.json
  • 20:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T342617)', diff saved to https://phabricator.wikimedia.org/P50359 and previous config saved to /var/cache/conftool/dbconfig/20230809-203502-ladsgroup.json
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T342617)', diff saved to https://phabricator.wikimedia.org/P50358 and previous config saved to /var/cache/conftool/dbconfig/20230809-203041-ladsgroup.json
  • 20:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 20:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T342617)', diff saved to https://phabricator.wikimedia.org/P50357 and previous config saved to /var/cache/conftool/dbconfig/20230809-203020-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T342617)', diff saved to https://phabricator.wikimedia.org/P50356 and previous config saved to /var/cache/conftool/dbconfig/20230809-202316-ladsgroup.json
  • 20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P50355 and previous config saved to /var/cache/conftool/dbconfig/20230809-202225-ladsgroup.json
  • 20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P50354 and previous config saved to /var/cache/conftool/dbconfig/20230809-201514-ladsgroup.json
  • 20:09 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts contint2001.wikimedia.org
  • 20:09 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:09 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: contint2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - aokoth@cumin1001"
  • 20:08 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: contint2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - aokoth@cumin1001"
  • 20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P50353 and previous config saved to /var/cache/conftool/dbconfig/20230809-200718-ladsgroup.json
  • 20:05 aokoth@cumin1001: START - Cookbook sre.dns.netbox
  • 20:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P50352 and previous config saved to /var/cache/conftool/dbconfig/20230809-200007-ladsgroup.json
  • 19:59 aokoth@cumin1001: START - Cookbook sre.hosts.decommission for hosts contint2001.wikimedia.org
  • 19:58 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on contint2001.wikimedia.org with reason: Decommissioning
  • 19:58 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on contint2001.wikimedia.org with reason: Decommissioning
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T343718)', diff saved to https://phabricator.wikimedia.org/P50351 and previous config saved to /var/cache/conftool/dbconfig/20230809-195212-ladsgroup.json
  • 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T342617)', diff saved to https://phabricator.wikimedia.org/P50350 and previous config saved to /var/cache/conftool/dbconfig/20230809-194501-ladsgroup.json
  • 19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T342617)', diff saved to https://phabricator.wikimedia.org/P50349 and previous config saved to /var/cache/conftool/dbconfig/20230809-193623-ladsgroup.json
  • 19:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 19:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T342617)', diff saved to https://phabricator.wikimedia.org/P50348 and previous config saved to /var/cache/conftool/dbconfig/20230809-193559-ladsgroup.json
  • 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T343718)', diff saved to https://phabricator.wikimedia.org/P50347 and previous config saved to /var/cache/conftool/dbconfig/20230809-192818-ladsgroup.json
  • 19:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 19:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T343718)', diff saved to https://phabricator.wikimedia.org/P50346 and previous config saved to /var/cache/conftool/dbconfig/20230809-192746-ladsgroup.json
  • 19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P50345 and previous config saved to /var/cache/conftool/dbconfig/20230809-192053-ladsgroup.json
  • 19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P50344 and previous config saved to /var/cache/conftool/dbconfig/20230809-191240-ladsgroup.json
  • 19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P50343 and previous config saved to /var/cache/conftool/dbconfig/20230809-190547-ladsgroup.json
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T342617)', diff saved to https://phabricator.wikimedia.org/P50342 and previous config saved to /var/cache/conftool/dbconfig/20230809-185805-ladsgroup.json
  • 18:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 18:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T342617)', diff saved to https://phabricator.wikimedia.org/P50341 and previous config saved to /var/cache/conftool/dbconfig/20230809-185745-ladsgroup.json
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P50340 and previous config saved to /var/cache/conftool/dbconfig/20230809-185734-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T342617)', diff saved to https://phabricator.wikimedia.org/P50339 and previous config saved to /var/cache/conftool/dbconfig/20230809-185040-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P50338 and previous config saved to /var/cache/conftool/dbconfig/20230809-184238-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T343718)', diff saved to https://phabricator.wikimedia.org/P50337 and previous config saved to /var/cache/conftool/dbconfig/20230809-184228-ladsgroup.json
  • 18:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T343718)', diff saved to https://phabricator.wikimedia.org/P50336 and previous config saved to /var/cache/conftool/dbconfig/20230809-184018-ladsgroup.json
  • 18:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 18:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 18:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 18:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T343718)', diff saved to https://phabricator.wikimedia.org/P50335 and previous config saved to /var/cache/conftool/dbconfig/20230809-183952-ladsgroup.json
  • 18:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P50334 and previous config saved to /var/cache/conftool/dbconfig/20230809-182726-ladsgroup.json
  • 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P50333 and previous config saved to /var/cache/conftool/dbconfig/20230809-182446-ladsgroup.json
  • 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T342617)', diff saved to https://phabricator.wikimedia.org/P50332 and previous config saved to /var/cache/conftool/dbconfig/20230809-181219-ladsgroup.json
  • 18:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P50331 and previous config saved to /var/cache/conftool/dbconfig/20230809-180940-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2165 (T342617)', diff saved to https://phabricator.wikimedia.org/P50330 and previous config saved to /var/cache/conftool/dbconfig/20230809-180143-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 18:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T342617)', diff saved to https://phabricator.wikimedia.org/P50329 and previous config saved to /var/cache/conftool/dbconfig/20230809-180122-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T343718)', diff saved to https://phabricator.wikimedia.org/P50328 and previous config saved to /var/cache/conftool/dbconfig/20230809-175434-ladsgroup.json
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P50327 and previous config saved to /var/cache/conftool/dbconfig/20230809-174616-ladsgroup.json
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P50326 and previous config saved to /var/cache/conftool/dbconfig/20230809-173110-ladsgroup.json
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T343718)', diff saved to https://phabricator.wikimedia.org/P50325 and previous config saved to /var/cache/conftool/dbconfig/20230809-172803-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 17:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T342617)', diff saved to https://phabricator.wikimedia.org/P50324 and previous config saved to /var/cache/conftool/dbconfig/20230809-172507-ladsgroup.json
  • 17:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 17:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T342617)', diff saved to https://phabricator.wikimedia.org/P50323 and previous config saved to /var/cache/conftool/dbconfig/20230809-172447-ladsgroup.json
  • 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T342617)', diff saved to https://phabricator.wikimedia.org/P50322 and previous config saved to /var/cache/conftool/dbconfig/20230809-171604-ladsgroup.json
  • 17:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50321 and previous config saved to /var/cache/conftool/dbconfig/20230809-171533-ladsgroup.json
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P50320 and previous config saved to /var/cache/conftool/dbconfig/20230809-170940-ladsgroup.json
  • 17:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T343718)', diff saved to https://phabricator.wikimedia.org/P50319 and previous config saved to /var/cache/conftool/dbconfig/20230809-170351-ladsgroup.json
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P50318 and previous config saved to /var/cache/conftool/dbconfig/20230809-170027-ladsgroup.json
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P50317 and previous config saved to /var/cache/conftool/dbconfig/20230809-165434-ladsgroup.json
  • 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P50316 and previous config saved to /var/cache/conftool/dbconfig/20230809-164844-ladsgroup.json
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P50315 and previous config saved to /var/cache/conftool/dbconfig/20230809-164520-ladsgroup.json
  • 16:44 elukey: temporarily bump miscweb bugzilla pods from 4 to 8 in k8s wikikube codfw
  • 16:42 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 16:41 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T342617)', diff saved to https://phabricator.wikimedia.org/P50314 and previous config saved to /var/cache/conftool/dbconfig/20230809-163928-ladsgroup.json
  • 16:38 elukey: temporarily bump miscweb bugzilla pods from 2 to 4 in k8s wikikube codfw
  • 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P50313 and previous config saved to /var/cache/conftool/dbconfig/20230809-163338-ladsgroup.json
  • 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50312 and previous config saved to /var/cache/conftool/dbconfig/20230809-163014-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T342617)', diff saved to https://phabricator.wikimedia.org/P50311 and previous config saved to /var/cache/conftool/dbconfig/20230809-162913-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 16:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 16:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 16:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T342617)', diff saved to https://phabricator.wikimedia.org/P50310 and previous config saved to /var/cache/conftool/dbconfig/20230809-162836-ladsgroup.json
  • 16:22 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T343718)', diff saved to https://phabricator.wikimedia.org/P50308 and previous config saved to /var/cache/conftool/dbconfig/20230809-161832-ladsgroup.json
  • 16:17 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 16:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-master1001.eqiad.wmnet with OS bullseye
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P50307 and previous config saved to /var/cache/conftool/dbconfig/20230809-161330-ladsgroup.json
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P50306 and previous config saved to /var/cache/conftool/dbconfig/20230809-155824-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T343718)', diff saved to https://phabricator.wikimedia.org/P50305 and previous config saved to /var/cache/conftool/dbconfig/20230809-155137-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T342617)', diff saved to https://phabricator.wikimedia.org/P50304 and previous config saved to /var/cache/conftool/dbconfig/20230809-155127-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 15:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50303 and previous config saved to /var/cache/conftool/dbconfig/20230809-155116-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T342617)', diff saved to https://phabricator.wikimedia.org/P50302 and previous config saved to /var/cache/conftool/dbconfig/20230809-155106-ladsgroup.json
  • 15:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: host reimage
  • 15:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:47 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 15:47 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 15:47 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: host reimage
  • 15:45 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1089.eqiad.wmnet with OS bullseye
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T342617)', diff saved to https://phabricator.wikimedia.org/P50301 and previous config saved to /var/cache/conftool/dbconfig/20230809-154317-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P50300 and previous config saved to /var/cache/conftool/dbconfig/20230809-153610-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P50299 and previous config saved to /var/cache/conftool/dbconfig/20230809-153600-ladsgroup.json
  • 15:29 hnowlan: disabling puppet on A:cp to test r/947372
  • 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P50298 and previous config saved to /var/cache/conftool/dbconfig/20230809-152103-ladsgroup.json
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P50297 and previous config saved to /var/cache/conftool/dbconfig/20230809-152053-ladsgroup.json
  • 15:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1089.eqiad.wmnet with reason: host reimage
  • 15:17 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1089.eqiad.wmnet with reason: host reimage
  • 15:06 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-master1001.eqiad.wmnet with OS bullseye
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50295 and previous config saved to /var/cache/conftool/dbconfig/20230809-150557-ladsgroup.json
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T342617)', diff saved to https://phabricator.wikimedia.org/P50294 and previous config saved to /var/cache/conftool/dbconfig/20230809-150547-ladsgroup.json
  • 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50293 and previous config saved to /var/cache/conftool/dbconfig/20230809-150443-ladsgroup.json
  • 15:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 15:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 14:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6002.drmrs.wmnet with OS bookworm
  • 14:57 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1089.eqiad.wmnet with OS bullseye
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T342617)', diff saved to https://phabricator.wikimedia.org/P50292 and previous config saved to /var/cache/conftool/dbconfig/20230809-145714-ladsgroup.json
  • 14:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1088.eqiad.wmnet with OS bullseye
  • 14:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 14:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T342617)', diff saved to https://phabricator.wikimedia.org/P50291 and previous config saved to /var/cache/conftool/dbconfig/20230809-145653-ladsgroup.json
  • 14:49 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki wikifunctionswiki --fix` for T342964
  • 14:48 samtar@deploy1002: Finished scap: Backport for core-namespaces: Remove dupe wikifunctions alias (T342964) (duration: 14m 21s)
  • 14:42 samtar@deploy1002: samtar: Continuing with sync
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P50290 and previous config saved to /var/cache/conftool/dbconfig/20230809-144147-ladsgroup.json
  • 14:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T343718)', diff saved to https://phabricator.wikimedia.org/P50289 and previous config saved to /var/cache/conftool/dbconfig/20230809-144022-ladsgroup.json
  • 14:36 samtar@deploy1002: samtar: Backport for core-namespaces: Remove dupe wikifunctions alias (T342964) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:34 samtar@deploy1002: Started scap: Backport for core-namespaces: Remove dupe wikifunctions alias (T342964)
  • 14:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1088.eqiad.wmnet with reason: host reimage
  • 14:31 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1088.eqiad.wmnet with reason: host reimage
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P50288 and previous config saved to /var/cache/conftool/dbconfig/20230809-142640-ladsgroup.json
  • 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P50287 and previous config saved to /var/cache/conftool/dbconfig/20230809-142515-ladsgroup.json
  • 14:24 moritzm: installing sudo bugfix updates from Bookworm 12.1 point release
  • 14:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 14:18 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1088.eqiad.wmnet with OS bullseye
  • 14:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T342617)', diff saved to https://phabricator.wikimedia.org/P50285 and previous config saved to /var/cache/conftool/dbconfig/20230809-141134-ladsgroup.json
  • 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P50284 and previous config saved to /var/cache/conftool/dbconfig/20230809-141009-ladsgroup.json
  • 14:09 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-master1002.eqiad.wmnet with OS bullseye
  • 14:07 moritzm: restarting FPM on mediawiki canaries to pick up tiff update
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T342617)', diff saved to https://phabricator.wikimedia.org/P50283 and previous config saved to /var/cache/conftool/dbconfig/20230809-140551-ladsgroup.json
  • 14:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 14:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50282 and previous config saved to /var/cache/conftool/dbconfig/20230809-140531-ladsgroup.json
  • 13:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1087.eqiad.wmnet with OS bullseye
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T343718)', diff saved to https://phabricator.wikimedia.org/P50281 and previous config saved to /var/cache/conftool/dbconfig/20230809-135503-ladsgroup.json
  • 13:54 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 13:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6002.drmrs.wmnet with OS bookworm
  • 13:54 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T343718)', diff saved to https://phabricator.wikimedia.org/P50280 and previous config saved to /var/cache/conftool/dbconfig/20230809-135356-ladsgroup.json
  • 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T343718)', diff saved to https://phabricator.wikimedia.org/P50279 and previous config saved to /var/cache/conftool/dbconfig/20230809-135324-ladsgroup.json
  • 13:52 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 13:52 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 13:52 moritzm: installing tiff security updates
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P50278 and previous config saved to /var/cache/conftool/dbconfig/20230809-135024-ladsgroup.json
  • 13:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: host reimage
  • 13:47 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: host reimage
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T342617)', diff saved to https://phabricator.wikimedia.org/P50277 and previous config saved to /var/cache/conftool/dbconfig/20230809-134136-ladsgroup.json
  • 13:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 13:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50276 and previous config saved to /var/cache/conftool/dbconfig/20230809-134115-ladsgroup.json
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P50275 and previous config saved to /var/cache/conftool/dbconfig/20230809-133818-ladsgroup.json
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P50274 and previous config saved to /var/cache/conftool/dbconfig/20230809-133518-ladsgroup.json
  • 13:33 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-master1002.eqiad.wmnet with OS bullseye
  • 13:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1087.eqiad.wmnet with reason: host reimage
  • 13:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1087.eqiad.wmnet with reason: host reimage
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P50273 and previous config saved to /var/cache/conftool/dbconfig/20230809-132609-ladsgroup.json
  • 13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T342617)', diff saved to https://phabricator.wikimedia.org/P50272 and previous config saved to /var/cache/conftool/dbconfig/20230809-132446-ladsgroup.json
  • 13:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 13:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T342617)', diff saved to https://phabricator.wikimedia.org/P50271 and previous config saved to /var/cache/conftool/dbconfig/20230809-132424-ladsgroup.json
  • 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P50270 and previous config saved to /var/cache/conftool/dbconfig/20230809-132312-ladsgroup.json
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50269 and previous config saved to /var/cache/conftool/dbconfig/20230809-132012-ladsgroup.json
  • 13:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 13:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 13:12 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1087.eqiad.wmnet with OS bullseye
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P50268 and previous config saved to /var/cache/conftool/dbconfig/20230809-131103-ladsgroup.json
  • 13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P50267 and previous config saved to /var/cache/conftool/dbconfig/20230809-130918-ladsgroup.json
  • 13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T343718)', diff saved to https://phabricator.wikimedia.org/P50266 and previous config saved to /var/cache/conftool/dbconfig/20230809-130805-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T343718)', diff saved to https://phabricator.wikimedia.org/P50265 and previous config saved to /var/cache/conftool/dbconfig/20230809-130557-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T343718)', diff saved to https://phabricator.wikimedia.org/P50264 and previous config saved to /var/cache/conftool/dbconfig/20230809-130518-ladsgroup.json
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50263 and previous config saved to /var/cache/conftool/dbconfig/20230809-125555-ladsgroup.json
  • 12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P50262 and previous config saved to /var/cache/conftool/dbconfig/20230809-125412-ladsgroup.json
  • 12:53 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 12:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 12:52 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 12:51 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P50261 and previous config saved to /var/cache/conftool/dbconfig/20230809-125012-ladsgroup.json
  • 12:49 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 12:49 dcausse: restarting blazegraph on wdqs1007 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 12:48 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 12:48 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 12:48 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 12:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 12:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 12:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1086.eqiad.wmnet with OS bullseye
  • 12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T342617)', diff saved to https://phabricator.wikimedia.org/P50260 and previous config saved to /var/cache/conftool/dbconfig/20230809-123906-ladsgroup.json
  • 12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P50259 and previous config saved to /var/cache/conftool/dbconfig/20230809-123506-ladsgroup.json
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T343718)', diff saved to https://phabricator.wikimedia.org/P50258 and previous config saved to /var/cache/conftool/dbconfig/20230809-122000-ladsgroup.json
  • 12:19 jayme@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 12:19 jayme@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 12:18 jayme@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 12:18 jayme@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T343718)', diff saved to https://phabricator.wikimedia.org/P50257 and previous config saved to /var/cache/conftool/dbconfig/20230809-121852-ladsgroup.json
  • 12:18 jayme@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 12:18 jayme@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 12:18 jayme@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 12:18 jayme@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T343718)', diff saved to https://phabricator.wikimedia.org/P50256 and previous config saved to /var/cache/conftool/dbconfig/20230809-121831-ladsgroup.json
  • 12:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1086.eqiad.wmnet with reason: host reimage
  • 12:14 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1086.eqiad.wmnet with reason: host reimage
  • 12:13 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 12:12 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 12:12 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 12:11 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:11 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 12:11 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 12:11 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:11 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 12:10 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 12:09 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 12:08 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 12:08 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 12:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P50255 and previous config saved to /var/cache/conftool/dbconfig/20230809-120325-ladsgroup.json
  • 12:01 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1086.eqiad.wmnet with OS bullseye
  • 12:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1085.eqiad.wmnet with OS bullseye
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50254 and previous config saved to /var/cache/conftool/dbconfig/20230809-115534-ladsgroup.json
  • 11:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T342617)', diff saved to https://phabricator.wikimedia.org/P50253 and previous config saved to /var/cache/conftool/dbconfig/20230809-115227-ladsgroup.json
  • 11:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 11:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T342617)', diff saved to https://phabricator.wikimedia.org/P50252 and previous config saved to /var/cache/conftool/dbconfig/20230809-115206-ladsgroup.json
  • 11:49 ladsgroup@deploy1002: Finished scap: Backport for sdwiki: set 'wgTranslateNumerals' to false (T268203) (duration: 09m 22s)
  • 11:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P50251 and previous config saved to /var/cache/conftool/dbconfig/20230809-114819-ladsgroup.json
  • 11:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 11:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 11:41 ladsgroup@deploy1002: kaleembhatti and ladsgroup: Continuing with sync
  • 11:41 ladsgroup@deploy1002: kaleembhatti and ladsgroup: Backport for sdwiki: set 'wgTranslateNumerals' to false (T268203) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 11:39 ladsgroup@deploy1002: Started scap: Backport for sdwiki: set 'wgTranslateNumerals' to false (T268203)
  • 11:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1085.eqiad.wmnet with reason: host reimage
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P50250 and previous config saved to /var/cache/conftool/dbconfig/20230809-113659-ladsgroup.json
  • 11:35 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1085.eqiad.wmnet with reason: host reimage
  • 11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T343718)', diff saved to https://phabricator.wikimedia.org/P50249 and previous config saved to /var/cache/conftool/dbconfig/20230809-113312-ladsgroup.json
  • 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T343718)', diff saved to https://phabricator.wikimedia.org/P50248 and previous config saved to /var/cache/conftool/dbconfig/20230809-113205-ladsgroup.json
  • 11:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T343718)', diff saved to https://phabricator.wikimedia.org/P50247 and previous config saved to /var/cache/conftool/dbconfig/20230809-113144-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P50246 and previous config saved to /var/cache/conftool/dbconfig/20230809-112153-ladsgroup.json
  • 11:20 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1085.eqiad.wmnet with OS bullseye
  • 11:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P50245 and previous config saved to /var/cache/conftool/dbconfig/20230809-111638-ladsgroup.json
  • 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T342617)', diff saved to https://phabricator.wikimedia.org/P50244 and previous config saved to /var/cache/conftool/dbconfig/20230809-111141-ladsgroup.json
  • 11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T342617)', diff saved to https://phabricator.wikimedia.org/P50243 and previous config saved to /var/cache/conftool/dbconfig/20230809-110647-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 11:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P50242 and previous config saved to /var/cache/conftool/dbconfig/20230809-110132-ladsgroup.json
  • 10:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P50241 and previous config saved to /var/cache/conftool/dbconfig/20230809-105635-ladsgroup.json
  • 10:56 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 10:55 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 10:55 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 10:55 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 10:54 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 10:54 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T343718)', diff saved to https://phabricator.wikimedia.org/P50240 and previous config saved to /var/cache/conftool/dbconfig/20230809-104625-ladsgroup.json
  • 10:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T343718)', diff saved to https://phabricator.wikimedia.org/P50239 and previous config saved to /var/cache/conftool/dbconfig/20230809-104518-ladsgroup.json
  • 10:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T343718)', diff saved to https://phabricator.wikimedia.org/P50238 and previous config saved to /var/cache/conftool/dbconfig/20230809-104457-ladsgroup.json
  • 10:44 _joe_: ran requestctl commit, which removed the comma removal from the requestctl output as per T305582
  • 10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P50237 and previous config saved to /var/cache/conftool/dbconfig/20230809-104128-ladsgroup.json
  • 10:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1084.eqiad.wmnet with OS bullseye
  • 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P50236 and previous config saved to /var/cache/conftool/dbconfig/20230809-102951-ladsgroup.json
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T342617)', diff saved to https://phabricator.wikimedia.org/P50235 and previous config saved to /var/cache/conftool/dbconfig/20230809-102622-ladsgroup.json
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T342617)', diff saved to https://phabricator.wikimedia.org/P50234 and previous config saved to /var/cache/conftool/dbconfig/20230809-101946-ladsgroup.json
  • 10:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P50233 and previous config saved to /var/cache/conftool/dbconfig/20230809-101444-ladsgroup.json
  • 10:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1002.eqiad.wmnet
  • 10:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1084.eqiad.wmnet with reason: host reimage
  • 10:09 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1084.eqiad.wmnet with reason: host reimage
  • 10:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1002.eqiad.wmnet
  • 10:07 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-master1002.eqiad.wmnet
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T343718)', diff saved to https://phabricator.wikimedia.org/P50232 and previous config saved to /var/cache/conftool/dbconfig/20230809-095938-ladsgroup.json
  • 09:58 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-master1002.eqiad.wmnet
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T343718)', diff saved to https://phabricator.wikimedia.org/P50231 and previous config saved to /var/cache/conftool/dbconfig/20230809-095730-ladsgroup.json
  • 09:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:55 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 09:55 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 09:55 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 09:55 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 09:54 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1084.eqiad.wmnet with OS bullseye
  • 09:48 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 09:48 jayme@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T342617)', diff saved to https://phabricator.wikimedia.org/P50230 and previous config saved to /var/cache/conftool/dbconfig/20230809-093715-ladsgroup.json
  • 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T342617)', diff saved to https://phabricator.wikimedia.org/P50229 and previous config saved to /var/cache/conftool/dbconfig/20230809-093341-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 09:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 09:31 hnowlan: disabling puppet on A:cp to test 945558
  • 09:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 09:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 09:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 09:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50228 and previous config saved to /var/cache/conftool/dbconfig/20230809-092319-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 09:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50227 and previous config saved to /var/cache/conftool/dbconfig/20230809-092258-ladsgroup.json
  • 09:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P50226 and previous config saved to /var/cache/conftool/dbconfig/20230809-090750-ladsgroup.json
  • 09:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 08:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P50225 and previous config saved to /var/cache/conftool/dbconfig/20230809-085244-ladsgroup.json
  • 08:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50224 and previous config saved to /var/cache/conftool/dbconfig/20230809-083738-ladsgroup.json
  • 08:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 08:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T342617)', diff saved to https://phabricator.wikimedia.org/P50223 and previous config saved to /var/cache/conftool/dbconfig/20230809-083319-ladsgroup.json
  • 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 07:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1003.eqiad.wmnet
  • 07:52 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
  • 07:12 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation for 7 Wikipedias (T343211) (duration: 09m 58s)
  • 07:05 kartik@deploy1002: kartik: Continuing with sync
  • 07:03 kartik@deploy1002: kartik: Backport for testwiki: Enable Section Translation for 7 Wikipedias (T343211) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:02 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation for 7 Wikipedias (T343211)
  • 06:52 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jkieserman out of all services on: 33 hosts
  • 06:51 root@cumin2002: START - Cookbook sre.idm.logout Logging Jkieserman out of all services on: 33 hosts
  • 06:51 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jkieserman out of all services on: 716 hosts
  • 06:51 root@cumin2002: START - Cookbook sre.idm.logout Logging Jkieserman out of all services on: 716 hosts
  • 06:47 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jkieserman out of all services on: 1309 hosts
  • 06:46 root@cumin2002: START - Cookbook sre.idm.logout Logging Jkieserman out of all services on: 1309 hosts
  • 06:46 root@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Jmads out of all services on: 1309 hosts
  • 06:46 root@cumin2002: START - Cookbook sre.idm.logout Logging Jmads out of all services on: 1309 hosts
  • 06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50222 and previous config saved to /var/cache/conftool/dbconfig/20230809-061826-ladsgroup.json
  • 06:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50219 and previous config saved to /var/cache/conftool/dbconfig/20230809-013145-ladsgroup.json
  • 01:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 01:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T342617)', diff saved to https://phabricator.wikimedia.org/P50218 and previous config saved to /var/cache/conftool/dbconfig/20230809-013124-ladsgroup.json
  • 01:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P50217 and previous config saved to /var/cache/conftool/dbconfig/20230809-011618-ladsgroup.json
  • 01:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P50216 and previous config saved to /var/cache/conftool/dbconfig/20230809-010112-ladsgroup.json
  • 00:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T342617)', diff saved to https://phabricator.wikimedia.org/P50215 and previous config saved to /var/cache/conftool/dbconfig/20230809-004605-ladsgroup.json
  • 00:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 00:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50214 and previous config saved to /var/cache/conftool/dbconfig/20230809-003817-ladsgroup.json
  • 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P50213 and previous config saved to /var/cache/conftool/dbconfig/20230809-002310-ladsgroup.json
  • 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P50212 and previous config saved to /var/cache/conftool/dbconfig/20230809-000804-ladsgroup.json

2023-08-08

  • 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50211 and previous config saved to /var/cache/conftool/dbconfig/20230808-235258-ladsgroup.json
  • 22:33 urbanecm: mwmaint1002: stop persistRevisionThreadItems.php frwiki instance because of T343859 (cc T315510)
  • 22:04 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177 (duration: 00m 17s)
  • 22:03 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177
  • 21:57 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:46 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:46 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wcqs1003.eqiad.wmnet with OS bullseye
  • 21:22 brett: Exported varnish-modules 0.15.0-4 for bookworm-wikimedia (T342154)
  • 21:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage
  • 21:15 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage
  • 21:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108
  • 21:06 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 108
  • 21:04 bking@cumin1001: conftool action : set/pooled=no; selector: name=wcqs1003.eqiad.wmnet,service=wcqs
  • 21:02 bking@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wcqs,name=eqiad
  • 21:02 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wcqs1003.eqiad.wmnet with OS bullseye
  • 20:58 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177 (duration: 00m 17s)
  • 20:58 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177
  • 20:57 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wcqs1002.eqiad.wmnet with OS bullseye
  • 20:52 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177 (duration: 00m 18s)
  • 20:52 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177
  • 20:43 urbanecm@deploy1002: Finished scap: Backport for Deploy to CN language wikis (T335886) (duration: 09m 08s)
  • 20:41 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: whitelist new qlever endpoints take 4 (forgot git pull) T339347 (duration: 10m 44s)
  • 20:37 urbanecm@deploy1002: ksarabia and urbanecm: Continuing with sync
  • 20:36 urbanecm@deploy1002: ksarabia and urbanecm: Backport for Deploy to CN language wikis (T335886) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:34 urbanecm@deploy1002: Started scap: Backport for Deploy to CN language wikis (T335886)
  • 20:31 urbanecm: mwmaint1002: `foreachwikiindblist 'group2 & s6' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230615000000` (T315510)
  • 20:30 urbanecm: mwmaint1002: `foreachwikiindblist 'group2 & s5' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230615000000` (T315353)
  • 20:30 ryankemper@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: whitelist new qlever endpoints take 4 (forgot git pull) T339347
  • 20:30 urbanecm: mwmaint1002: `foreachwikiindblist 'group2 & s3' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230615000000` (T315353)
  • 20:29 urbanecm: mwmaint1002: `foreachwikiindblist 'group2 & s2' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230615000000` (T315353)
  • 20:24 urbanecm@deploy1002: Finished scap: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on s2/s3/s5/s6 group2 (T315353) (duration: 10m 55s)
  • 20:17 urbanecm@deploy1002: urbanecm and matmarex: Continuing with sync
  • 20:16 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@aa5f5b7]: whitelist new qlever endpoints take 3 T339347 (duration: 02m 54s)
  • 20:14 urbanecm@deploy1002: urbanecm and matmarex: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on s2/s3/s5/s6 group2 (T315353) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:14 ryankemper: [WDQS] Lag caught up on `wdqs1006`; repooled -> `ryankemper@wdqs1006:~$ sudo pool`
  • 20:13 urbanecm@deploy1002: Started scap: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on s2/s3/s5/s6 group2 (T315353)
  • 20:13 ryankemper@deploy1002: Started deploy [wdqs/wdqs@aa5f5b7]: whitelist new qlever endpoints take 3 T339347
  • 19:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wcqs[1001-1003].eqiad.wmnet with reason: T331300
  • 19:28 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wcqs[1001-1003].eqiad.wmnet with reason: T331300
  • 19:23 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:06 ryankemper: [WDQS] Depooled `wdqs1006` while it catches up on 7 hours of lag
  • 19:05 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@aa5f5b7]: whitelist new qlever endpoints take 2 (duration: 11m 34s)
  • 18:54 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum4001.ulsfo.wmnet with OS bullseye
  • 18:54 ryankemper@deploy1002: Started deploy [wdqs/wdqs@aa5f5b7]: whitelist new qlever endpoints take 2
  • 18:49 bking@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wcqs,name=eqiad
  • 18:48 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: whitelist new qlever endpoints (duration: 03m 08s)
  • 18:45 ryankemper@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: whitelist new qlever endpoints
  • 18:45 ryankemper@deploy1002: deploy aborted: 0.3.124 (duration: 01m 50s)
  • 18:43 ryankemper@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 18:38 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
  • 18:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
  • 18:12 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum4001.ulsfo.wmnet with OS bullseye
  • 18:12 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host durum4001.ulsfo.wmnet with OS bookworm
  • 17:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1001.eqiad.wmnet with OS bullseye
  • 17:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
  • 17:52 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
  • 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T342617)', diff saved to https://phabricator.wikimedia.org/P50209 and previous config saved to /var/cache/conftool/dbconfig/20230808-175101-ladsgroup.json
  • 17:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 17:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T342617)', diff saved to https://phabricator.wikimedia.org/P50208 and previous config saved to /var/cache/conftool/dbconfig/20230808-175040-ladsgroup.json
  • 17:41 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1002.eqiad.wmnet with reason: host reimage
  • 17:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1001.eqiad.wmnet with reason: host reimage
  • 17:37 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1002.eqiad.wmnet with reason: host reimage
  • 17:35 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1001.eqiad.wmnet with reason: host reimage
  • 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P50207 and previous config saved to /var/cache/conftool/dbconfig/20230808-173534-ladsgroup.json
  • 17:31 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum4001.ulsfo.wmnet with OS bookworm
  • 17:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1083.eqiad.wmnet with OS bullseye
  • 17:24 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wcqs1002.eqiad.wmnet with OS bullseye
  • 17:22 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wcqs1001.eqiad.wmnet with OS bullseye
  • 17:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P50206 and previous config saved to /var/cache/conftool/dbconfig/20230808-172027-ladsgroup.json
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T342617)', diff saved to https://phabricator.wikimedia.org/P50205 and previous config saved to /var/cache/conftool/dbconfig/20230808-170521-ladsgroup.json
  • 17:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1083.eqiad.wmnet with reason: host reimage
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50204 and previous config saved to /var/cache/conftool/dbconfig/20230808-165824-ladsgroup.json
  • 16:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T342617)', diff saved to https://phabricator.wikimedia.org/P50203 and previous config saved to /var/cache/conftool/dbconfig/20230808-165803-ladsgroup.json
  • 16:58 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1083.eqiad.wmnet with reason: host reimage
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P50202 and previous config saved to /var/cache/conftool/dbconfig/20230808-164256-ladsgroup.json
  • 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P50201 and previous config saved to /var/cache/conftool/dbconfig/20230808-162750-ladsgroup.json
  • 16:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6002.drmrs.wmnet with OS bookworm
  • 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T342617)', diff saved to https://phabricator.wikimedia.org/P50200 and previous config saved to /var/cache/conftool/dbconfig/20230808-161244-ladsgroup.json
  • 15:53 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1083.eqiad.wmnet with OS bullseye
  • 15:44 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1082.eqiad.wmnet with OS bullseye
  • 15:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 15:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 15:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1082.eqiad.wmnet with reason: host reimage
  • 15:19 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1082.eqiad.wmnet with reason: host reimage
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50197 and previous config saved to /var/cache/conftool/dbconfig/20230808-151637-ladsgroup.json
  • 15:14 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6002.drmrs.wmnet with OS bookworm
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P50196 and previous config saved to /var/cache/conftool/dbconfig/20230808-150131-ladsgroup.json
  • 14:54 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6001.drmrs.wmnet with OS bookworm
  • 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P50195 and previous config saved to /var/cache/conftool/dbconfig/20230808-144625-ladsgroup.json
  • 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50194 and previous config saved to /var/cache/conftool/dbconfig/20230808-143119-ladsgroup.json
  • 14:10 _joe_: updated conftool, requestctl on puppetmasters to 2.3.1 to fix bugs with requestctl log
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50192 and previous config saved to /var/cache/conftool/dbconfig/20230808-140331-ladsgroup.json
  • 14:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 14:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1082.eqiad.wmnet with OS bullseye
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50190 and previous config saved to /var/cache/conftool/dbconfig/20230808-135847-ladsgroup.json
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50189 and previous config saved to /var/cache/conftool/dbconfig/20230808-135636-ladsgroup.json
  • 13:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:47 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to old columns of externallinks in ruwikinews (T342683) (duration: 10m 00s)
  • 13:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 13:43 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 13:41 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 13:39 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to old columns of externallinks in ruwikinews (T342683) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:37 ladsgroup@deploy1002: Started scap: Backport for Stop writing to old columns of externallinks in ruwikinews (T342683)
  • 13:36 taavi@deploy1002: Finished scap: Backport for newiki: Fix templateeditor config (T343257) (duration: 09m 49s)
  • 13:36 volans: set platform to null on all devices and VMs in Netbox - T336623
  • 13:29 taavi@deploy1002: taavi and stang: Continuing with sync
  • 13:27 taavi@deploy1002: taavi and stang: Backport for newiki: Fix templateeditor config (T343257) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:26 taavi@deploy1002: Started scap: Backport for newiki: Fix templateeditor config (T343257)
  • 13:21 sukhe: reprepro -C main include bookworm-wikimedia gdnsd_3.99.0~alpha2-2_amd64.changes: T342154
  • 13:19 taavi@deploy1002: Finished scap: Backport for Update piwiki legacy vector logo (T305950), Update idwiktionary old vector logo (T341175) (duration: 10m 48s)
  • 13:18 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 13:18 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 13:18 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6001.drmrs.wmnet with OS bookworm
  • 13:17 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host durum6001.drmrs.wmnet with OS bookworm
  • 13:12 taavi@deploy1002: anzx and taavi: Continuing with sync
  • 13:09 taavi@deploy1002: anzx and taavi: Backport for Update piwiki legacy vector logo (T305950), Update idwiktionary old vector logo (T341175) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:08 taavi@deploy1002: Started scap: Backport for Update piwiki legacy vector logo (T305950), Update idwiktionary old vector logo (T341175)
  • 13:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6001.drmrs.wmnet with OS bookworm
  • 13:02 sukhe: reprepro -C main include bookworm-wikimedia anycast-healthchecker_0.9.1-1+wmf12u1_amd64.changes: T342154
  • 12:57 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124 (duration: 00m 46s)
  • 12:57 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124
  • 12:40 samtar@deploy1002: Finished scap: Backport for IS: Ensure edit recovery is disabled (T342858) (duration: 08m 18s)
  • 12:34 samtar@deploy1002: samtar: Continuing with sync
  • 12:34 samtar@deploy1002: samtar: Backport for IS: Ensure edit recovery is disabled (T342858) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 12:32 samtar@deploy1002: Started scap: Backport for IS: Ensure edit recovery is disabled (T342858)
  • 12:28 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 12:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:25 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:24 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:36 claime: deploying mw-on-k8s - https://gerrit.wikimedia.org/r/945798
  • 10:21 taavi: update T343294 mitigations
  • 10:00 volans: restart ferm on mirror1001 to pick new IP address for debian syncproxy2
  • 09:52 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 09:52 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 09:44 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 09:43 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T342617)', diff saved to https://phabricator.wikimedia.org/P50188 and previous config saved to /var/cache/conftool/dbconfig/20230808-093835-ladsgroup.json
  • 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T342617)', diff saved to https://phabricator.wikimedia.org/P50187 and previous config saved to /var/cache/conftool/dbconfig/20230808-093814-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P50186 and previous config saved to /var/cache/conftool/dbconfig/20230808-092308-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T342617)', diff saved to https://phabricator.wikimedia.org/P50185 and previous config saved to /var/cache/conftool/dbconfig/20230808-091119-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 09:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T342617)', diff saved to https://phabricator.wikimedia.org/P50184 and previous config saved to /var/cache/conftool/dbconfig/20230808-091058-ladsgroup.json
  • 09:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P50183 and previous config saved to /var/cache/conftool/dbconfig/20230808-090801-ladsgroup.json
  • 08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P50182 and previous config saved to /var/cache/conftool/dbconfig/20230808-085551-ladsgroup.json
  • 08:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T342617)', diff saved to https://phabricator.wikimedia.org/P50181 and previous config saved to /var/cache/conftool/dbconfig/20230808-085255-ladsgroup.json
  • 08:45 jynus: restart debmonitor2003 services
  • 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P50180 and previous config saved to /var/cache/conftool/dbconfig/20230808-084045-ladsgroup.json
  • 08:33 elukey: powercycle ml-serve2004 - mgmt console without tty available, DIMM errors in getsel
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T342617)', diff saved to https://phabricator.wikimedia.org/P50179 and previous config saved to /var/cache/conftool/dbconfig/20230808-082539-ladsgroup.json
  • 07:07 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:07 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:07 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 07:07 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 07:06 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:06 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 02:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T342617)', diff saved to https://phabricator.wikimedia.org/P50178 and previous config saved to /var/cache/conftool/dbconfig/20230808-022547-ladsgroup.json
  • 02:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 02:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 02:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T342617)', diff saved to https://phabricator.wikimedia.org/P50177 and previous config saved to /var/cache/conftool/dbconfig/20230808-022526-ladsgroup.json
  • 02:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P50176 and previous config saved to /var/cache/conftool/dbconfig/20230808-021020-ladsgroup.json
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P50175 and previous config saved to /var/cache/conftool/dbconfig/20230808-015513-ladsgroup.json
  • 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T342617)', diff saved to https://phabricator.wikimedia.org/P50174 and previous config saved to /var/cache/conftool/dbconfig/20230808-014007-ladsgroup.json
  • 00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T342617)', diff saved to https://phabricator.wikimedia.org/P50173 and previous config saved to /var/cache/conftool/dbconfig/20230808-005439-ladsgroup.json
  • 00:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 00:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T342617)', diff saved to https://phabricator.wikimedia.org/P50172 and previous config saved to /var/cache/conftool/dbconfig/20230808-005418-ladsgroup.json
  • 00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P50171 and previous config saved to /var/cache/conftool/dbconfig/20230808-003911-ladsgroup.json
  • 00:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P50170 and previous config saved to /var/cache/conftool/dbconfig/20230808-002405-ladsgroup.json
  • 00:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T342617)', diff saved to https://phabricator.wikimedia.org/P50169 and previous config saved to /var/cache/conftool/dbconfig/20230808-000859-ladsgroup.json

2023-08-07

  • 23:28 krinkle@deploy1002: Finished scap: Backport for api: Fix broken /api/index.html rendering (T113114) (duration: 09m 00s)
  • 23:23 krinkle@deploy1002: krinkle: Continuing with sync
  • 23:21 krinkle@deploy1002: krinkle: Backport for api: Fix broken /api/index.html rendering (T113114) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 23:19 krinkle@deploy1002: Started scap: Backport for api: Fix broken /api/index.html rendering (T113114)
  • 22:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1015.eqiad.wmnet
  • 22:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1015.eqiad.wmnet
  • 22:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1014.eqiad.wmnet
  • 22:43 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1014.eqiad.wmnet
  • 22:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1013.eqiad.wmnet
  • 22:38 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1013.eqiad.wmnet
  • 22:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1012.eqiad.wmnet
  • 22:30 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1012.eqiad.wmnet
  • 22:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1011.eqiad.wmnet
  • 22:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1011.eqiad.wmnet
  • 22:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1010.eqiad.wmnet
  • 22:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1010.eqiad.wmnet
  • 22:04 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: name=wcqs2003.codfw.wmnet
  • 21:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1081.eqiad.wmnet with OS bullseye
  • 21:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1081.eqiad.wmnet with reason: host reimage
  • 21:17 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1081.eqiad.wmnet with reason: host reimage
  • 21:05 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1081.eqiad.wmnet with OS bullseye
  • 21:03 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wcqs2003.codfw.wmnet with OS bullseye
  • 21:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1080.eqiad.wmnet with OS bullseye
  • 20:53 urbanecm@deploy1002: Finished scap: Backport for unset orwikisource logo and resize pawikisource logo (T341255) (duration: 08m 09s)
  • 20:47 urbanecm@deploy1002: jdlrobson and urbanecm: Continuing with sync
  • 20:46 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for unset orwikisource logo and resize pawikisource logo (T341255) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:45 urbanecm@deploy1002: Started scap: Backport for unset orwikisource logo and resize pawikisource logo (T341255)
  • 20:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1080.eqiad.wmnet with reason: host reimage
  • 20:38 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1080.eqiad.wmnet with reason: host reimage
  • 20:24 urbanecm: mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki=enwiki --current --all --start '["18618299"]' # T315510
  • 20:24 urbanecm@deploy1002: Finished scap: Backport for ThreadItemStore: Ignore duplicates caused by duplicate executions (T323080 T341811), Update wikisource wordmarks and taglines (T341255), update idwiktionary legacy vector logo (T341175) (duration: 10m 22s)
  • 20:21 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1080.eqiad.wmnet with OS bullseye
  • 20:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs2003.codfw.wmnet with reason: host reimage
  • 20:18 urbanecm@deploy1002: urbanecm and jdlrobson and anzx and matmarex: Continuing with sync
  • 20:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2003.codfw.wmnet with reason: host reimage
  • 20:15 urbanecm@deploy1002: urbanecm and jdlrobson and anzx and matmarex: Backport for ThreadItemStore: Ignore duplicates caused by duplicate executions (T323080 T341811), Update wikisource wordmarks and taglines (T341255), update idwiktionary legacy vector logo (T341175) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet,
  • 20:14 urbanecm@deploy1002: Started scap: Backport for ThreadItemStore: Ignore duplicates caused by duplicate executions (T323080 T341811), Update wikisource wordmarks and taglines (T341255), update idwiktionary legacy vector logo (T341175)
  • 20:13 urbanecm@deploy1002: Finished scap: Backport for Fix finnish projects, remove unused SVG/PNGs, resize wikiversity (T343278), Wikivoyage logos should always be on a single line (T343279) (duration: 11m 18s)
  • 20:08 urbanecm@deploy1002: jdlrobson and urbanecm: Continuing with sync
  • 20:04 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Fix finnish projects, remove unused SVG/PNGs, resize wikiversity (T343278), Wikivoyage logos should always be on a single line (T343279) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimen
  • 20:02 urbanecm@deploy1002: Started scap: Backport for Fix finnish projects, remove unused SVG/PNGs, resize wikiversity (T343278), Wikivoyage logos should always be on a single line (T343279)
  • 20:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wcqs2003.codfw.wmnet with OS bullseye
  • 19:18 cstone: payments-wiki upgraded from 32fe72a9 to 5b250aed
  • 19:15 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:15 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frbast2001.frack.codfw.wmnet from DNS for decommissioning - jgreen@cumin1001"
  • 19:14 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frbast2001.frack.codfw.wmnet from DNS for decommissioning - jgreen@cumin1001"
  • 19:12 jgreen@cumin1001: START - Cookbook sre.dns.netbox
  • 19:12 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:12 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frbast1001.frack.eqiad.wmnet from DNS for decommissioning - jgreen@cumin1001"
  • 19:11 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frbast1001.frack.eqiad.wmnet from DNS for decommissioning - jgreen@cumin1001"
  • 19:09 jgreen@cumin1001: START - Cookbook sre.dns.netbox
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T342617)', diff saved to https://phabricator.wikimedia.org/P50168 and previous config saved to /var/cache/conftool/dbconfig/20230807-185732-ladsgroup.json
  • 18:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 18:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T342617)', diff saved to https://phabricator.wikimedia.org/P50167 and previous config saved to /var/cache/conftool/dbconfig/20230807-185710-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P50166 and previous config saved to /var/cache/conftool/dbconfig/20230807-184204-ladsgroup.json
  • 18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P50165 and previous config saved to /var/cache/conftool/dbconfig/20230807-182657-ladsgroup.json
  • 18:21 krinkle@deploy1002: Finished scap: Backport for mc: Remove mcrouter-with-onhost-tier from ParserCache (T264604) (duration: 09m 07s)
  • 18:16 krinkle@deploy1002: krinkle: Continuing with sync
  • 18:14 krinkle@deploy1002: krinkle: Backport for mc: Remove mcrouter-with-onhost-tier from ParserCache (T264604) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 18:12 krinkle@deploy1002: Started scap: Backport for mc: Remove mcrouter-with-onhost-tier from ParserCache (T264604)
  • 18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T342617)', diff saved to https://phabricator.wikimedia.org/P50164 and previous config saved to /var/cache/conftool/dbconfig/20230807-181151-ladsgroup.json
  • 17:59 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:59 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frmon2001.frack.codfw.wmnet from DNS for decommissioning - jgreen@cumin1001"
  • 17:58 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frmon2001.frack.codfw.wmnet from DNS for decommissioning - jgreen@cumin1001"
  • 17:56 jgreen@cumin1001: START - Cookbook sre.dns.netbox
  • 17:55 jgreen@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 17:54 jgreen@cumin1001: START - Cookbook sre.dns.netbox
  • 17:47 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:46 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frmon1001.frack.eqiad.wmnet from DNS for decommissioning - jgreen@cumin1001"
  • 17:46 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frmon1001.frack.eqiad.wmnet from DNS for decommissioning - jgreen@cumin1001"
  • 17:42 jgreen@cumin1001: START - Cookbook sre.dns.netbox
  • 17:36 jgreen@cumin1001: START - Cookbook sre.dns.netbox
  • 17:34 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:34 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frdev1001 from DNS for decommissioning - jgreen@cumin1001"
  • 17:33 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frdev1001 from DNS for decommissioning - jgreen@cumin1001"
  • 17:31 jgreen@cumin1001: START - Cookbook sre.dns.netbox
  • 17:22 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:22 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: civi1001.frack.eqiad.wmnet - jgreen@cumin1001"
  • 17:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1079.eqiad.wmnet with OS bullseye
  • 17:22 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: civi1001.frack.eqiad.wmnet - jgreen@cumin1001"
  • 17:19 jgreen@cumin1001: START - Cookbook sre.dns.netbox
  • 17:02 inflatador: bking@puppetmaster1001 removing unused(?) puppet cert search.svc.eqiad.wmnet T343319
  • 16:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1079.eqiad.wmnet with reason: host reimage
  • 16:56 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1079.eqiad.wmnet with reason: host reimage
  • 16:47 inflatador: bking@puppetmaster1001 removing unused(?) puppet cert search.svc.codfw.wmnet T343319
  • 16:40 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1079.eqiad.wmnet with OS bullseye
  • 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T342617)', diff saved to https://phabricator.wikimedia.org/P50163 and previous config saved to /var/cache/conftool/dbconfig/20230807-163421-ladsgroup.json
  • 16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 16:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1078.eqiad.wmnet with OS bullseye
  • 16:18 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Allow logged-in users to edit object labels, aliases, and descriptions (T343400) (duration: 07m 11s)
  • 16:13 jforrester@deploy1002: jforrester: Continuing with sync
  • 16:13 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Allow logged-in users to edit object labels, aliases, and descriptions (T343400) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 16:11 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Allow logged-in users to edit object labels, aliases, and descriptions (T343400)
  • 15:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1078.eqiad.wmnet with reason: host reimage
  • 15:55 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1078.eqiad.wmnet with reason: host reimage
  • 15:53 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1078.eqiad.wmnet with OS bullseye
  • 15:50 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1078.eqiad.wmnet with OS bullseye
  • 15:42 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:41 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:35 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 15:35 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1078.eqiad.wmnet with OS bullseye
  • 15:35 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 15:35 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1078.eqiad.wmnet with OS bullseye
  • 15:34 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:34 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:36 zabe@deploy1002: Finished scap: T343294 (duration: 07m 13s)
  • 14:29 zabe@deploy1002: Started scap: T343294
  • 14:14 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1078.eqiad.wmnet with OS bullseye
  • 14:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-worker1078.eqiad.wmnet
  • 14:10 btullis@cumin1001: START - Cookbook sre.hosts.dhcp for host an-worker1078.eqiad.wmnet
  • 14:08 btullis@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1078.eqiad.wmnet']
  • 14:08 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1078.eqiad.wmnet']
  • 14:07 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1078.eqiad.wmnet with OS bullseye
  • 14:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1002.eqiad.wmnet
  • 13:59 elukey@deploy1002: Finished scap: Backport for ext-ORES: revert all wikis to use ORES instead of Lift Wing (T343308) (duration: 06m 49s)
  • 13:58 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1078.eqiad.wmnet with OS bullseye
  • 13:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1002.eqiad.wmnet
  • 13:53 elukey@deploy1002: elukey: Continuing with sync
  • 13:53 elukey@deploy1002: elukey: Backport for ext-ORES: revert all wikis to use ORES instead of Lift Wing (T343308) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:52 elukey@deploy1002: Started scap: Backport for ext-ORES: revert all wikis to use ORES instead of Lift Wing (T343308)
  • 13:51 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php idwiktionary --fix --add-prefix=BROKEN # T341175
  • 13:51 urbanecm@deploy1002: Finished scap: Backport for idwiktionary change wgSiteName, wgMetaNamespace and add project namespace alias (T341175) (duration: 09m 12s)
  • 13:45 urbanecm@deploy1002: urbanecm and anzx: Continuing with sync
  • 13:43 urbanecm@deploy1002: urbanecm and anzx: Backport for idwiktionary change wgSiteName, wgMetaNamespace and add project namespace alias (T341175) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:41 urbanecm@deploy1002: Started scap: Backport for idwiktionary change wgSiteName, wgMetaNamespace and add project namespace alias (T341175)
  • 13:26 urbanecm@deploy1002: Finished scap: Backport for Revert "enwiki: temp enable emergencyCaptcha" (duration: 06m 59s)
  • 13:19 urbanecm@deploy1002: Started scap: Backport for Revert "enwiki: temp enable emergencyCaptcha"
  • 13:19 urbanecm@deploy1002: Finished scap: Backport for Update knwiktionary logos (T343662), Write new for event table migration on all wikis (T330158), zhwiki: Grant "suppressredirect" to autoreviewer (T343711) (duration: 13m 54s)
  • 13:13 urbanecm@deploy1002: anzx and dreamyjazz and stang and urbanecm: Continuing with sync
  • 13:06 urbanecm@deploy1002: anzx and dreamyjazz and stang and urbanecm: Backport for Update knwiktionary logos (T343662), Write new for event table migration on all wikis (T330158), zhwiki: Grant "suppressredirect" to autoreviewer (T343711) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-d
  • 13:05 urbanecm@deploy1002: Started scap: Backport for Update knwiktionary logos (T343662), Write new for event table migration on all wikis (T330158), zhwiki: Grant "suppressredirect" to autoreviewer (T343711)
  • 12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 12:17 dcausse: repooling wdqs1004
  • 11:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 11:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:53 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to the old externallinks columns in testwiki (T342683) (duration: 08m 06s)
  • 10:48 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 10:47 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to the old externallinks columns in testwiki (T342683) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 10:45 ladsgroup@deploy1002: Started scap: Backport for Stop writing to the old externallinks columns in testwiki (T342683)
  • 10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T342617)', diff saved to https://phabricator.wikimedia.org/P50158 and previous config saved to /var/cache/conftool/dbconfig/20230807-100805-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 10:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 10:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 09:23 dcausse: restarting blazegraph on wdqs1004
  • 08:31 elukey@deploy1002: Finished scap: Backport for ext-ORES: force cswiki to use the ORES settings/backend (T343308) (duration: 14m 50s)
  • 08:25 elukey@deploy1002: elukey: Continuing with sync
  • 08:24 elukey@deploy1002: elukey: Backport for ext-ORES: force cswiki to use the ORES settings/backend (T343308) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 100%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50157 and previous config saved to /var/cache/conftool/dbconfig/20230807-081639-root.json
  • 08:16 elukey@deploy1002: Started scap: Backport for ext-ORES: force cswiki to use the ORES settings/backend (T343308)
  • 08:08 godog: start docker-image-prune-old on alert hosts - T329939
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 75%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50156 and previous config saved to /var/cache/conftool/dbconfig/20230807-080133-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 50%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50155 and previous config saved to /var/cache/conftool/dbconfig/20230807-074628-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 25%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50154 and previous config saved to /var/cache/conftool/dbconfig/20230807-073123-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 10%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50153 and previous config saved to /var/cache/conftool/dbconfig/20230807-071618-root.json
  • 07:11 marostegui: Depool clouddb1015 T334650
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 5%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50152 and previous config saved to /var/cache/conftool/dbconfig/20230807-070113-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 3%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50151 and previous config saved to /var/cache/conftool/dbconfig/20230807-064608-root.json
  • 06:33 kart_: Updated cxserver to 2023-08-03-132800-production (T338602, T333969, T343211)
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 1%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50150 and previous config saved to /var/cache/conftool/dbconfig/20230807-063104-root.json
  • 06:28 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:26 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:25 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:22 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:22 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1224 upgrade to mariadb 10.6', diff saved to https://phabricator.wikimedia.org/P50149 and previous config saved to /var/cache/conftool/dbconfig/20230807-061653-root.json
  • 06:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Update wheels for Aerleon 1.6.0 upgrade - ayounsi@cumin1001
  • 06:09 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Update wheels for Aerleon 1.6.0 upgrade - ayounsi@cumin1001

2023-08-05

  • 05:57 _joe_: mounting the volume under /srv/dataimport on both puppetmaster frontends
  • 05:53 _joe_: creating logical volume "dataimport" on the puppetmaster frontends
  • 02:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 02:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 01:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 01:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T342617)', diff saved to https://phabricator.wikimedia.org/P50148 and previous config saved to /var/cache/conftool/dbconfig/20230805-013831-ladsgroup.json
  • 01:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P50147 and previous config saved to /var/cache/conftool/dbconfig/20230805-012325-ladsgroup.json
  • 01:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P50146 and previous config saved to /var/cache/conftool/dbconfig/20230805-010819-ladsgroup.json
  • 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T342617)', diff saved to https://phabricator.wikimedia.org/P50145 and previous config saved to /var/cache/conftool/dbconfig/20230805-005312-ladsgroup.json
  • 00:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T342617)', diff saved to https://phabricator.wikimedia.org/P50144 and previous config saved to /var/cache/conftool/dbconfig/20230805-003155-ladsgroup.json
  • 00:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P50143 and previous config saved to /var/cache/conftool/dbconfig/20230805-001649-ladsgroup.json
  • 00:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P50142 and previous config saved to /var/cache/conftool/dbconfig/20230805-000143-ladsgroup.json

2023-08-04

  • 23:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T342617)', diff saved to https://phabricator.wikimedia.org/P50141 and previous config saved to /var/cache/conftool/dbconfig/20230804-234637-ladsgroup.json
  • 23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T342617)', diff saved to https://phabricator.wikimedia.org/P50140 and previous config saved to /var/cache/conftool/dbconfig/20230804-234121-ladsgroup.json
  • 23:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 23:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T342617)', diff saved to https://phabricator.wikimedia.org/P50139 and previous config saved to /var/cache/conftool/dbconfig/20230804-234101-ladsgroup.json
  • 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P50138 and previous config saved to /var/cache/conftool/dbconfig/20230804-232555-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P50137 and previous config saved to /var/cache/conftool/dbconfig/20230804-231048-ladsgroup.json
  • 23:00 tzatziki: removing 1 file for legal compliance
  • 22:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T342617)', diff saved to https://phabricator.wikimedia.org/P50136 and previous config saved to /var/cache/conftool/dbconfig/20230804-225542-ladsgroup.json
  • 22:33 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124 (duration: 00m 54s)
  • 22:32 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124
  • 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T342617)', diff saved to https://phabricator.wikimedia.org/P50135 and previous config saved to /var/cache/conftool/dbconfig/20230804-222905-ladsgroup.json
  • 22:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 22:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50134 and previous config saved to /var/cache/conftool/dbconfig/20230804-222845-ladsgroup.json
  • 22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T342617)', diff saved to https://phabricator.wikimedia.org/P50133 and previous config saved to /var/cache/conftool/dbconfig/20230804-221915-ladsgroup.json
  • 22:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 22:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T342617)', diff saved to https://phabricator.wikimedia.org/P50132 and previous config saved to /var/cache/conftool/dbconfig/20230804-221855-ladsgroup.json
  • 22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P50131 and previous config saved to /var/cache/conftool/dbconfig/20230804-221338-ladsgroup.json
  • 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P50130 and previous config saved to /var/cache/conftool/dbconfig/20230804-220348-ladsgroup.json
  • 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P50129 and previous config saved to /var/cache/conftool/dbconfig/20230804-215832-ladsgroup.json
  • 21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P50128 and previous config saved to /var/cache/conftool/dbconfig/20230804-214842-ladsgroup.json
  • 21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50127 and previous config saved to /var/cache/conftool/dbconfig/20230804-214326-ladsgroup.json
  • 21:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T342617)', diff saved to https://phabricator.wikimedia.org/P50126 and previous config saved to /var/cache/conftool/dbconfig/20230804-213336-ladsgroup.json
  • 21:20 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124 (duration: 00m 44s)
  • 21:19 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124
  • 21:16 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124 (duration: 00m 09s)
  • 21:16 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124
  • 21:16 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124 (duration: 00m 15s)
  • 21:15 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124
  • 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T342617)', diff saved to https://phabricator.wikimedia.org/P50125 and previous config saved to /var/cache/conftool/dbconfig/20230804-205647-ladsgroup.json
  • 20:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T342617)', diff saved to https://phabricator.wikimedia.org/P50124 and previous config saved to /var/cache/conftool/dbconfig/20230804-205626-ladsgroup.json
  • 20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P50123 and previous config saved to /var/cache/conftool/dbconfig/20230804-204120-ladsgroup.json
  • 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50122 and previous config saved to /var/cache/conftool/dbconfig/20230804-203351-ladsgroup.json
  • 20:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 20:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T342617)', diff saved to https://phabricator.wikimedia.org/P50121 and previous config saved to /var/cache/conftool/dbconfig/20230804-203330-ladsgroup.json
  • 20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P50120 and previous config saved to /var/cache/conftool/dbconfig/20230804-202613-ladsgroup.json
  • 20:21 brett: imported libvmod-querysort package in bookworm-wikimedia (T342154)
  • 20:18 jforrester@deploy1002: Finished scap: Backport for ApiFunctionCall: Check calls for Z16K2 and deny those too (duration: 34m 04s)
  • 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P50119 and previous config saved to /var/cache/conftool/dbconfig/20230804-201824-ladsgroup.json
  • 20:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T342617)', diff saved to https://phabricator.wikimedia.org/P50118 and previous config saved to /var/cache/conftool/dbconfig/20230804-201107-ladsgroup.json
  • 20:08 jforrester@deploy1002: jforrester: Continuing with sync
  • 20:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wcqs2002.codfw.wmnet with reason: T323921
  • 20:04 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wcqs2002.codfw.wmnet with reason: T323921
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P50116 and previous config saved to /var/cache/conftool/dbconfig/20230804-200317-ladsgroup.json
  • 20:02 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:58 jforrester@deploy1002: jforrester: Backport for ApiFunctionCall: Check calls for Z16K2 and deny those too synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T342617)', diff saved to https://phabricator.wikimedia.org/P50115 and previous config saved to /var/cache/conftool/dbconfig/20230804-194811-ladsgroup.json
  • 19:44 jforrester@deploy1002: Started scap: Backport for ApiFunctionCall: Check calls for Z16K2 and deny those too
  • 19:17 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wcqs2002.codfw.wmnet with reason: T323921
  • 19:12 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wcqs2002.codfw.wmnet with reason: T323921
  • 19:11 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wcqs2002.codfw.wmnet with OS bullseye
  • 19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T342617)', diff saved to https://phabricator.wikimedia.org/P50114 and previous config saved to /var/cache/conftool/dbconfig/20230804-190152-ladsgroup.json
  • 19:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 19:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50113 and previous config saved to /var/cache/conftool/dbconfig/20230804-190131-ladsgroup.json
  • 18:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P50112 and previous config saved to /var/cache/conftool/dbconfig/20230804-184625-ladsgroup.json
  • 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T342617)', diff saved to https://phabricator.wikimedia.org/P50111 and previous config saved to /var/cache/conftool/dbconfig/20230804-183927-ladsgroup.json
  • 18:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 18:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50110 and previous config saved to /var/cache/conftool/dbconfig/20230804-183906-ladsgroup.json
  • 18:34 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs2002.codfw.wmnet with reason: host reimage
  • 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P50109 and previous config saved to /var/cache/conftool/dbconfig/20230804-183118-ladsgroup.json
  • 18:31 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2002.codfw.wmnet with reason: host reimage
  • 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P50108 and previous config saved to /var/cache/conftool/dbconfig/20230804-182400-ladsgroup.json
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50107 and previous config saved to /var/cache/conftool/dbconfig/20230804-181612-ladsgroup.json
  • 18:15 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wcqs2002.codfw.wmnet with OS bullseye
  • 18:14 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wcqs2001.codfw.wmnet with reason: T323921
  • 18:13 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wcqs2001.codfw.wmnet with reason: T323921
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P50106 and previous config saved to /var/cache/conftool/dbconfig/20230804-180854-ladsgroup.json
  • 18:08 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50105 and previous config saved to /var/cache/conftool/dbconfig/20230804-175348-ladsgroup.json
  • 17:27 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:24 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wcqs2001.codfw.wmnet with OS bullseye
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50104 and previous config saved to /var/cache/conftool/dbconfig/20230804-165753-ladsgroup.json
  • 16:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T342617)', diff saved to https://phabricator.wikimedia.org/P50103 and previous config saved to /var/cache/conftool/dbconfig/20230804-165731-ladsgroup.json
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50102 and previous config saved to /var/cache/conftool/dbconfig/20230804-164356-ladsgroup.json
  • 16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T342617)', diff saved to https://phabricator.wikimedia.org/P50101 and previous config saved to /var/cache/conftool/dbconfig/20230804-164335-ladsgroup.json
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P50100 and previous config saved to /var/cache/conftool/dbconfig/20230804-164225-ladsgroup.json
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P50099 and previous config saved to /var/cache/conftool/dbconfig/20230804-162829-ladsgroup.json
  • 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P50098 and previous config saved to /var/cache/conftool/dbconfig/20230804-162719-ladsgroup.json
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P50097 and previous config saved to /var/cache/conftool/dbconfig/20230804-161322-ladsgroup.json
  • 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T342617)', diff saved to https://phabricator.wikimedia.org/P50096 and previous config saved to /var/cache/conftool/dbconfig/20230804-161212-ladsgroup.json
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T342617)', diff saved to https://phabricator.wikimedia.org/P50095 and previous config saved to /var/cache/conftool/dbconfig/20230804-155816-ladsgroup.json
  • 15:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs2001.codfw.wmnet with reason: host reimage
  • 15:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2001.codfw.wmnet with reason: host reimage
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T342617)', diff saved to https://phabricator.wikimedia.org/P50094 and previous config saved to /var/cache/conftool/dbconfig/20230804-151435-ladsgroup.json
  • 15:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T342617)', diff saved to https://phabricator.wikimedia.org/P50093 and previous config saved to /var/cache/conftool/dbconfig/20230804-151409-ladsgroup.json
  • 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T342617)', diff saved to https://phabricator.wikimedia.org/P50092 and previous config saved to /var/cache/conftool/dbconfig/20230804-150310-ladsgroup.json
  • 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50091 and previous config saved to /var/cache/conftool/dbconfig/20230804-150232-ladsgroup.json
  • 15:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wcqs2001.codfw.wmnet with OS bullseye
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P50090 and previous config saved to /var/cache/conftool/dbconfig/20230804-145903-ladsgroup.json
  • 14:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2195.codfw.wmnet with OS bullseye
  • 14:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P50089 and previous config saved to /var/cache/conftool/dbconfig/20230804-144726-ladsgroup.json
  • 14:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P50088 and previous config saved to /var/cache/conftool/dbconfig/20230804-144357-ladsgroup.json
  • 14:40 sbassett: Deployed updated mitigation for T336027
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P50087 and previous config saved to /var/cache/conftool/dbconfig/20230804-143219-ladsgroup.json
  • 14:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2190.codfw.wmnet with OS bullseye
  • 14:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2195.codfw.wmnet with reason: host reimage
  • 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T342617)', diff saved to https://phabricator.wikimedia.org/P50086 and previous config saved to /var/cache/conftool/dbconfig/20230804-142851-ladsgroup.json
  • 14:27 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:27 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:26 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4005.wikimedia.org
  • 14:25 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync Hiera after adding bast4005 - jmm@cumin2002"
  • 14:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2195.codfw.wmnet with reason: host reimage
  • 14:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2193.codfw.wmnet with OS bullseye
  • 14:25 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:23 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync Hiera after adding bast4005 - jmm@cumin2002"
  • 14:22 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4005.wikimedia.org
  • 14:20 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:20 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:18 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:17 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50085 and previous config saved to /var/cache/conftool/dbconfig/20230804-141713-ladsgroup.json
  • 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast4005.wikimedia.org
  • 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4005.wikimedia.org with OS bookworm
  • 14:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2190.codfw.wmnet with reason: host reimage
  • 14:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2193.codfw.wmnet with reason: host reimage
  • 14:08 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2190.codfw.wmnet with reason: host reimage
  • 14:07 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:07 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:05 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2193.codfw.wmnet with reason: host reimage
  • 14:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2195.codfw.wmnet with OS bullseye
  • 14:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4005.wikimedia.org with reason: host reimage
  • 14:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2195']
  • 13:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4005.wikimedia.org with reason: host reimage
  • 13:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2195']
  • 13:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2195.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:48 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2190.codfw.wmnet with OS bullseye
  • 13:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2190.codfw.wmnet with OS bullseye
  • 13:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2190.codfw.wmnet with OS bullseye
  • 13:45 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2193.codfw.wmnet with OS bullseye
  • 13:39 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4005.wikimedia.org with OS bookworm
  • 13:30 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2195.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast4005.wikimedia.org - jmm@cumin2002"
  • 13:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast4005.wikimedia.org - jmm@cumin2002"
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast4005.wikimedia.org on all recursors
  • 13:12 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast4005.wikimedia.org on all recursors
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4005.wikimedia.org - jmm@cumin2002"
  • 13:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4005.wikimedia.org - jmm@cumin2002"
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T342617)', diff saved to https://phabricator.wikimedia.org/P50084 and previous config saved to /var/cache/conftool/dbconfig/20230804-130622-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 13:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T342617)', diff saved to https://phabricator.wikimedia.org/P50083 and previous config saved to /var/cache/conftool/dbconfig/20230804-130601-ladsgroup.json
  • 13:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:01 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50082 and previous config saved to /var/cache/conftool/dbconfig/20230804-130142-ladsgroup.json
  • 13:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 13:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast4005.wikimedia.org
  • 13:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast3007.wikimedia.org
  • 12:59 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:59 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:57 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P50081 and previous config saved to /var/cache/conftool/dbconfig/20230804-125055-ladsgroup.json
  • 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast3007.wikimedia.org
  • 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3007.wikimedia.org
  • 12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P50080 and previous config saved to /var/cache/conftool/dbconfig/20230804-123548-ladsgroup.json
  • 12:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3007.wikimedia.org
  • 12:32 godog: bounce prometheus@k8s on prometheus100[56] to test failure to reload certs
  • 12:25 jforrester@deploy1002: Synchronized php-1.41.0-wmf.20/extensions/WikiLambda: T343380 and T343400 (duration: 10m 12s)
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T342617)', diff saved to https://phabricator.wikimedia.org/P50079 and previous config saved to /var/cache/conftool/dbconfig/20230804-122042-ladsgroup.json
  • 12:16 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 12:14 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 12:14 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 12:13 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast3007.wikimedia.org
  • 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast3007.wikimedia.org with OS bookworm
  • 12:05 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 12:04 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 11:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 11:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T342617)', diff saved to https://phabricator.wikimedia.org/P50077 and previous config saved to /var/cache/conftool/dbconfig/20230804-115224-ladsgroup.json
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast3007.wikimedia.org with reason: host reimage
  • 11:48 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast3007.wikimedia.org with reason: host reimage
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T342617)', diff saved to https://phabricator.wikimedia.org/P50076 and previous config saved to /var/cache/conftool/dbconfig/20230804-113848-ladsgroup.json
  • 11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 11:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P50075 and previous config saved to /var/cache/conftool/dbconfig/20230804-113718-ladsgroup.json
  • 11:30 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on contint2001.wikimedia.org with reason: Decommissioning
  • 11:30 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on contint2001.wikimedia.org with reason: Decommissioning
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P50074 and previous config saved to /var/cache/conftool/dbconfig/20230804-112212-ladsgroup.json
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T342617)', diff saved to https://phabricator.wikimedia.org/P50073 and previous config saved to /var/cache/conftool/dbconfig/20230804-110705-ladsgroup.json
  • 11:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast3007.wikimedia.org with OS bookworm
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast3007.wikimedia.org - jmm@cumin2002"
  • 10:38 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast3007.wikimedia.org - jmm@cumin2002"
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast3007.wikimedia.org on all recursors
  • 10:38 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast3007.wikimedia.org on all recursors
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3007.wikimedia.org - jmm@cumin2002"
  • 10:37 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3007.wikimedia.org - jmm@cumin2002"
  • 10:33 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:33 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast3007.wikimedia.org
  • 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T342617)', diff saved to https://phabricator.wikimedia.org/P50072 and previous config saved to /var/cache/conftool/dbconfig/20230804-102347-ladsgroup.json
  • 10:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 10:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 10:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 10:15 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin1001.eqiad.wmnet
  • 10:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1001.eqiad.wmnet
  • 08:00 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1026.eqiad.wmnet with OS bullseye
  • 07:51 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 398203
  • 07:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 398203
  • 07:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 139901
  • 07:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 139901
  • 07:37 moritzm: installing Django security updates
  • 07:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1026.eqiad.wmnet with reason: host reimage
  • 07:31 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1026.eqiad.wmnet with reason: host reimage
  • 07:19 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1026.eqiad.wmnet with OS bullseye
  • 03:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2194.codfw.wmnet with OS bullseye
  • 03:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:12 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2192.codfw.wmnet with OS bullseye
  • 03:03 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:00 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2194.codfw.wmnet with reason: host reimage
  • 02:53 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2194.codfw.wmnet with reason: host reimage
  • 02:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2192.codfw.wmnet with reason: host reimage
  • 02:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2192.codfw.wmnet with reason: host reimage
  • 02:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2194.codfw.wmnet with OS bullseye
  • 02:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2194']
  • 02:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2194']
  • 02:30 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2194']
  • 02:26 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host db2193.codfw.wmnet with OS bullseye
  • 02:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2193.codfw.wmnet with OS bullseye
  • 02:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2192.codfw.wmnet with OS bullseye
  • 02:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2194']
  • 02:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2193']
  • 01:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2194.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2192']
  • 00:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2193']
  • 00:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2193.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2192']
  • 00:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2192']
  • 00:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2192']
  • 00:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2192.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2190.codfw.wmnet with OS bullseye
  • 00:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2190.codfw.wmnet with OS bullseye
  • 00:38 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2194.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2190.codfw.wmnet with OS bullseye
  • 00:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2193.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:26 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2192.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2191.codfw.wmnet with OS bullseye
  • 00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:24 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2189.codfw.wmnet with OS bullseye
  • 00:18 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:15 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2188.codfw.wmnet with OS bullseye
  • 00:09 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2191.codfw.wmnet with reason: host reimage
  • 00:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2191.codfw.wmnet with reason: host reimage

2023-08-03

  • 23:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2189.codfw.wmnet with reason: host reimage
  • 23:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2189.codfw.wmnet with reason: host reimage
  • 23:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2188.codfw.wmnet with reason: host reimage
  • 23:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2188.codfw.wmnet with reason: host reimage
  • 23:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2191.codfw.wmnet with OS bullseye
  • 23:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2190.codfw.wmnet with OS bullseye
  • 23:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2190']
  • 23:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2191']
  • 23:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2189.codfw.wmnet with OS bullseye
  • 23:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2191']
  • 23:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2190']
  • 23:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2188.codfw.wmnet with OS bullseye
  • 23:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2190.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2191.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2189']
  • 23:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2189']
  • 23:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2188']
  • 23:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2188']
  • 23:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2188']
  • 23:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2189']
  • 23:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2189']
  • 23:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2188']
  • 22:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2188.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2189.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:39 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2191.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:38 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2190.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2189.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2188.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:19 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:19 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch port and DNS for db2188-db2195 - pt1979@cumin2002"
  • 22:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch port and DNS for db2188-db2195 - pt1979@cumin2002"
  • 22:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 20:59 jforrester@deploy1002: Synchronized php-1.41.0-wmf.20/extensions/WikiLambda/: T343402 and T343380 (duration: 07m 50s)
  • 20:56 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 20:55 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 20:55 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 20:54 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 20:52 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:51 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:49 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:49 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:49 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:49 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:39 thcipriani: end UTC late backport
  • 20:36 thcipriani@deploy1002: Finished scap: Backport for pawikisource: add audiobook namespace alias (T343410) (duration: 10m 39s)
  • 20:30 thcipriani@deploy1002: anzx and thcipriani: Continuing with sync
  • 20:27 thcipriani@deploy1002: anzx and thcipriani: Backport for pawikisource: add audiobook namespace alias (T343410) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:26 thcipriani@deploy1002: Started scap: Backport for pawikisource: add audiobook namespace alias (T343410)
  • 20:23 thcipriani@deploy1002: Finished scap: Backport for Write new on group1 except wikidatawiki for event table migration (T330158) (duration: 15m 54s)
  • 20:17 thcipriani@deploy1002: dreamyjazz and thcipriani: Continuing with sync
  • 20:09 thcipriani@deploy1002: dreamyjazz and thcipriani: Backport for Write new on group1 except wikidatawiki for event table migration (T330158) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:07 thcipriani@deploy1002: Started scap: Backport for Write new on group1 except wikidatawiki for event table migration (T330158)
  • 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lists2001.codfw.wmnet with OS bookworm
  • 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:54 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:53 dancy: dancy@deploy1002 rebuilt and synchronized wikiversions files group2 wikis to 1.41.0-wmf.20 refs T340248
  • 19:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists2001.codfw.wmnet with reason: host reimage
  • 19:35 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lists2001.codfw.wmnet with reason: host reimage
  • 19:31 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 19:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 19:26 dancy@deploy1002: Finished scap: Backport for Fix mobile search text overlapping (T343397) (duration: 09m 33s)
  • 19:20 dancy@deploy1002: jdlrobson and dancy: Continuing with sync
  • 19:20 dancy@deploy1002: jdlrobson and dancy: Backport for Fix mobile search text overlapping (T343397) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 19:16 dancy@deploy1002: Started scap: Backport for Fix mobile search text overlapping (T343397)
  • 19:12 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 19:12 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 19:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lists2001.codfw.wmnet with OS bookworm
  • 17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan2002.codfw.wmnet with OS bookworm
  • 17:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:17 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:17 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:17 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:14 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1025.eqiad.wmnet with OS bullseye
  • 16:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1001"
  • 16:40 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1001"
  • 16:28 jforrester@deploy1002: Finished scap: Backport for Fix unsafe validator to not reach into undefined keys (T343393) (duration: 10m 57s)
  • 16:26 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:22 jforrester@deploy1002: jforrester: Continuing with sync
  • 16:19 jforrester@deploy1002: jforrester: Backport for Fix unsafe validator to not reach into undefined keys (T343393) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 16:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1025.eqiad.wmnet with reason: host reimage
  • 16:18 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 16:18 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 16:17 jforrester@deploy1002: Started scap: Backport for Fix unsafe validator to not reach into undefined keys (T343393)
  • 16:15 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1025.eqiad.wmnet with reason: host reimage
  • 16:14 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Rename kubernetes10[25-26] - cgoubert@cumin1001 - T343306"
  • 16:13 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Rename kubernetes10[25-26] - cgoubert@cumin1001 - T343306"
  • 16:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan2002.codfw.wmnet with reason: host reimage
  • 16:04 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on titan2002.codfw.wmnet with reason: host reimage
  • 16:02 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 16:01 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 15:47 moritzm: installing pandoc security updates
  • 15:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host titan2002.codfw.wmnet with OS bookworm
  • 15:40 fabfur: imported `varnishkafka` package in bookworm-wikimedia (T342154)
  • 15:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2002']
  • 15:30 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2002']
  • 15:24 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 15:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 15:23 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 15:23 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 15:23 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 15:22 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 15:22 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 15:22 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 15:22 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 15:21 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 15:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 15:20 moritzm: installing glibc security updates on bookworm
  • 15:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 15:20 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 15:19 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 15:19 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 15:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 15:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 15:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 15:13 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:12 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:11 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 15:11 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 15:11 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 15:10 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 15:10 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 15:10 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 15:09 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 15:09 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2002']
  • 15:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2002']
  • 15:05 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1025.eqiad.wmnet with OS bullseye
  • 15:02 claime: Run homer on lsw1-f3-eqiad for kubernetes102[5-6] imaging - T343306
  • 14:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan2001.codfw.wmnet with OS bookworm
  • 14:46 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:22 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 14:22 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 14:21 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 14:21 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 14:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan2001.codfw.wmnet with reason: host reimage
  • 14:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on titan2001.codfw.wmnet with reason: host reimage
  • 13:58 jforrester@deploy1002: Finished scap: Backport for [Wikifunctions] Allow logged-in users to make function calls again (duration: 08m 24s)
  • 13:51 jforrester@deploy1002: jforrester: Continuing with sync
  • 13:51 jforrester@deploy1002: jforrester: Backport for [Wikifunctions] Allow logged-in users to make function calls again synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:49 jforrester@deploy1002: Started scap: Backport for [Wikifunctions] Allow logged-in users to make function calls again
  • 13:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2002']
  • 13:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2002']
  • 13:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2002']
  • 13:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2002']
  • 13:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2001']
  • 13:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2001']
  • 13:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host titan2001.codfw.wmnet with OS bookworm
  • 13:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2002']
  • 13:26 taavi: taavi@mwmaint1002 ~ $ mwscript namespaceDupes.php pawikisource --fix --add-prefix "BROKEN " # T343410
  • 13:23 taavi@deploy1002: Finished scap: Backport for pawikisource: create audiobook namespace (T343410) (duration: 13m 01s)
  • 13:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2001']
  • 13:17 taavi@deploy1002: taavi and anzx: Continuing with sync
  • 13:13 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2002']
  • 13:12 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2001']
  • 13:12 taavi@deploy1002: taavi and anzx: Backport for pawikisource: create audiobook namespace (T343410) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:12 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 13:12 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 13:10 taavi@deploy1002: Started scap: Backport for pawikisource: create audiobook namespace (T343410)
  • 12:41 jforrester@deploy1002: Finished scap: Backport for WikiLambda: Add PHP code for Z2K5/'short descriptions' (T343396) (duration: 09m 41s)
  • 12:34 jforrester@deploy1002: jforrester: Continuing with sync
  • 12:33 jforrester@deploy1002: jforrester: Backport for WikiLambda: Add PHP code for Z2K5/'short descriptions' (T343396) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 12:31 taavi: updated T343294 mitigations
  • 12:31 jforrester@deploy1002: Started scap: Backport for WikiLambda: Add PHP code for Z2K5/'short descriptions' (T343396)
  • 12:15 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@54c0898] (releasing): (no justification provided) (duration: 00m 42s)
  • 12:15 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@54c0898] (releasing): (no justification provided)
  • 12:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 12:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 12:02 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 12:02 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 12:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 12:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 12:02 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 12:02 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 12:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 12:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 12:01 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 12:01 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 12:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:01 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 12:01 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 12:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 12:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 12:01 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 12:00 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:49 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 11:48 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 11:48 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 11:48 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 11:48 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:48 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:48 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 11:48 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 11:48 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:48 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:48 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 11:48 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 11:47 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 11:47 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 11:47 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 11:47 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 11:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 11:46 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:46 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:46 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 11:46 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:45 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:45 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:45 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:45 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:44 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:44 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:23 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T342617)', diff saved to https://phabricator.wikimedia.org/P50070 and previous config saved to /var/cache/conftool/dbconfig/20230803-110028-ladsgroup.json
  • 11:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T342617)', diff saved to https://phabricator.wikimedia.org/P50069 and previous config saved to /var/cache/conftool/dbconfig/20230803-110000-ladsgroup.json
  • 10:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P50068 and previous config saved to /var/cache/conftool/dbconfig/20230803-104521-ladsgroup.json
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P50067 and previous config saved to /var/cache/conftool/dbconfig/20230803-104454-ladsgroup.json
  • 10:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P50066 and previous config saved to /var/cache/conftool/dbconfig/20230803-103015-ladsgroup.json
  • 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P50065 and previous config saved to /var/cache/conftool/dbconfig/20230803-102948-ladsgroup.json
  • 10:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T342617)', diff saved to https://phabricator.wikimedia.org/P50062 and previous config saved to /var/cache/conftool/dbconfig/20230803-101509-ladsgroup.json
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T342617)', diff saved to https://phabricator.wikimedia.org/P50061 and previous config saved to /var/cache/conftool/dbconfig/20230803-101441-ladsgroup.json
  • 10:13 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 10:11 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 10:11 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 10:11 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 10:11 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 10:10 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 10:10 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 10:09 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 10:01 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 09:59 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T342617)', diff saved to https://phabricator.wikimedia.org/P50059 and previous config saved to /var/cache/conftool/dbconfig/20230803-092338-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 09:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 09:21 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:17 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:16 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:03 claime: Deploying rename changes for mw149[7-8] to kubernetes102[5-6] - T343306
  • 09:03 moritzm: installing systemd bugfix updates from Bookworm 12.1 point release
  • 08:55 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 08:55 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 08:53 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 08:53 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 08:44 moritzm: installing yajl security updates
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50058 and previous config saved to /var/cache/conftool/dbconfig/20230803-084103-root.json
  • 08:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T342617)', diff saved to https://phabricator.wikimedia.org/P50057 and previous config saved to /var/cache/conftool/dbconfig/20230803-083845-ladsgroup.json
  • 08:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 08:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 08:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P50056 and previous config saved to /var/cache/conftool/dbconfig/20230803-083824-ladsgroup.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50055 and previous config saved to /var/cache/conftool/dbconfig/20230803-082558-root.json
  • 08:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P50054 and previous config saved to /var/cache/conftool/dbconfig/20230803-082318-ladsgroup.json
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50053 and previous config saved to /var/cache/conftool/dbconfig/20230803-081053-root.json
  • 08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P50052 and previous config saved to /var/cache/conftool/dbconfig/20230803-080812-ladsgroup.json
  • 07:59 moritzm: installing Linux 5.10.179 on Buster hosts with Linux 5.10
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50051 and previous config saved to /var/cache/conftool/dbconfig/20230803-075548-root.json
  • 07:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P50050 and previous config saved to /var/cache/conftool/dbconfig/20230803-075305-ladsgroup.json
  • 07:51 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: GitLab 16 major version upgrade
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50049 and previous config saved to /var/cache/conftool/dbconfig/20230803-074044-root.json
  • 07:39 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 07:38 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 07:36 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 07:36 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 5%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50048 and previous config saved to /var/cache/conftool/dbconfig/20230803-072539-root.json
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 3%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50047 and previous config saved to /var/cache/conftool/dbconfig/20230803-071034-root.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 1%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50046 and previous config saved to /var/cache/conftool/dbconfig/20230803-065529-root.json
  • 06:35 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: GitLab 16 major version upgrade
  • 06:33 oblivian@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 06:33 oblivian@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 06:33 oblivian@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 06:33 oblivian@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 06:31 kart_: Updated MinT to 2023-08-02-142037-production (T338292)
  • 06:30 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 06:25 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P50045 and previous config saved to /var/cache/conftool/dbconfig/20230803-061827-ladsgroup.json
  • 06:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P50044 and previous config saved to /var/cache/conftool/dbconfig/20230803-061817-ladsgroup.json
  • 06:17 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 06:11 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 06:07 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 06:05 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 06:04 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 06:03 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 06:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P50043 and previous config saved to /var/cache/conftool/dbconfig/20230803-060311-ladsgroup.json
  • 06:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2129 T343296', diff saved to https://phabricator.wikimedia.org/P50042 and previous config saved to /var/cache/conftool/dbconfig/20230803-060241-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2114 to s6 primary T343296', diff saved to https://phabricator.wikimedia.org/P50041 and previous config saved to /var/cache/conftool/dbconfig/20230803-060055-marostegui.json
  • 06:00 marostegui: Starting s6 codfw failover from db2129 to db2114 - T343296
  • 05:52 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 05:52 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 05:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P50040 and previous config saved to /var/cache/conftool/dbconfig/20230803-054805-ladsgroup.json
  • 05:46 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 05:46 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2114 with weight 0 T343296', diff saved to https://phabricator.wikimedia.org/P50039 and previous config saved to /var/cache/conftool/dbconfig/20230803-054418-marostegui.json
  • 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s6 T343296
  • 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s6 T343296
  • 05:34 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 05:34 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 05:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P50038 and previous config saved to /var/cache/conftool/dbconfig/20230803-053259-ladsgroup.json
  • 03:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P50037 and previous config saved to /var/cache/conftool/dbconfig/20230803-035940-ladsgroup.json
  • 03:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 03:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 03:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T342617)', diff saved to https://phabricator.wikimedia.org/P50036 and previous config saved to /var/cache/conftool/dbconfig/20230803-035917-ladsgroup.json
  • 03:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P50035 and previous config saved to /var/cache/conftool/dbconfig/20230803-034411-ladsgroup.json
  • 03:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P50034 and previous config saved to /var/cache/conftool/dbconfig/20230803-032905-ladsgroup.json
  • 03:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T342617)', diff saved to https://phabricator.wikimedia.org/P50033 and previous config saved to /var/cache/conftool/dbconfig/20230803-031359-ladsgroup.json
  • 02:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host titan2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 02:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 02:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T342617)', diff saved to https://phabricator.wikimedia.org/P50032 and previous config saved to /var/cache/conftool/dbconfig/20230803-021643-ladsgroup.json
  • 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P50031 and previous config saved to /var/cache/conftool/dbconfig/20230803-020137-ladsgroup.json
  • 01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P50030 and previous config saved to /var/cache/conftool/dbconfig/20230803-014629-ladsgroup.json
  • 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T342617)', diff saved to https://phabricator.wikimedia.org/P50029 and previous config saved to /var/cache/conftool/dbconfig/20230803-014503-ladsgroup.json
  • 01:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T342617)', diff saved to https://phabricator.wikimedia.org/P50028 and previous config saved to /var/cache/conftool/dbconfig/20230803-014426-ladsgroup.json
  • 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T342617)', diff saved to https://phabricator.wikimedia.org/P50027 and previous config saved to /var/cache/conftool/dbconfig/20230803-013123-ladsgroup.json
  • 01:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P50026 and previous config saved to /var/cache/conftool/dbconfig/20230803-012920-ladsgroup.json
  • 01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P50025 and previous config saved to /var/cache/conftool/dbconfig/20230803-011414-ladsgroup.json
  • 00:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T342617)', diff saved to https://phabricator.wikimedia.org/P50024 and previous config saved to /var/cache/conftool/dbconfig/20230803-005908-ladsgroup.json
  • 00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T342617)', diff saved to https://phabricator.wikimedia.org/P50023 and previous config saved to /var/cache/conftool/dbconfig/20230803-003939-ladsgroup.json
  • 00:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 00:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T342617)', diff saved to https://phabricator.wikimedia.org/P50022 and previous config saved to /var/cache/conftool/dbconfig/20230803-003916-ladsgroup.json
  • 00:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P50021 and previous config saved to /var/cache/conftool/dbconfig/20230803-002410-ladsgroup.json
  • 00:13 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host titan2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host titan2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P50020 and previous config saved to /var/cache/conftool/dbconfig/20230803-000904-ladsgroup.json

2023-08-02

  • 23:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T342617)', diff saved to https://phabricator.wikimedia.org/P50019 and previous config saved to /var/cache/conftool/dbconfig/20230802-235358-ladsgroup.json
  • 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T342617)', diff saved to https://phabricator.wikimedia.org/P50018 and previous config saved to /var/cache/conftool/dbconfig/20230802-232528-ladsgroup.json
  • 23:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 23:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T342617)', diff saved to https://phabricator.wikimedia.org/P50017 and previous config saved to /var/cache/conftool/dbconfig/20230802-232507-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P50016 and previous config saved to /var/cache/conftool/dbconfig/20230802-231001-ladsgroup.json
  • 23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T342617)', diff saved to https://phabricator.wikimedia.org/P50015 and previous config saved to /var/cache/conftool/dbconfig/20230802-230127-ladsgroup.json
  • 23:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 23:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T342617)', diff saved to https://phabricator.wikimedia.org/P50014 and previous config saved to /var/cache/conftool/dbconfig/20230802-230106-ladsgroup.json
  • 22:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P50013 and previous config saved to /var/cache/conftool/dbconfig/20230802-225454-ladsgroup.json
  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P50012 and previous config saved to /var/cache/conftool/dbconfig/20230802-224559-ladsgroup.json
  • 22:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lists2001']
  • 22:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lists2001']
  • 22:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lists2001']
  • 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T342617)', diff saved to https://phabricator.wikimedia.org/P50011 and previous config saved to /var/cache/conftool/dbconfig/20230802-223948-ladsgroup.json
  • 22:39 krinkle@deploy1002: Finished scap: Backport for noc: Remove ?blame=1 from highlight.php URLs (duration: 08m 07s)
  • 22:36 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host titan2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lists2001']
  • 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch port and DNS for titan200[1-2] - pt1979@cumin2002"
  • 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lists2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch port and DNS for titan200[1-2] - pt1979@cumin2002"
  • 22:32 krinkle@deploy1002: reedy and krinkle: Continuing with sync
  • 22:32 krinkle@deploy1002: reedy and krinkle: Backport for noc: Remove ?blame=1 from highlight.php URLs synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 22:31 krinkle@deploy1002: Started scap: Backport for noc: Remove ?blame=1 from highlight.php URLs
  • 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P50010 and previous config saved to /var/cache/conftool/dbconfig/20230802-223053-ladsgroup.json
  • 22:30 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:21 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lists2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add switch interface and DNS for lists2001 - pt1979@cumin2002"
  • 22:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add switch interface and DNS for lists2001 - pt1979@cumin2002"
  • 22:18 krinkle@deploy1002: Finished scap: Backport for Profiler: Sync minor changes with arc-lamp.git package (T337873) (duration: 11m 02s)
  • 22:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T342617)', diff saved to https://phabricator.wikimedia.org/P50009 and previous config saved to /var/cache/conftool/dbconfig/20230802-221547-ladsgroup.json
  • 22:12 krinkle@deploy1002: krinkle: Continuing with sync
  • 22:09 krinkle@deploy1002: krinkle: Backport for Profiler: Sync minor changes with arc-lamp.git package (T337873) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 22:07 krinkle@deploy1002: Started scap: Backport for Profiler: Sync minor changes with arc-lamp.git package (T337873)
  • 21:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T342617)', diff saved to https://phabricator.wikimedia.org/P50008 and previous config saved to /var/cache/conftool/dbconfig/20230802-212412-ladsgroup.json
  • 21:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 21:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T342617)', diff saved to https://phabricator.wikimedia.org/P50007 and previous config saved to /var/cache/conftool/dbconfig/20230802-212352-ladsgroup.json
  • 21:10 dancy@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.20 refs T340248 (duration: 06m 21s)
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P50006 and previous config saved to /var/cache/conftool/dbconfig/20230802-210846-ladsgroup.json
  • 21:04 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.20 refs T340248
  • 20:55 dancy@deploy1002: Finished scap: Backport for Revert "LocalisationCache: Load only core data if possible" (T342418 T343375) (duration: 08m 47s)
  • 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P50005 and previous config saved to /var/cache/conftool/dbconfig/20230802-205339-ladsgroup.json
  • 20:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T342617)', diff saved to https://phabricator.wikimedia.org/P50004 and previous config saved to /var/cache/conftool/dbconfig/20230802-204941-ladsgroup.json
  • 20:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 20:49 dancy@deploy1002: dancy: Continuing with sync
  • 20:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 20:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T342617)', diff saved to https://phabricator.wikimedia.org/P50003 and previous config saved to /var/cache/conftool/dbconfig/20230802-204919-ladsgroup.json
  • 20:48 dancy@deploy1002: dancy: Backport for Revert "LocalisationCache: Load only core data if possible" (T342418 T343375) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:46 dancy@deploy1002: Started scap: Backport for Revert "LocalisationCache: Load only core data if possible" (T342418 T343375)
  • 20:41 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@cbce175]: Deploy latest for Airflow analytics instance. (duration: 00m 20s)
  • 20:41 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@cbce175]: Deploy latest for Airflow analytics instance.
  • 20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T342617)', diff saved to https://phabricator.wikimedia.org/P50002 and previous config saved to /var/cache/conftool/dbconfig/20230802-203833-ladsgroup.json
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P50001 and previous config saved to /var/cache/conftool/dbconfig/20230802-203413-ladsgroup.json
  • 20:29 dancy@deploy1002: Finished scap: Backport for Add validator userright for pawikisource (T341428) (duration: 20m 49s)
  • 20:23 dancy@deploy1002: dancy and soda: Continuing with sync
  • 20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P50000 and previous config saved to /var/cache/conftool/dbconfig/20230802-201907-ladsgroup.json
  • 20:10 dancy@deploy1002: dancy and soda: Backport for Add validator userright for pawikisource (T341428) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:08 dancy@deploy1002: Started scap: Backport for Add validator userright for pawikisource (T341428)
  • 20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T342617)', diff saved to https://phabricator.wikimedia.org/P49999 and previous config saved to /var/cache/conftool/dbconfig/20230802-200401-ladsgroup.json
  • 19:46 xcollazo@deploy1002: Finished deploy [analytics/refinery@27def33] (hadoop-test): Special refinery deploy to fix mediwiki_history_denormalize TEST [analytics/refinery@27def33] (duration: 01m 59s)
  • 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T342617)', diff saved to https://phabricator.wikimedia.org/P49998 and previous config saved to /var/cache/conftool/dbconfig/20230802-194518-ladsgroup.json
  • 19:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 19:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 19:44 xcollazo@deploy1002: Started deploy [analytics/refinery@27def33] (hadoop-test): Special refinery deploy to fix mediwiki_history_denormalize TEST [analytics/refinery@27def33]
  • 19:43 xcollazo@deploy1002: Finished deploy [analytics/refinery@27def33] (thin): Special refinery deploy to fix mediwiki_history_denormalize THIN [analytics/refinery@27def33] (duration: 00m 04s)
  • 19:43 xcollazo@deploy1002: Started deploy [analytics/refinery@27def33] (thin): Special refinery deploy to fix mediwiki_history_denormalize THIN [analytics/refinery@27def33]
  • 19:41 xcollazo@deploy1002: Finished deploy [analytics/refinery@27def33]: Special refinery deploy to fix mediwiki_history_denormalize [analytics/refinery@27def33] (duration: 07m 48s)
  • 19:39 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 19:39 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 19:34 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 19:34 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 19:34 xcollazo@deploy1002: Started deploy [analytics/refinery@27def33]: Special refinery deploy to fix mediwiki_history_denormalize [analytics/refinery@27def33]
  • 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['es2025']
  • 18:32 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.20 refs T340248
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T342617)', diff saved to https://phabricator.wikimedia.org/P49997 and previous config saved to /var/cache/conftool/dbconfig/20230802-182059-ladsgroup.json
  • 18:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T342617)', diff saved to https://phabricator.wikimedia.org/P49996 and previous config saved to /var/cache/conftool/dbconfig/20230802-182038-ladsgroup.json
  • 18:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 18:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P49995 and previous config saved to /var/cache/conftool/dbconfig/20230802-181724-ladsgroup.json
  • 18:16 dancy@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.20 refs T340248 (duration: 06m 38s)
  • 18:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.20 refs T340248
  • 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P49994 and previous config saved to /var/cache/conftool/dbconfig/20230802-180532-ladsgroup.json
  • 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P49993 and previous config saved to /var/cache/conftool/dbconfig/20230802-180218-ladsgroup.json
  • 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P49991 and previous config saved to /var/cache/conftool/dbconfig/20230802-175026-ladsgroup.json
  • 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P49990 and previous config saved to /var/cache/conftool/dbconfig/20230802-174712-ladsgroup.json
  • 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T342617)', diff saved to https://phabricator.wikimedia.org/P49989 and previous config saved to /var/cache/conftool/dbconfig/20230802-173520-ladsgroup.json
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P49988 and previous config saved to /var/cache/conftool/dbconfig/20230802-173206-ladsgroup.json
  • 16:58 samtar@deploy1002: Finished scap: Backport for enwiki: temp enable emergencyCaptcha (duration: 07m 48s)
  • 16:52 samtar@deploy1002: samtar: Continuing with sync
  • 16:52 samtar@deploy1002: samtar: Backport for enwiki: temp enable emergencyCaptcha synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 16:51 samtar@deploy1002: Started scap: Backport for enwiki: temp enable emergencyCaptcha
  • 16:46 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 16:46 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 16:46 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 16:46 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 16:41 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2025']
  • 16:02 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudservices1006.eqiad.wmnet']
  • 16:02 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudservices1006.eqiad.wmnet']
  • 15:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2016.codfw.wmnet with OS bullseye
  • 15:59 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:58 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P49985 and previous config saved to /var/cache/conftool/dbconfig/20230802-155618-ladsgroup.json
  • 15:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 15:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T342617)', diff saved to https://phabricator.wikimedia.org/P49984 and previous config saved to /var/cache/conftool/dbconfig/20230802-155558-ladsgroup.json
  • 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T342617)', diff saved to https://phabricator.wikimedia.org/P49983 and previous config saved to /var/cache/conftool/dbconfig/20230802-155319-ladsgroup.json
  • 15:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 15:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 15:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T342617)', diff saved to https://phabricator.wikimedia.org/P49982 and previous config saved to /var/cache/conftool/dbconfig/20230802-155258-ladsgroup.json
  • 15:51 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: GitLab minor version upgrade
  • 15:45 cgoubert@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1026
  • 15:45 cgoubert@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1026
  • 15:45 cgoubert@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1025
  • 15:45 cgoubert@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1025
  • 15:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix kubernetes10[25-26] main interfaces - cgoubert@cumin1001"
  • 15:43 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix kubernetes10[25-26] main interfaces - cgoubert@cumin1001"
  • 15:42 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3002.wikimedia.org
  • 15:41 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P49981 and previous config saved to /var/cache/conftool/dbconfig/20230802-154051-ladsgroup.json
  • 15:40 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 15:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2016.codfw.wmnet with reason: host reimage
  • 15:38 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns3002.wikimedia.org
  • 15:37 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 15:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P49980 and previous config saved to /var/cache/conftool/dbconfig/20230802-153751-ladsgroup.json
  • 15:36 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2016.codfw.wmnet with reason: host reimage
  • 15:30 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es2025']
  • 15:30 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2025']
  • 15:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es2025']
  • 15:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2025']
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P49979 and previous config saved to /var/cache/conftool/dbconfig/20230802-152545-ladsgroup.json
  • 15:25 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 15:24 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 15:24 brett: Remove dns3002 from cr2-esams and cr3-esams routes in prep for reboot - T335835
  • 15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P49978 and previous config saved to /var/cache/conftool/dbconfig/20230802-152245-ladsgroup.json
  • 15:16 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host pc2016.codfw.wmnet with OS bullseye
  • 15:16 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on config-master2001.codfw.wmnet,config-master1001.eqiad.wmnet with reason: WIP hosts to be setup
  • 15:15 volans@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on config-master2001.codfw.wmnet,config-master1001.eqiad.wmnet with reason: WIP hosts to be setup
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T342617)', diff saved to https://phabricator.wikimedia.org/P49977 and previous config saved to /var/cache/conftool/dbconfig/20230802-151038-ladsgroup.json
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T342617)', diff saved to https://phabricator.wikimedia.org/P49976 and previous config saved to /var/cache/conftool/dbconfig/20230802-150739-ladsgroup.json
  • 15:07 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 15:04 moritzm: installing gst-plugins-base1.0 security updates
  • 14:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pc2016']
  • 14:58 elukey@deploy1002: Finished scap: Backport for ext-ORES: avoid Lift Wing calls for fiwiki (T343308) (duration: 09m 08s)
  • 14:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudservices1006.eqiad.wmnet']
  • 14:52 elukey@deploy1002: elukey: Continuing with sync
  • 14:52 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1026.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:51 elukey@deploy1002: elukey: Backport for ext-ORES: avoid Lift Wing calls for fiwiki (T343308) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:50 volans@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1026.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:49 elukey@deploy1002: Started scap: Backport for ext-ORES: avoid Lift Wing calls for fiwiki (T343308)
  • 14:48 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1025.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc2016']
  • 14:44 moritzm: installing iperf3 security updates
  • 14:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudservices1006.eqiad.wmnet']
  • 14:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudservices1006.eqiad.wmnet']
  • 14:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudservices1006.eqiad.wmnet']
  • 14:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudservices1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:42 volans@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1025.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:41 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host kubernetes1025.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:41 volans@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1025.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:39 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:39 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mw[1497-1498] to kubernetes[1025-1026] - cgoubert@cumin1001"
  • 14:38 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: GitLab minor version upgrade
  • 14:38 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mw[1497-1498] to kubernetes[1025-1026] - cgoubert@cumin1001"
  • 14:35 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: GitLab minor version upgrade
  • 14:35 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 14:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc2016.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:26 sbassett: Deployed updated mitigation for T336027
  • 14:19 fabfur: importing python-logstash in bookworm-wikimedia (T342154)
  • 14:19 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw[1497-1498].eqiad.wmnet
  • 14:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1497-1498].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1001"
  • 14:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['es2025']
  • 14:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2025']
  • 14:18 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1497-1498].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1001"
  • 14:17 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on people1004.eqiad.wmnet with reason: Resizing disk
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T342617)', diff saved to https://phabricator.wikimedia.org/P49975 and previous config saved to /var/cache/conftool/dbconfig/20230802-141719-ladsgroup.json
  • 14:17 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on people1004.eqiad.wmnet with reason: Resizing disk
  • 14:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 14:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 14:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T342617)', diff saved to https://phabricator.wikimedia.org/P49974 and previous config saved to /var/cache/conftool/dbconfig/20230802-141640-ladsgroup.json
  • 14:15 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 14:15 fabfur: importing varnish and libvarnishapi2 in bookworm-wikimedia (T342154)
  • 14:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2015.codfw.wmnet with OS bullseye
  • 14:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:08 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:06 cgoubert@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1497-1498].eqiad.wmnet
  • 14:05 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw[1497-1498].eqiad.wment
  • 14:05 cgoubert@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:03 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 14:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2014']
  • 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P49973 and previous config saved to /var/cache/conftool/dbconfig/20230802-140134-ladsgroup.json
  • 13:57 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es2025']
  • 13:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2025']
  • 13:56 claime: Decommissioning mw1497 and mw1498 - T343306
  • 13:55 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudservices1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:54 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudservices1006
  • 13:54 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudservices1006
  • 13:52 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:52 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudservices1006 - jclark@cumin1001"
  • 13:51 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudservices1006 - jclark@cumin1001"
  • 13:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2015.codfw.wmnet with reason: host reimage
  • 13:49 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:46 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2015.codfw.wmnet with reason: host reimage
  • 13:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P49971 and previous config saved to /var/cache/conftool/dbconfig/20230802-134628-ladsgroup.json
  • 13:36 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:35 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Inject LanguageNameLookupFactory into WikibaseValueFormatterBuilders (T281726) (duration: 08m 39s)
  • 13:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T342617)', diff saved to https://phabricator.wikimedia.org/P49970 and previous config saved to /var/cache/conftool/dbconfig/20230802-133122-ladsgroup.json
  • 13:29 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Continuing with sync
  • 13:28 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Inject LanguageNameLookupFactory into WikibaseValueFormatterBuilders (T281726) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P49969 and previous config saved to /var/cache/conftool/dbconfig/20230802-132819-ladsgroup.json
  • 13:26 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host pc2016.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:26 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Inject LanguageNameLookupFactory into WikibaseValueFormatterBuilders (T281726)
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T342617)', diff saved to https://phabricator.wikimedia.org/P49968 and previous config saved to /var/cache/conftool/dbconfig/20230802-132632-ladsgroup.json
  • 13:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 13:26 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for uzwiki: Install WikiLove (T343270) (duration: 09m 58s)
  • 13:26 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: GitLab minor version upgrade
  • 13:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 13:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host pc2015.codfw.wmnet with OS bullseye
  • 13:21 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2014']
  • 13:20 lucaswerkmeister-wmde@deploy1002: stang and lucaswerkmeister-wmde: Continuing with sync
  • 13:17 lucaswerkmeister-wmde@deploy1002: stang and lucaswerkmeister-wmde: Backport for uzwiki: Install WikiLove (T343270) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:16 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for uzwiki: Install WikiLove (T343270)
  • 13:13 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php uzwiki wikilove # Create extension tables for Wikilove on uzwiki (T343270)
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P49967 and previous config saved to /var/cache/conftool/dbconfig/20230802-131314-ladsgroup.json
  • 13:12 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/project-logos/simplewiktionary.png\n' | mwscript purgeList.php # T343084
  • 13:11 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for simplewiktionary: Update project logo (T343084) (duration: 08m 13s)
  • 13:06 lucaswerkmeister-wmde@deploy1002: stang and lucaswerkmeister-wmde: Continuing with sync
  • 13:05 lucaswerkmeister-wmde@deploy1002: stang and lucaswerkmeister-wmde: Backport for simplewiktionary: Update project logo (T343084) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:03 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for simplewiktionary: Update project logo (T343084)
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P49965 and previous config saved to /var/cache/conftool/dbconfig/20230802-125810-ladsgroup.json
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P49964 and previous config saved to /var/cache/conftool/dbconfig/20230802-124305-ladsgroup.json
  • 12:42 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics_product@8bba01c]: Redeploy of analytics_product Airflow instance (duration: 00m 08s)
  • 12:42 xcollazo@deploy1002: Started deploy [airflow-dags/analytics_product@8bba01c]: Redeploy of analytics_product Airflow instance
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1184 T342284', diff saved to https://phabricator.wikimedia.org/P49963 and previous config saved to /var/cache/conftool/dbconfig/20230802-123228-ladsgroup.json
  • 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T342617)', diff saved to https://phabricator.wikimedia.org/P49962 and previous config saved to /var/cache/conftool/dbconfig/20230802-122816-ladsgroup.json
  • 12:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 12:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49961 and previous config saved to /var/cache/conftool/dbconfig/20230802-122756-ladsgroup.json
  • 12:21 dcausse@deploy1002: Finished deploy [airflow-dags/search@8bba01c]: search: do not use hive partitions to wait for wmf_raw.mediawiki_page (duration: 00m 11s)
  • 12:21 dcausse@deploy1002: Started deploy [airflow-dags/search@8bba01c]: search: do not use hive partitions to wait for wmf_raw.mediawiki_page
  • 12:19 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1498.eqiad.wmnet
  • 12:19 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1497.eqiad.wmnet
  • 12:19 claime: Depool mw1497 and mw1498 for reimage as wikikube nodes - T343306
  • 12:18 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on people2003.codfw.wmnet with reason: Resizing disk
  • 12:17 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on people2003.codfw.wmnet with reason: Resizing disk
  • 12:13 claime: Repool mw1451 and mw1452, more recent servers will be used - T343306
  • 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P49960 and previous config saved to /var/cache/conftool/dbconfig/20230802-121249-ladsgroup.json
  • 12:11 jelto: update gitlab-ce package to 16.0.8-ce.0
  • 12:09 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1452.eqiad.wmnet
  • 12:09 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1451.eqiad.wmnet
  • 12:09 claime: Depool mw1451 and mw1452 for reimage as wikikube nodes - T343306
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P49959 and previous config saved to /var/cache/conftool/dbconfig/20230802-115743-ladsgroup.json
  • 11:57 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: GitLab minor version upgrade
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49958 and previous config saved to /var/cache/conftool/dbconfig/20230802-114237-ladsgroup.json
  • 11:41 moritzm: installing libxml2 security updates
  • 11:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 11:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 11:40 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest2002.codfw.wmnet
  • 11:40 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest2002.codfw.wmnet
  • 11:18 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 11:18 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 11:17 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 11:17 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 10:41 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 10:40 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: GitLab minor version upgrade
  • 10:40 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 10:37 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: GitLab minor version upgrade
  • 10:30 samtar@deploy1002: Finished scap: Backport for Revert "enwiki: temp enable emergencyCaptcha" (duration: 07m 33s)
  • 10:24 samtar@deploy1002: samtar: Continuing with sync
  • 10:24 samtar@deploy1002: samtar: Backport for Revert "enwiki: temp enable emergencyCaptcha" synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 10:22 samtar@deploy1002: Started scap: Backport for Revert "enwiki: temp enable emergencyCaptcha"
  • 10:02 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: GitLab minor version upgrade
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49954 and previous config saved to /var/cache/conftool/dbconfig/20230802-095428-ladsgroup.json
  • 09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 09:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 09:39 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:37 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:24 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: GitLab minor version upgrade
  • 09:20 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:18 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:17 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 09:15 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 09:13 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:12 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:02 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 09:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 09:01 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 09:01 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 08:53 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: GitLab minor version upgrade
  • 08:39 jelto: downgrade gitlab-ce package to 15.11.13-ce.0
  • 08:15 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 08:07 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 07:28 taavi: mwscript namespaceDupes.php idwikisource --fix --add-prefix "BROKEN " # T341173
  • 07:19 taavi@deploy1002: Finished scap: Backport for idwikisource change wgSiteName, wgMetaNamespace and add project namespace alias (T341173), Change idwikisource logos (T341173) (duration: 11m 43s)
  • 07:18 moritzm: installing Linux 5.10.179-3 on bullseye hosts
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49951 and previous config saved to /var/cache/conftool/dbconfig/20230802-071441-root.json
  • 07:13 taavi@deploy1002: anzx and taavi: Continuing with sync
  • 07:09 taavi@deploy1002: anzx and taavi: Backport for idwikisource change wgSiteName, wgMetaNamespace and add project namespace alias (T341173), Change idwikisource logos (T341173) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:07 taavi@deploy1002: Started scap: Backport for idwikisource change wgSiteName, wgMetaNamespace and add project namespace alias (T341173), Change idwikisource logos (T341173)
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49950 and previous config saved to /var/cache/conftool/dbconfig/20230802-065936-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49949 and previous config saved to /var/cache/conftool/dbconfig/20230802-064431-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49948 and previous config saved to /var/cache/conftool/dbconfig/20230802-062925-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49947 and previous config saved to /var/cache/conftool/dbconfig/20230802-061420-root.json
  • 06:13 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 06:12 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 06:12 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 06:12 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 06:11 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 06:10 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 5%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49946 and previous config saved to /var/cache/conftool/dbconfig/20230802-055916-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 3%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49945 and previous config saved to /var/cache/conftool/dbconfig/20230802-054411-root.json
  • 05:33 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: enabling emergency captcha on enwiki - T343294 (take 2) (duration: 06m 40s)
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 1%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49944 and previous config saved to /var/cache/conftool/dbconfig/20230802-052906-root.json
  • 05:23 marostegui: Stop mariadb on es2025 for onsite maintenance dbmaint codfw T343254
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2025 T343254', diff saved to https://phabricator.wikimedia.org/P49943 and previous config saved to /var/cache/conftool/dbconfig/20230802-052021-root.json
  • 05:11 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: enabling emergency captcha on enwiki - T343294 (duration: 06m 36s)
  • 04:49 _joe_: running scap pull on mwmaint1002 to pick up the noc.w.o changes
  • 01:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pc2015']
  • 01:24 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc2015']
  • 01:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc2015.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc2016.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host pc2016.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch port and DNS for pc2016 - pt1979@cumin2002"
  • 01:02 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch port and DNS for pc2016 - pt1979@cumin2002"
  • 00:59 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2005-dev.codfw.wmnet with OS bullseye
  • 00:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host pc2015.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:41 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:41 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch interfaces and DNS for pc201[5-6] - pt1979@cumin2002"
  • 00:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch interfaces and DNS for pc201[5-6] - pt1979@cumin2002"
  • 00:38 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 00:37 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:36 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2004-dev.codfw.wmnet with OS bullseye
  • 00:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:29 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2006-dev.codfw.wmnet with OS bullseye
  • 00:29 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:25 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
  • 00:17 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
  • 00:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
  • 00:11 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
  • 00:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
  • 00:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage

2023-08-01

  • 23:13 eileen: config revision changed from 8b3a46c3 to f5e6425b - updated process control (added segmentation_aging job - rollback if it doesn't work)
  • 22:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt2006-dev.codfw.wmnet with OS bullseye
  • 22:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt2005-dev.codfw.wmnet with OS bullseye
  • 22:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS bullseye
  • 22:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2006-dev']
  • 22:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2005-dev']
  • 22:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2004-dev']
  • 22:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2006-dev']
  • 22:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2005-dev']
  • 22:17 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2004-dev']
  • 22:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2006-dev']
  • 22:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt2004-dev']
  • 22:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2005-dev']
  • 22:11 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2005-dev']
  • 22:11 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt2005-dev']
  • 22:10 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2004-dev']
  • 22:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2004-dev']
  • 22:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2006-dev']
  • 22:01 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2005-dev']
  • 21:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2004-dev']
  • 21:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2008-dev.codfw.wmnet with OS bullseye
  • 21:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2007-dev.codfw.wmnet with OS bullseye
  • 21:46 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:29 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Log the 'WikiLambda' warnings and above logs (duration: 10m 22s)
  • 21:23 jforrester@deploy1002: jforrester: Continuing with sync
  • 21:20 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Log the 'WikiLambda' warnings and above logs synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 21:19 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Log the 'WikiLambda' warnings and above logs
  • 21:16 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Restrict wikilambda-execute to functioneers for now (duration: 09m 03s)
  • 21:10 jforrester@deploy1002: jforrester: Continuing with sync
  • 21:09 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Restrict wikilambda-execute to functioneers for now synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 21:08 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:07 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Restrict wikilambda-execute to functioneers for now
  • 21:05 jforrester@deploy1002: Synchronized ./php-1.41.0-wmf.20/extensions/WikiLambda/: T343253 T343256 (duration: 07m 23s)
  • 20:59 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:55 jforrester@deploy1002: Synchronized ./php-1.41.0-wmf.19/extensions/WikiLambda/: T343253 T343256 (duration: 06m 58s)
  • 20:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2008-dev.codfw.wmnet with reason: host reimage
  • 20:49 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2008-dev.codfw.wmnet with reason: host reimage
  • 20:44 urbanecm@deploy1002: Finished scap: Backport for Write new on group0 for event table migration (T330158) (duration: 21m 46s)
  • 20:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2007-dev.codfw.wmnet with reason: host reimage
  • 20:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2008-dev.codfw.wmnet with OS bullseye
  • 20:42 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2007-dev.codfw.wmnet with reason: host reimage
  • 20:38 urbanecm@deploy1002: urbanecm and dreamyjazz: Continuing with sync
  • 20:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudnet2008-dev.codfw.wmnet with OS bullseye
  • 20:23 urbanecm@deploy1002: urbanecm and dreamyjazz: Backport for Write new on group0 for event table migration (T330158) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:22 urbanecm@deploy1002: Started scap: Backport for Write new on group0 for event table migration (T330158)
  • 20:19 urbanecm@deploy1002: Finished scap: Backport for Design: Provide wordmarks/taglines for Wikiversity projects (T341256), Provide wordmarks for Wikivoyage projects (T341259) (duration: 09m 41s)
  • 20:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudnet2007-dev.codfw.wmnet with OS bullseye
  • 20:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2007-dev.codfw.wmnet with OS bullseye
  • 20:17 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2008-dev.codfw.wmnet with reason: host reimage
  • 20:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2008-dev.codfw.wmnet with reason: host reimage
  • 20:13 urbanecm@deploy1002: urbanecm and jdlrobson: Continuing with sync
  • 20:11 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for Design: Provide wordmarks/taglines for Wikiversity projects (T341256), Provide wordmarks for Wikivoyage projects (T341259) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:10 urbanecm@deploy1002: Started scap: Backport for Design: Provide wordmarks/taglines for Wikiversity projects (T341256), Provide wordmarks for Wikivoyage projects (T341259)
  • 20:08 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T342617)', diff saved to https://phabricator.wikimedia.org/P49941 and previous config saved to /var/cache/conftool/dbconfig/20230801-200444-ladsgroup.json
  • 19:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet2008-dev']
  • 19:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2008-dev.codfw.wmnet with OS bullseye
  • 19:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2006-dev.codfw.wmnet with OS bullseye
  • 19:52 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2007-dev.codfw.wmnet with reason: host reimage
  • 19:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet2008-dev']
  • 19:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P49940 and previous config saved to /var/cache/conftool/dbconfig/20230801-194938-ladsgroup.json
  • 19:48 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2007-dev.codfw.wmnet with reason: host reimage
  • 19:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2008-dev']
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P49939 and previous config saved to /var/cache/conftool/dbconfig/20230801-193432-ladsgroup.json
  • 19:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
  • 19:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
  • 19:28 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2008-dev']
  • 19:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2007-dev.codfw.wmnet with OS bullseye
  • 19:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2007-dev']
  • 19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T342617)', diff saved to https://phabricator.wikimedia.org/P49938 and previous config saved to /var/cache/conftool/dbconfig/20230801-191925-ladsgroup.json
  • 19:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 19:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 19:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49937 and previous config saved to /var/cache/conftool/dbconfig/20230801-191709-ladsgroup.json
  • 19:11 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2007-dev']
  • 19:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2006-dev']
  • 19:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet2007-dev']
  • 19:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2006-dev']
  • 19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P49936 and previous config saved to /var/cache/conftool/dbconfig/20230801-190203-ladsgroup.json
  • 19:01 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet2007-dev']
  • 18:56 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2006-dev.codfw.wmnet with OS bullseye
  • 18:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P49935 and previous config saved to /var/cache/conftool/dbconfig/20230801-184657-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T342617)', diff saved to https://phabricator.wikimedia.org/P49934 and previous config saved to /var/cache/conftool/dbconfig/20230801-184220-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49933 and previous config saved to /var/cache/conftool/dbconfig/20230801-184159-ladsgroup.json
  • 18:39 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet2007-dev']
  • 18:37 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:37 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:36 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:36 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:35 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:33 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:33 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:33 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:33 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49932 and previous config saved to /var/cache/conftool/dbconfig/20230801-183151-ladsgroup.json
  • 18:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet2008-dev']
  • 18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P49931 and previous config saved to /var/cache/conftool/dbconfig/20230801-182653-ladsgroup.json
  • 18:21 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet2007-dev']
  • 18:17 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet2008-dev']
  • 18:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2007-dev']
  • 18:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2006-dev']
  • 18:15 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.20 refs T340248
  • 18:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2007-dev']
  • 18:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2006-dev']
  • 18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P49930 and previous config saved to /var/cache/conftool/dbconfig/20230801-181147-ladsgroup.json
  • 18:05 fabfur: adding dns3001 on cr2-esams and cr3-esams routing for ns2 (T335835)
  • 17:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt2006-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49929 and previous config saved to /var/cache/conftool/dbconfig/20230801-175641-ladsgroup.json
  • 17:55 fabfur: running authdns-update on dns1004 to revert ntp.esams to dns3001 (T335835)
  • 17:48 fabfur: running puppet on 'A:cumin or A:dns-rec or A:netbox' (https://gerrit.wikimedia.org/r/c/operations/puppet/+/944286) (T335835)
  • 17:42 fabfur: started bird and enabled puppet on dns3001 (T335835)
  • 17:41 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3001.wikimedia.org
  • 17:37 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns3001.wikimedia.org
  • 17:36 fabfur: stopped bird and disabled puppet on dns3001 (T335835)
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49928 and previous config saved to /var/cache/conftool/dbconfig/20230801-173130-ladsgroup.json
  • 17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T342617)', diff saved to https://phabricator.wikimedia.org/P49927 and previous config saved to /var/cache/conftool/dbconfig/20230801-173109-ladsgroup.json
  • 17:26 fabfur: running puppet on 'A:cumin or A:dns-rec or A:netbox' (https://gerrit.wikimedia.org/r/c/operations/puppet/+/944286) (T335835)
  • 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P49926 and previous config saved to /var/cache/conftool/dbconfig/20230801-171603-ladsgroup.json
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49925 and previous config saved to /var/cache/conftool/dbconfig/20230801-171120-ladsgroup.json
  • 17:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 17:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T342617)', diff saved to https://phabricator.wikimedia.org/P49924 and previous config saved to /var/cache/conftool/dbconfig/20230801-171059-ladsgroup.json
  • 17:09 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@ee544cb]: Update kartotherian to e28ea7ef (T334668 T332985 T332664 T329924) (duration: 04m 25s)
  • 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@ee544cb]: Update kartotherian to e28ea7ef (T334668 T332985 T332664 T329924)
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P49923 and previous config saved to /var/cache/conftool/dbconfig/20230801-170057-ladsgroup.json
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P49922 and previous config saved to /var/cache/conftool/dbconfig/20230801-165553-ladsgroup.json
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T342617)', diff saved to https://phabricator.wikimedia.org/P49921 and previous config saved to /var/cache/conftool/dbconfig/20230801-164550-ladsgroup.json
  • 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P49920 and previous config saved to /var/cache/conftool/dbconfig/20230801-164047-ladsgroup.json
  • 16:38 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt2006-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt2005-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt2004-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T342617)', diff saved to https://phabricator.wikimedia.org/P49919 and previous config saved to /var/cache/conftool/dbconfig/20230801-162541-ladsgroup.json
  • 16:23 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:22 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 16:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 16:21 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:20 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:07 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:06 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 16:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1210 (T342617)', diff saved to https://phabricator.wikimedia.org/P49918 and previous config saved to /var/cache/conftool/dbconfig/20230801-160006-ladsgroup.json
  • 16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 15:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T342617)', diff saved to https://phabricator.wikimedia.org/P49917 and previous config saved to /var/cache/conftool/dbconfig/20230801-155945-ladsgroup.json
  • 15:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt2005-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:49 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt2004-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudnet2008-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P49916 and previous config saved to /var/cache/conftool/dbconfig/20230801-154439-ladsgroup.json
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T342617)', diff saved to https://phabricator.wikimedia.org/P49915 and previous config saved to /var/cache/conftool/dbconfig/20230801-154242-ladsgroup.json
  • 15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 15:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49914 and previous config saved to /var/cache/conftool/dbconfig/20230801-154220-ladsgroup.json
  • 15:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudnet2007-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P49913 and previous config saved to /var/cache/conftool/dbconfig/20230801-153155-ladsgroup.json
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P49912 and previous config saved to /var/cache/conftool/dbconfig/20230801-152933-ladsgroup.json
  • 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P49911 and previous config saved to /var/cache/conftool/dbconfig/20230801-152714-ladsgroup.json
  • 15:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudnet2008-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P49910 and previous config saved to /var/cache/conftool/dbconfig/20230801-151650-ladsgroup.json
  • 15:15 moritzm: bounce ferm on dse-k8s-ctrl1001
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T342617)', diff saved to https://phabricator.wikimedia.org/P49909 and previous config saved to /var/cache/conftool/dbconfig/20230801-151427-ladsgroup.json
  • 15:14 apine@deploy1002: Finished scap: Backport for Move wikifunctions.org from locked-down to limited deployment (T342820) (duration: 07m 45s)
  • 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P49908 and previous config saved to /var/cache/conftool/dbconfig/20230801-151208-ladsgroup.json
  • 15:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2008-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:08 apine@deploy1002: jforrester and apine: Continuing with sync
  • 15:07 apine@deploy1002: jforrester and apine: Backport for Move wikifunctions.org from locked-down to limited deployment (T342820) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 15:06 apine@deploy1002: Started scap: Backport for Move wikifunctions.org from locked-down to limited deployment (T342820)
  • 15:05 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudnet2007-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P49907 and previous config saved to /var/cache/conftool/dbconfig/20230801-150146-ladsgroup.json
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49906 and previous config saved to /var/cache/conftool/dbconfig/20230801-145702-ladsgroup.json
  • 14:47 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add config-master[12]001 - jbond@cumin1001 - T341717"
  • 14:46 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add config-master[12]001 - jbond@cumin1001 - T341717"
  • 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P49905 and previous config saved to /var/cache/conftool/dbconfig/20230801-144641-ladsgroup.json
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T342617)', diff saved to https://phabricator.wikimedia.org/P49904 and previous config saved to /var/cache/conftool/dbconfig/20230801-143930-ladsgroup.json
  • 14:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 14:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T342617)', diff saved to https://phabricator.wikimedia.org/P49903 and previous config saved to /var/cache/conftool/dbconfig/20230801-143909-ladsgroup.json
  • 14:38 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dse-k8s-ctrl1001.eqiad.wmnet
  • 14:34 Lucas_WMDE: UTC afternoon backport+config window done (one change, then some k8s issues, which are resolved for now)
  • 14:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudcontrol2008-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1001.eqiad.wmnet
  • 14:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1011.eqiad.wmnet
  • 14:25 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 14:25 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 14:25 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 14:24 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 14:24 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:24 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:24 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 14:24 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P49902 and previous config saved to /var/cache/conftool/dbconfig/20230801-142403-ladsgroup.json
  • 14:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1011.eqiad.wmnet
  • 14:22 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:22 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:21 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 14:21 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 14:21 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1005.eqiad.wmnet
  • 14:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:19 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:18 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:18 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:17 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:16 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 14:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1005.eqiad.wmnet
  • 14:15 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 14:15 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:14 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:14 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:13 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 14:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49901 and previous config saved to /var/cache/conftool/dbconfig/20230801-141144-ladsgroup.json
  • 14:11 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 14:11 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T342617)', diff saved to https://phabricator.wikimedia.org/P49900 and previous config saved to /var/cache/conftool/dbconfig/20230801-141123-ladsgroup.json
  • 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P49899 and previous config saved to /var/cache/conftool/dbconfig/20230801-140856-ladsgroup.json
  • 14:07 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
  • 14:05 fabfur: running authdns-update on dns1004 to move ntp.esams to dns3002 (https://gerrit.wikimedia.org/r/c/operations/dns/+/944232) (T335835)
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P49897 and previous config saved to /var/cache/conftool/dbconfig/20230801-135617-ladsgroup.json
  • 13:54 fabfur: removing dns3001 from cr2-esams and cr3-esams routing for reboot (T335835)
  • 13:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1001.eqiad.wmnet
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T342617)', diff saved to https://phabricator.wikimedia.org/P49896 and previous config saved to /var/cache/conftool/dbconfig/20230801-135350-ladsgroup.json
  • 13:50 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 13:50 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 13:49 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 13:49 cgoubert@deploy1002: Started scap: (no justification provided)
  • 13:47 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 13:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1001.eqiad.wmnet
  • 13:46 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 13:46 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 13:45 jbond@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host config-master2001.codfw.wmnet
  • 13:45 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host config-master2001.codfw.wmnet with OS bookworm
  • 13:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
  • 13:43 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for btmwiktionary: Add project logo (T343004) (duration: 32m 32s)
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P49895 and previous config saved to /var/cache/conftool/dbconfig/20230801-134111-ladsgroup.json
  • 13:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
  • 13:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
  • 13:33 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on config-master2001.codfw.wmnet with reason: host reimage
  • 13:33 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host config-master1001.eqiad.wmnet
  • 13:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host config-master1001.eqiad.wmnet with OS bookworm
  • 13:32 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 13:31 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
  • 13:30 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on config-master2001.codfw.wmnet with reason: host reimage
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T342617)', diff saved to https://phabricator.wikimedia.org/P49891 and previous config saved to /var/cache/conftool/dbconfig/20230801-132604-ladsgroup.json
  • 13:24 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Continuing with sync
  • 13:22 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for btmwiktionary: Add project logo (T343004) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on config-master1001.eqiad.wmnet with reason: host reimage
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T342617)', diff saved to https://phabricator.wikimedia.org/P49890 and previous config saved to /var/cache/conftool/dbconfig/20230801-131946-ladsgroup.json
  • 13:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T342617)', diff saved to https://phabricator.wikimedia.org/P49889 and previous config saved to /var/cache/conftool/dbconfig/20230801-131925-ladsgroup.json
  • 13:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on config-master1001.eqiad.wmnet with reason: host reimage
  • 13:12 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host config-master2001.codfw.wmnet with OS bookworm
  • 13:11 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM config-master2001.codfw.wmnet - jbond@cumin2002"
  • 13:11 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM config-master2001.codfw.wmnet - jbond@cumin2002"
  • 13:11 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for btmwiktionary: Add project logo (T343004)
  • 13:10 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master2001.codfw.wmnet on all recursors
  • 13:10 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache config-master2001.codfw.wmnet on all recursors
  • 13:10 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:10 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM config-master2001.codfw.wmnet - jbond@cumin2002"
  • 13:09 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM config-master2001.codfw.wmnet - jbond@cumin2002"
  • 13:06 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host config-master1001.eqiad.wmnet with OS bookworm
  • 13:06 jbond@cumin2002: START - Cookbook sre.dns.netbox
  • 13:06 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host config-master2001.codfw.wmnet
  • 13:05 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM config-master1001.eqiad.wmnet - jbond@cumin1001"
  • 13:05 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM config-master1001.eqiad.wmnet - jbond@cumin1001"
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P49888 and previous config saved to /var/cache/conftool/dbconfig/20230801-130419-ladsgroup.json
  • 13:02 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master1001.eqiad.wmnet on all recursors
  • 13:02 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache config-master1001.eqiad.wmnet on all recursors
  • 13:02 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:01 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM config-master1001.eqiad.wmnet - jbond@cumin1001"
  • 13:00 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM config-master1001.eqiad.wmnet - jbond@cumin1001"
  • 12:58 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 12:58 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host config-master1001.eqiad.wmnet
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P49887 and previous config saved to /var/cache/conftool/dbconfig/20230801-124912-ladsgroup.json
  • 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T342617)', diff saved to https://phabricator.wikimedia.org/P49886 and previous config saved to /var/cache/conftool/dbconfig/20230801-124508-ladsgroup.json
  • 12:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 12:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 12:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 12:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T342617)', diff saved to https://phabricator.wikimedia.org/P49885 and previous config saved to /var/cache/conftool/dbconfig/20230801-124442-ladsgroup.json
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T342617)', diff saved to https://phabricator.wikimedia.org/P49883 and previous config saved to /var/cache/conftool/dbconfig/20230801-123406-ladsgroup.json
  • 12:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
  • 12:30 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.resource_report (exit_code=0)
  • 12:30 jbond@cumin1001: START - Cookbook sre.ganeti.resource_report
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P49882 and previous config saved to /var/cache/conftool/dbconfig/20230801-122936-ladsgroup.json
  • 12:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P49881 and previous config saved to /var/cache/conftool/dbconfig/20230801-121430-ladsgroup.json
  • 12:11 fabfur: imported purged package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/software/purged/+/944177) T342154
  • 12:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1076.eqiad.wmnet with OS bullseye
  • 12:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1077.eqiad.wmnet with OS bullseye
  • 12:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T342617)', diff saved to https://phabricator.wikimedia.org/P49880 and previous config saved to /var/cache/conftool/dbconfig/20230801-115924-ladsgroup.json
  • 11:57 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
  • 11:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T342617)', diff saved to https://phabricator.wikimedia.org/P49879 and previous config saved to /var/cache/conftool/dbconfig/20230801-115110-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 11:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 11:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
  • 11:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1077.eqiad.wmnet with reason: host reimage
  • 11:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1076.eqiad.wmnet with reason: host reimage
  • 11:33 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1077.eqiad.wmnet with reason: host reimage
  • 11:33 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1076.eqiad.wmnet with reason: host reimage
  • 11:22 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1077.eqiad.wmnet with OS bullseye
  • 11:21 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1076.eqiad.wmnet with OS bullseye
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T342617)', diff saved to https://phabricator.wikimedia.org/P49878 and previous config saved to /var/cache/conftool/dbconfig/20230801-111829-ladsgroup.json
  • 11:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T342617)', diff saved to https://phabricator.wikimedia.org/P49877 and previous config saved to /var/cache/conftool/dbconfig/20230801-111808-ladsgroup.json
  • 11:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49876 and previous config saved to /var/cache/conftool/dbconfig/20230801-110858-ladsgroup.json
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P49875 and previous config saved to /var/cache/conftool/dbconfig/20230801-110302-ladsgroup.json
  • 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P49874 and previous config saved to /var/cache/conftool/dbconfig/20230801-105352-ladsgroup.json
  • 10:51 hnowlan@deploy1002: Finished deploy [restbase/deploy@8eb62f2]: Add gpewiki and btmwiktionary (T335988, T336116) (duration: 20m 29s)
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P49873 and previous config saved to /var/cache/conftool/dbconfig/20230801-104755-ladsgroup.json
  • 10:45 moritzm: update d-i images to bookworm 12.1 T343121
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P49872 and previous config saved to /var/cache/conftool/dbconfig/20230801-103846-ladsgroup.json
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T342617)', diff saved to https://phabricator.wikimedia.org/P49871 and previous config saved to /var/cache/conftool/dbconfig/20230801-103249-ladsgroup.json
  • 10:31 hnowlan@deploy1002: Started deploy [restbase/deploy@8eb62f2]: Add gpewiki and btmwiktionary (T335988, T336116)
  • 10:28 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1076.eqiad.wmnet with OS bullseye
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49870 and previous config saved to /var/cache/conftool/dbconfig/20230801-102340-ladsgroup.json
  • 10:21 fabfur: imported prometheus-varnishkafka-exporter package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/debs/prometheus-varnishkafka-exporter/+/944169) T342154
  • 10:18 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1077.eqiad.wmnet with OS bullseye
  • 09:47 urbanecm@deploy1002: Finished scap: Backport for Revert "Fixes: Echo notification count disappears on load in mobile skin" (T335273 T343192) (duration: 11m 35s)
  • 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T342617)', diff saved to https://phabricator.wikimedia.org/P49869 and previous config saved to /var/cache/conftool/dbconfig/20230801-094538-ladsgroup.json
  • 09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 09:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 09:40 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 09:39 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 09:38 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 09:37 urbanecm@deploy1002: urbanecm: Backport for Revert "Fixes: Echo notification count disappears on load in mobile skin" (T335273 T343192) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49868 and previous config saved to /var/cache/conftool/dbconfig/20230801-093717-ladsgroup.json
  • 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 09:35 urbanecm@deploy1002: Started scap: Backport for Revert "Fixes: Echo notification count disappears on load in mobile skin" (T335273 T343192)
  • 09:33 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1076.eqiad.wmnet with OS bullseye
  • 09:33 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host analytics1076.eqiad.wmnet with OS bullseye
  • 09:32 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 09:21 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 09:15 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 09:12 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 09:11 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 09:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1077.eqiad.wmnet with OS bullseye
  • 09:03 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host analytics1077.eqiad.wmnet with OS bullseye
  • 09:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 09:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 08:40 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1077.eqiad.wmnet with OS bullseye
  • 08:38 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1076.eqiad.wmnet with OS bullseye
  • 08:33 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: enable AddLink task frontend in 10th round of wikis (T308135) (duration: 10m 52s)
  • 08:30 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 08:29 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 08:27 urbanecm@deploy1002: sgimeno and urbanecm: Continuing with sync
  • 08:24 urbanecm@deploy1002: sgimeno and urbanecm: Backport for GrowthExperiments: enable AddLink task frontend in 10th round of wikis (T308135) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:22 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: enable AddLink task frontend in 10th round of wikis (T308135)
  • 08:22 moritzm: installing Linux 4.19.289 on Buster hosts
  • 08:17 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 08:17 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 07:49 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 07:49 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 07:44 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 07:44 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 07:41 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 07:41 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 07:37 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nmaphophe out of all services on: 732 hosts
  • 07:37 root@cumin2002: START - Cookbook sre.idm.logout Logging Nmaphophe out of all services on: 732 hosts
  • 07:37 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nmaphophe out of all services on: 24 hosts
  • 07:37 root@cumin2002: START - Cookbook sre.idm.logout Logging Nmaphophe out of all services on: 24 hosts
  • 07:37 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nmaphophe out of all services on: 1277 hosts
  • 07:36 root@cumin2002: START - Cookbook sre.idm.logout Logging Nmaphophe out of all services on: 1277 hosts
  • 07:07 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 07:07 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 06:54 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 06:54 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 06:24 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 05:48 kart_: cxserver: Remove Youdao MT service (T329137)
  • 05:46 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:45 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:41 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:36 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:36 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:26 kart_: Updated cxserver to 2023-07-13-063245-production (T340953)
  • 05:24 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:23 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:18 marostegui: dbmaint s4 testcommonswiki eqiad T343174
  • 05:16 marostegui: dbmaint s4 labswiki (wikitech) eqiad T343175
  • 05:15 marostegui: dbmaint s4 testcommonswiki eqiad T343175
  • 05:12 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:12 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:07 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:06 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 03:57 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.18 (duration: 02m 09s)
  • 03:54 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.20 refs T340248 (duration: 52m 06s)
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.20 refs T340248
  • 02:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T342617)', diff saved to https://phabricator.wikimedia.org/P49867 and previous config saved to /var/cache/conftool/dbconfig/20230801-023010-ladsgroup.json
  • 02:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P49866 and previous config saved to /var/cache/conftool/dbconfig/20230801-021504-ladsgroup.json
  • 01:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P49865 and previous config saved to /var/cache/conftool/dbconfig/20230801-015958-ladsgroup.json
  • 01:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T342617)', diff saved to https://phabricator.wikimedia.org/P49864 and previous config saved to /var/cache/conftool/dbconfig/20230801-014452-ladsgroup.json
  • 00:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2007-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2006-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 00:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 00:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T342617)', diff saved to https://phabricator.wikimedia.org/P49863 and previous config saved to /var/cache/conftool/dbconfig/20230801-004000-ladsgroup.json
  • 00:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P49862 and previous config saved to /var/cache/conftool/dbconfig/20230801-002454-ladsgroup.json
  • 00:21 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudcontrol2007-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudcontrol2006-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:15 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:15 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new cloud nodes DNS and switch config - pt1979@cumin2002"
  • 00:14 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new cloud nodes DNS and switch config - pt1979@cumin2002"
  • 00:11 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P49861 and previous config saved to /var/cache/conftool/dbconfig/20230801-000948-ladsgroup.json
