Server Admin Log/Archive 72

From Wikitech

2023-10-31

  • 23:59 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1111.eqiad.wmnet with OS bullseye
  • 23:51 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1110.eqiad.wmnet with OS bullseye
  • 23:43 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1112.eqiad.wmnet with reason: host reimage
  • 23:41 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1111.eqiad.wmnet with reason: host reimage
  • 23:38 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1112.eqiad.wmnet with reason: host reimage
  • 23:38 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1111.eqiad.wmnet with reason: host reimage
  • 23:33 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1110.eqiad.wmnet with reason: host reimage
  • 23:30 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1110.eqiad.wmnet with reason: host reimage
  • 23:23 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
  • 23:23 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS bullseye
  • 23:23 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1112.eqiad.wmnet with OS bullseye
  • 23:22 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1111.eqiad.wmnet with OS bullseye
  • 23:15 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1107.eqiad.wmnet with OS bullseye
  • 23:15 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
  • 23:15 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS bullseye
  • 23:15 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 23:15 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1111.eqiad.wmnet with OS bullseye
  • 23:14 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1110.eqiad.wmnet with OS bullseye
  • 23:14 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1112.eqiad.wmnet with OS bullseye
  • 23:12 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1109.eqiad.wmnet with OS bullseye
  • 23:09 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1108.eqiad.wmnet with OS bullseye
  • 23:08 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
  • 23:08 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS bullseye
  • 23:08 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 23:08 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1112.eqiad.wmnet with OS bullseye
  • 23:08 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1111.eqiad.wmnet with OS bullseye
  • 23:08 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1110.eqiad.wmnet with OS bullseye
  • 23:01 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
  • 23:01 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS bullseye
  • 23:01 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 22:57 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
  • 22:54 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1109.eqiad.wmnet with reason: host reimage
  • 22:53 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
  • 22:52 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1108.eqiad.wmnet with reason: host reimage
  • 22:49 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1109.eqiad.wmnet with reason: host reimage
  • 22:48 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1108.eqiad.wmnet with reason: host reimage
  • 22:38 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 22:38 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1107.eqiad.wmnet with OS bullseye
  • 22:34 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
  • 22:33 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1109.eqiad.wmnet with OS bullseye
  • 22:33 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
  • 22:33 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1108.eqiad.wmnet with OS bullseye
  • 22:25 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 22:24 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
  • 22:24 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1107.eqiad.wmnet with OS bullseye
  • 22:24 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1109.eqiad.wmnet with OS bullseye
  • 22:24 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
  • 22:24 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1108.eqiad.wmnet with OS bullseye
  • 22:19 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
  • 22:18 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1106.eqiad.wmnet with OS bullseye
  • 22:17 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
  • 22:17 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1108.eqiad.wmnet with OS bullseye
  • 22:17 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 22:16 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1107.eqiad.wmnet with OS bullseye
  • 22:05 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
  • 22:02 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 22:02 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1107.eqiad.wmnet with OS bullseye
  • 21:57 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1106.eqiad.wmnet with reason: host reimage
  • 21:54 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1106.eqiad.wmnet with reason: host reimage
  • 21:53 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 21:46 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1103.eqiad.wmnet
  • 21:39 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1106.eqiad.wmnet with OS bullseye
  • 21:38 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1106.eqiad.wmnet with OS bullseye
  • 21:38 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1103.eqiad.wmnet
  • 21:37 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cp1103.eqiad.wmnet
  • 21:37 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1103.eqiad.wmnet
  • 21:34 eileen: civicrm upgraded from 86a08564 to 31d53b57
  • 21:28 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1106.eqiad.wmnet with OS bullseye
  • 21:28 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1106.eqiad.wmnet with OS bullseye
  • 21:21 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1106.eqiad.wmnet with OS bullseye
  • 21:17 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1105.eqiad.wmnet with OS bullseye
  • 21:16 eileen: civicrm upgraded from a458c2bb to 86a08564
  • 20:58 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1105.eqiad.wmnet with reason: host reimage
  • 20:55 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1105.eqiad.wmnet with reason: host reimage
  • 20:40 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
  • 20:32 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1105.eqiad.wmnet with OS bullseye
  • 20:16 TheresNoTime: close UTC late backport window
  • 20:14 samtar@deploy2002: Finished scap: Backport for Deploy vector 2022 to non-English Wikibooks, etc (T349544) (duration: 10m 51s)
  • 20:08 samtar@deploy2002: samtar and ksarabia: Continuing with sync
  • 20:05 ryankemper@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:05 ryankemper@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:04 samtar@deploy2002: samtar and ksarabia: Backport for Deploy vector 2022 to non-English Wikibooks, etc (T349544) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:03 samtar@deploy2002: Started scap: Backport for Deploy vector 2022 to non-English Wikibooks, etc (T349544)
  • 19:56 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 19:55 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 19:12 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
  • 19:12 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1105.eqiad.wmnet with OS bullseye
  • 19:01 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
  • 18:59 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1104.eqiad.wmnet with OS bullseye
  • 18:50 ejegg: restarted fundraising scheduled jobs
  • 18:40 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1104.eqiad.wmnet with reason: host reimage
  • 18:37 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1104.eqiad.wmnet with reason: host reimage
  • 18:24 ejegg: disabled fundraising scheduled jobs for table alter
  • 18:24 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.3 refs T348356
  • 18:22 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1104.eqiad.wmnet with OS bullseye
  • 18:10 ejegg: fundraising civicrm upgraded from 5862a3fc to a458c2bb
  • 18:04 sukhe: reprepro -C component/dnsdist include bookworm-wikimedia dnsdist_1.8.2-1+wmf12u1_amd64.changes
  • 17:59 taavi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt-wdqs1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:56 taavi@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt-wdqs1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:52 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1104.eqiad.wmnet with OS bullseye
  • 17:51 taavi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt-wdqs1002
  • 17:51 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt-wdqs1002
  • 17:43 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:43 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:42 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:42 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:42 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:42 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 17:27 Krinkle: krinkle@deploy2002:/srv/mediawiki/private: fix untracked warning for readme.FatalErrorSettings.php
  • 16:49 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 16:49 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 16:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1103.eqiad.wmnet with OS bullseye
  • 16:35 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 16:34 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 16:31 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1104.eqiad.wmnet with OS bullseye
  • 16:30 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1104.eqiad.wmnet with OS bullseye
  • 16:27 taavi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
  • 16:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
  • 16:23 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 16:23 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 16:22 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
  • 16:20 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1104.eqiad.wmnet with OS bullseye
  • 16:15 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
  • 16:15 taavi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt-wdqs1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:12 taavi@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt-wdqs1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:12 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:11 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudvirt-wdqs1002 - taavi@cumin1001"
  • 16:10 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudvirt-wdqs1002 - taavi@cumin1001"
  • 16:08 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 16:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 16:04 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:52 arnaudb@cumin1001: dbctl commit (dc=all): 'discard db1131', diff saved to https://phabricator.wikimedia.org/P53120 and previous config saved to /var/cache/conftool/dbconfig/20231031-155253-arnaudb.json
  • 15:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:42 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts db1131.eqiad.wmnet
  • 15:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1131.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:41 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1131.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:38 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 15:33 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1131.eqiad.wmnet
  • 15:29 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 15:28 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 15:26 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 15:25 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 15:25 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:24 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:23 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:22 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:22 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T343198)', diff saved to https://phabricator.wikimedia.org/P53119 and previous config saved to /var/cache/conftool/dbconfig/20231031-152105-arnaudb.json
  • 15:11 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 15:11 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P53118 and previous config saved to /var/cache/conftool/dbconfig/20231031-150558-arnaudb.json
  • 15:06 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:06 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 15:05 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:05 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 15:04 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:04 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 14:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 14:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P53117 and previous config saved to /var/cache/conftool/dbconfig/20231031-145052-arnaudb.json
  • 14:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T343198)', diff saved to https://phabricator.wikimedia.org/P53116 and previous config saved to /var/cache/conftool/dbconfig/20231031-143545-arnaudb.json
  • 14:13 sukhe: install4002:/etc/dhcp/automation/ttyS1-115200 rm cp4052.conf
  • 14:06 sbassett: Deployed updated security mitigation for T348828
  • 13:59 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 13:58 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 13:49 ejegg: fundraising civicrm upgraded from 71d26d3b to 5862a3fc
  • 13:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 13:36 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 13:36 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 13:30 TheresNoTime: close UTC afternoon backport window
  • 13:27 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 13:27 samtar@deploy2002: Finished scap: Backport for Roll-out Parsoid Kartographer support for all English language wikis (T342871) (duration: 10m 49s)
  • 13:22 samtar@deploy2002: ihurbain and samtar: Continuing with sync
  • 13:18 samtar@deploy2002: ihurbain and samtar: Backport for Roll-out Parsoid Kartographer support for all English language wikis (T342871) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:17 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 13:16 samtar@deploy2002: Started scap: Backport for Roll-out Parsoid Kartographer support for all English language wikis (T342871)
  • 12:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 100%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53113 and previous config saved to /var/cache/conftool/dbconfig/20231031-125348-arnaudb.json
  • 12:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 100%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53112 and previous config saved to /var/cache/conftool/dbconfig/20231031-124918-arnaudb.json
  • 12:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 80%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53108 and previous config saved to /var/cache/conftool/dbconfig/20231031-122338-arnaudb.json
  • 12:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 80%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53107 and previous config saved to /var/cache/conftool/dbconfig/20231031-121908-arnaudb.json
  • 12:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 70%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53106 and previous config saved to /var/cache/conftool/dbconfig/20231031-120833-arnaudb.json
  • 12:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 70%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53105 and previous config saved to /var/cache/conftool/dbconfig/20231031-120403-arnaudb.json
  • 11:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 60%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53104 and previous config saved to /var/cache/conftool/dbconfig/20231031-115328-arnaudb.json
  • 11:48 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 60%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53103 and previous config saved to /var/cache/conftool/dbconfig/20231031-114858-arnaudb.json
  • 11:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 50%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53102 and previous config saved to /var/cache/conftool/dbconfig/20231031-113823-arnaudb.json
  • 11:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 50%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53101 and previous config saved to /var/cache/conftool/dbconfig/20231031-113353-arnaudb.json
  • 11:24 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1007.eqiad.wmnet with OS bookworm
  • 11:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 40%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53099 and previous config saved to /var/cache/conftool/dbconfig/20231031-112318-arnaudb.json
  • 11:18 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 40%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53098 and previous config saved to /var/cache/conftool/dbconfig/20231031-111849-arnaudb.json
  • 11:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 30%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53097 and previous config saved to /var/cache/conftool/dbconfig/20231031-110813-arnaudb.json
  • 11:03 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 30%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53096 and previous config saved to /var/cache/conftool/dbconfig/20231031-110344-arnaudb.json
  • 10:53 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1007.eqiad.wmnet with reason: host reimage
  • 10:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 20%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53095 and previous config saved to /var/cache/conftool/dbconfig/20231031-105308-arnaudb.json
  • 10:50 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1007.eqiad.wmnet with reason: host reimage
  • 10:48 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 20%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53094 and previous config saved to /var/cache/conftool/dbconfig/20231031-104839-arnaudb.json
  • 10:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 10%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53093 and previous config saved to /var/cache/conftool/dbconfig/20231031-103804-arnaudb.json
  • 10:37 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1007.eqiad.wmnet with OS bookworm
  • 10:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 10%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53092 and previous config saved to /var/cache/conftool/dbconfig/20231031-103334-arnaudb.json
  • 10:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 5%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53091 and previous config saved to /var/cache/conftool/dbconfig/20231031-102259-arnaudb.json
  • 10:18 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 5%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53090 and previous config saved to /var/cache/conftool/dbconfig/20231031-101829-arnaudb.json
  • 10:17 arnaudb@cumin1001: dbctl commit (dc=all): 'set db1230 as a depooled host', diff saved to https://phabricator.wikimedia.org/P53089 and previous config saved to /var/cache/conftool/dbconfig/20231031-101750-arnaudb.json
  • 09:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T343198)', diff saved to https://phabricator.wikimedia.org/P53088 and previous config saved to /var/cache/conftool/dbconfig/20231031-095054-arnaudb.json
  • 09:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 09:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 09:47 arnaudb@cumin1001: dbctl commit (dc=all): 'set db1230 as a depooled host', diff saved to https://phabricator.wikimedia.org/P53087 and previous config saved to /var/cache/conftool/dbconfig/20231031-094737-arnaudb.json
  • 09:39 arnaudb@cumin1001: dbctl commit (dc=all): 'set db1230 as a depooled host', diff saved to https://phabricator.wikimedia.org/P53086 and previous config saved to /var/cache/conftool/dbconfig/20231031-093919-arnaudb.json
  • 09:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53085 and previous config saved to /var/cache/conftool/dbconfig/20231031-093457-arnaudb.json
  • 09:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Set ', diff saved to https://phabricator.wikimedia.org/P53084 and previous config saved to /var/cache/conftool/dbconfig/20231031-093448-arnaudb.json
  • 09:01 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 09:00 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 08:57 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 5%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53083 and previous config saved to /var/cache/conftool/dbconfig/20231031-085740-arnaudb.json
  • 08:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 config append', diff saved to https://phabricator.wikimedia.org/P53082 and previous config saved to /var/cache/conftool/dbconfig/20231031-085615-arnaudb.json
  • 08:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 90%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53081 and previous config saved to /var/cache/conftool/dbconfig/20231031-085346-arnaudb.json
  • 08:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53080 and previous config saved to /var/cache/conftool/dbconfig/20231031-083841-arnaudb.json
  • 08:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 60%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53079 and previous config saved to /var/cache/conftool/dbconfig/20231031-082336-arnaudb.json
  • 08:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 45%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53078 and previous config saved to /var/cache/conftool/dbconfig/20231031-080832-arnaudb.json
  • 07:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 30%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53077 and previous config saved to /var/cache/conftool/dbconfig/20231031-075327-arnaudb.json
  • 07:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 15%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53076 and previous config saved to /var/cache/conftool/dbconfig/20231031-073822-arnaudb.json
  • 07:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 weight rebalancing - depooled', diff saved to https://phabricator.wikimedia.org/P53075 and previous config saved to /var/cache/conftool/dbconfig/20231031-073652-arnaudb.json
  • 07:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 weight rebalancing', diff saved to https://phabricator.wikimedia.org/P53074 and previous config saved to /var/cache/conftool/dbconfig/20231031-073312-arnaudb.json
  • 07:30 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 depooling from API and pooling in db2140', diff saved to https://phabricator.wikimedia.org/P53073 and previous config saved to /var/cache/conftool/dbconfig/20231031-073023-arnaudb.json
  • 07:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 weight mimic old db2140', diff saved to https://phabricator.wikimedia.org/P53072 and previous config saved to /var/cache/conftool/dbconfig/20231031-071938-arnaudb.json
  • 07:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Promote db2140 to s4 primary and set section read-write T349820', diff saved to https://phabricator.wikimedia.org/P53071 and previous config saved to /var/cache/conftool/dbconfig/20231031-070549-arnaudb.json
  • 07:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Set s4 codfw as read-only for maintenance - T349820', diff saved to https://phabricator.wikimedia.org/P53070 and previous config saved to /var/cache/conftool/dbconfig/20231031-070405-arnaudb.json
  • 07:02 arnaudb: Starting s4 codfw failover from db2179 to db2140 - T349820
  • 06:49 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 master" (duration: 07m 12s)
  • 06:44 marostegui@deploy2002: marostegui: Continuing with sync
  • 06:43 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:42 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 master"
  • 06:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Set db2140 with weight 0 T349820', diff saved to https://phabricator.wikimedia.org/P53068 and previous config saved to /var/cache/conftool/dbconfig/20231031-063647-arnaudb.json
  • 06:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 34 hosts with reason: Primary switchover s4 T349820
  • 06:33 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 34 hosts with reason: Primary switchover s4 T349820
  • 06:31 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc2014 to pc1 master (duration: 06m 50s)
  • 06:26 marostegui@deploy2002: marostegui: Continuing with sync
  • 06:25 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc2014 to pc1 master synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:24 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc2014 to pc1 master
  • 03:55 mwpresync@deploy2002: Pruned MediaWiki: 1.42.0-wmf.1 (duration: 02m 14s)
  • 03:53 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.3 refs T348356 (duration: 50m 44s)
  • 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.3 refs T348356
  • 00:46 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 00:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 00:19 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye

2023-10-30

  • 23:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 23:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 23:50 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 21:22 sbassett: Deployed updated security mitigation for T348828
  • 21:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for search-loader[2001-2002].codfw.wmnet,search-loader[1001-1002].eqiad.wmnet
  • 21:19 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for search-loader[2001-2002].codfw.wmnet,search-loader[1001-1002].eqiad.wmnet
  • 20:58 ejegg: re-enabled fundraising scheduled jobs after deployment
  • 20:45 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 20:45 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 20:44 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 20:44 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 20:43 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 20:43 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 20:41 ejegg: fundraising civicrm upgraded from 2c79475e to 71d26d3b
  • 20:40 ejegg: disable fundraising scheduled jobs for deployment
  • 20:29 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 20:29 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 20:28 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 20:21 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 20:20 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns3004.wikimedia.org with OS bookworm
  • 20:17 dancy@deploy2002: Finished scap: Backport for namespaces:mediawiki: add Extensions/Skins as alias of Extension/Skin (+ tallk) (T349970) (duration: 10m 09s)
  • 20:11 dancy@deploy2002: dancy and rhinosf1: Continuing with sync
  • 20:10 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 20:08 dancy@deploy2002: dancy and rhinosf1: Backport for namespaces:mediawiki: add Extensions/Skins as alias of Extension/Skin (+ tallk) (T349970) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:07 dancy@deploy2002: Started scap: Backport for namespaces:mediawiki: add Extensions/Skins as alias of Extension/Skin (+ tallk) (T349970)
  • 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns3004.wikimedia.org with reason: host reimage
  • 19:47 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns3004.wikimedia.org with reason: host reimage
  • 19:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns3004.wikimedia.org with OS bookworm
  • 18:59 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 18:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 18:53 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 18:52 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 18:38 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns3003.wikimedia.org with OS bookworm
  • 18:36 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 18:35 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 18:34 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 18:34 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 18:33 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: ping_offload
  • 18:27 jbond: migrate ping_offload to puppet7
  • 18:27 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: ping_offload
  • 18:26 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 18:24 sukhe: racadm racreset cp1103.eqiad.wmnet
  • 18:22 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 18:20 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on search-loader[2001-2002].codfw.wmnet with reason: T346039
  • 18:19 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on search-loader[2001-2002].codfw.wmnet with reason: T346039
  • 18:16 bking@deploy2002: Finished deploy [search/mjolnir/deploy@daf8c32]: T346039 (duration: 00m 06s)
  • 18:16 bking@deploy2002: Started deploy [search/mjolnir/deploy@daf8c32]: T346039
  • 18:11 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 18:10 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 17:56 jbond: migrate bastionhost to puppet7
  • 17:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS bookworm
  • 17:42 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns3003.wikimedia.org with reason: host reimage
  • 17:40 jbond: migrate pki::multirootca to puppet7
  • 17:39 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns3003.wikimedia.org with reason: host reimage
  • 17:27 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host pki2002.codfw.wmnet
  • 17:23 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host pki2002.codfw.wmnet
  • 17:22 jbond: migrate pki2002 to puppet7
  • 17:16 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 17:14 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:12 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns3003.wikimedia.org with OS bookworm
  • 17:10 jbond: migrate pki::root to puppet7
  • 17:04 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 16:51 sukhe: running authdns-update for CR 969816
  • 16:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp4052.ulsfo.wmnet with reason: depooled, reimaging
  • 16:39 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp4052.ulsfo.wmnet with reason: depooled, reimaging
  • 16:26 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
  • 16:23 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
  • 16:22 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bookworm
  • 16:22 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1001"
  • 16:21 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1001"
  • 16:16 jbond: migrate O:ganeti_test to puppet7
  • 16:14 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ganeti-test1002.eqiad.wmnet
  • 16:07 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 16:07 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 16:04 jbond: migrate ganeti-test1002.eqiad.wmnet to puppet7
  • 16:03 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host ganeti-test1002.eqiad.wmnet
  • 16:02 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
  • 15:58 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
  • 15:57 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudvirt-wdqs1003 - taavi@cumin1001"
  • 15:56 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudvirt-wdqs1003 - taavi@cumin1001"
  • 15:55 jbond: migrate failoid to puppet7
  • 15:51 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 15:51 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 15:49 jbond: move builder to puppet7
  • 15:49 jbond: move cluster::unprivmanagement to puppet7
  • 15:49 jbond: move config_master to puppet7
  • 15:43 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 15:42 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bookworm
  • 15:33 taavi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt-wdqs1003
  • 15:33 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt-wdqs1003
  • 15:30 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:30 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudvirt-wdqs1003 - taavi@cumin1001"
  • 15:29 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudvirt-wdqs1003 - taavi@cumin1001"
  • 15:29 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt-wdqs1003
  • 15:27 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:21 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt-wdqs1003
  • 14:41 bking@deploy2002: Finished deploy [search/mjolnir/deploy@daf8c32]: T346039 (duration: 00m 05s)
  • 14:41 bking@deploy2002: Started deploy [search/mjolnir/deploy@daf8c32]: T346039
  • 14:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on search-loader2001.codfw.wmnet with reason: T346039
  • 14:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 14:37 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on search-loader2001.codfw.wmnet with reason: T346039
  • 14:36 inflatador: bking@search-loader2001 disabling services as part of bullseye migration T346039
  • 14:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 14:32 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 14:31 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 14:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 12:55 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1130.eqiad.wmnet onto db1230.eqiad.wmnet
  • 12:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1217.eqiad.wmnet with OS bookworm
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'New host', diff saved to https://phabricator.wikimedia.org/P53065 and previous config saved to /var/cache/conftool/dbconfig/20231030-122855-marostegui.json
  • 12:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1217.eqiad.wmnet with reason: host reimage
  • 12:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1217.eqiad.wmnet with reason: host reimage
  • 12:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1217.eqiad.wmnet with OS bookworm
  • 11:52 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1130.eqiad.wmnet onto db1230.eqiad.wmnet
  • 11:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Adding db1230 depooled, depooling db1130', diff saved to https://phabricator.wikimedia.org/P53064 and previous config saved to /var/cache/conftool/dbconfig/20231030-113401-arnaudb.json
  • 11:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: provisioning db1230.eqiad.wmnet - T344036
  • 11:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: provisioning db1230.eqiad.wmnet - T344036
  • 11:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: provisioning db1230.eqiad.wmnet - T344036
  • 11:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: provisioning db1230.eqiad.wmnet - T344036
  • 09:42 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@af33784] (releasing): (no justification provided) (duration: 00m 40s)
  • 09:42 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@af33784] (releasing): (no justification provided)
  • 08:29 vgutierrez: switched to digicert-2023 in esams, eqsin and drmrs - T341119
  • 08:17 wmde-fisch@deploy2002: Finished scap: Backport for Cleanup Kartographer Nearby flags (T332785) (duration: 07m 35s)
  • 08:12 wmde-fisch@deploy2002: wmde-fisch: Continuing with sync
  • 08:11 wmde-fisch@deploy2002: wmde-fisch: Backport for Cleanup Kartographer Nearby flags (T332785) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:10 wmde-fisch@deploy2002: Started scap: Backport for Cleanup Kartographer Nearby flags (T332785)
  • 08:10 vgutierrez: triggering a puppet run on cp hosts in esams, eqsin and drmrs to switch to the new unified digicert certificates - T341119
  • 08:06 vgutierrez: repool cp5025 - T341119
  • 08:06 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" (duration: 06m 41s)
  • 08:01 marostegui@deploy2002: marostegui: Continuing with sync
  • 08:00 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:59 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master"
  • 07:52 vgutierrez: depool cp5025 to perform some digicert-2023 related sanity checks - T341119
  • 07:49 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master (duration: 06m 36s)
  • 07:48 marostegui@deploy2002: marostegui: Continuing with sync
  • 07:44 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc1 master synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:43 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master
  • 07:35 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
  • 07:34 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
  • 07:29 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" (duration: 06m 33s)
  • 07:24 marostegui@deploy2002: marostegui: Continuing with sync
  • 07:24 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:22 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master"
  • 07:22 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master (duration: 14m 04s)
  • 07:18 elukey: arm keyholder on acmechief2002 and deploy1002
  • 07:16 marostegui@deploy2002: marostegui: Continuing with sync
  • 07:16 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc1 master synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:08 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master

2023-10-28

  • 21:25 fabfur: re-pooled cp1089 and cp3069
  • 21:05 fabfur: depooled cp1089 and cp3069 to restart varnish|haproxy and let purged process incoming messages
  • 20:20 fabfur: restarted purged on cp1089, cp6005, cp3069
  • 19:46 fabfur: restarted purged on cp1078

2023-10-27

  • 22:47 rzl: reprepro -C main include bullseye-wikimedia k8s-controller-sidecars_1.0.2-1_source.changes
  • 22:05 ejegg: fundraising civicrm upgraded from 74781efd to 2c79475e
  • 15:38 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2004.codfw.wmnet with OS bullseye
  • 15:38 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 15:21 herron: power cycled titan1001
  • 14:59 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 14:42 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2004.codfw.wmnet with reason: host reimage
  • 14:39 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2004.codfw.wmnet with reason: host reimage
  • 14:19 topranks: announcing internal core routes to esams asw's to test policy T344547
  • 14:19 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:18 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:12 jclark@cumin1001: START - Cookbook sre.hosts.provision for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:04 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 14:04 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 14:04 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 14:03 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 14:03 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 14:02 jayme@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 13:38 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host acmechief2002.codfw.wmnet
  • 13:38 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest2004.codfw.wmnet with OS bullseye
  • 13:37 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2004.codfw.wmnet with OS bullseye
  • 13:36 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:36 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: change sretest2004 DNS - cmooney@cumin1001"
  • 13:35 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: change sretest2004 DNS - cmooney@cumin1001"
  • 13:33 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:31 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host acmechief2002.codfw.wmnet
  • 13:27 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host acmechief2002.codfw.wmnet
  • 13:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host acmechief2002.codfw.wmnet with OS bookworm
  • 13:00 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest2004.codfw.wmnet with OS bullseye
  • 12:55 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:54 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:41 jayme: updated mwdebug1001 to icu67 - T345561
  • 12:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief2002.codfw.wmnet with reason: host reimage
  • 12:14 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief2002.codfw.wmnet with reason: host reimage
  • 11:52 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1102.eqiad.wmnet with OS bullseye
  • 11:34 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1102.eqiad.wmnet with reason: host reimage
  • 11:31 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1102.eqiad.wmnet with reason: host reimage
  • 11:31 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host acmechief2002.codfw.wmnet with OS bookworm
  • 11:30 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM acmechief2002.codfw.wmnet - jbond@cumin1001"
  • 11:29 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM acmechief2002.codfw.wmnet - jbond@cumin1001"
  • 11:29 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) acmechief2002.codfw.wmnet on all recursors
  • 11:29 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache acmechief2002.codfw.wmnet on all recursors
  • 11:29 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:29 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM acmechief2002.codfw.wmnet - jbond@cumin1001"
  • 11:28 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM acmechief2002.codfw.wmnet - jbond@cumin1001"
  • 11:26 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 11:26 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host acmechief2002.codfw.wmnet
  • 11:18 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1102.eqiad.wmnet with OS bullseye
  • 11:17 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1102.eqiad.wmnet with OS bullseye
  • 11:08 volans@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 11:08 volans@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 11:01 jbond@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 11:01 jbond@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 11:00 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 10:48 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 10:48 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 10:48 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 10:45 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 10:45 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 10:44 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 10:40 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1102.eqiad.wmnet with OS bullseye
  • 10:36 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1101.eqiad.wmnet with OS bullseye
  • 10:20 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt-wdqs1001.eqiad.wmnet
  • 10:20 taavi@cumin1001: START - Cookbook sre.hosts.remove-downtime for cloudvirt-wdqs1001.eqiad.wmnet
  • 10:17 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1101.eqiad.wmnet with reason: host reimage
  • 10:17 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:14 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1101.eqiad.wmnet with reason: host reimage
  • 10:14 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 10:14 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:13 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 09:59 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1101.eqiad.wmnet with OS bullseye
  • 09:59 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1101.eqiad.wmnet with OS bullseye
  • 09:34 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1101.eqiad.wmnet with OS bullseye
  • 09:19 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 09:19 btullis@cumin1001: Added views for new wiki: tlywiki T345169
  • 09:02 moritzm: deployment-prep app servers are now using ICU67/Unicode 13
  • 08:49 moritzm: uploaded libxml2 2.9.4+dfsg1-7+deb10u6+icu67+wmf1 to component/icu67 for buster-wikimedia (rebase of the ICU compat patches on top of the latest buster security update for libxml2) T345561
  • 08:48 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 08:41 moritzm: downgrading dh-python on build2001 to the version which is in Bullseye. Before, 5.20230130~bpo11+1 was installed from bullseye-backports, but that version has dropped the python2 sequence we still need for some Buster builds
  • 08:25 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1004.eqiad.wmnet with OS bookworm
  • 08:10 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudmetrics1004.eqiad.wmnet with reason: host reimage
  • 08:07 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudmetrics1004.eqiad.wmnet with reason: host reimage
  • 07:55 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1004.eqiad.wmnet with OS bookworm
  • 07:54 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1003.eqiad.wmnet with OS bookworm
  • 07:54 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 07:48 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudmetrics1004.eqiad.wmnet with reason: cloudmetrics1003 reimage
  • 07:48 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudmetrics1004.eqiad.wmnet with reason: cloudmetrics1003 reimage
  • 07:39 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudmetrics1003.eqiad.wmnet with reason: host reimage
  • 07:36 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudmetrics1003.eqiad.wmnet with reason: host reimage
  • 07:32 ayounsi@cumin1001: START - Cookbook sre.hosts.provision for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 07:24 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS bookworm
  • 06:12 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2003.codfw.wmnet with OS bullseye
  • 01:49 cstone: civicrm upgraded from 70e0b88d to 74781efd

2023-10-26

  • 22:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2006.wikimedia.org with OS bookworm
  • 22:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2006.wikimedia.org with reason: host reimage
  • 22:07 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2006.wikimedia.org with reason: host reimage
  • 21:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns2006.wikimedia.org with OS bookworm
  • 21:45 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:45 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:32 cstone: payments-wiki upgraded from f7407053 to 04428d6e
  • 21:16 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: still trying to get nova to schedule hosts there
  • 21:16 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: still trying to get nova to schedule hosts there
  • 21:12 taavi@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1001.eqiad.wmnet
  • 21:00 taavi@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1001.eqiad.wmnet
  • 20:45 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 20:45 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1001"
  • 20:44 cstone: payments-wiki upgraded from f7407053 to 99b330be
  • 20:44 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1001"
  • 20:42 brennen: end of utc late backport & config window
  • 20:42 brennen@deploy2002: Finished scap: Backport for OIDC: Return instead of null for email in profile (T283456) (duration: 07m 25s)
  • 20:41 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2005.wikimedia.org with OS bookworm
  • 20:37 brennen@deploy2002: brennen and tgr: Continuing with sync
  • 20:36 brennen@deploy2002: brennen and tgr: Backport for OIDC: Return instead of null for email in profile (T283456) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:35 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudvirt-wdqs1001 - taavi@cumin1001"
  • 20:34 brennen@deploy2002: Started scap: Backport for OIDC: Return instead of null for email in profile (T283456)
  • 20:34 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudvirt-wdqs1001 - taavi@cumin1001"
  • 20:34 brennen@deploy2002: Finished scap: Backport for Deploy pilot survey on metawiki (T349854) (duration: 08m 56s)
  • 20:31 bvibber: brion running video transcode backfill via mwmaint2002 (requeueTranscodes.php) + job queue
  • 20:29 brennen@deploy2002: dani and brennen: Continuing with sync
  • 20:26 brennen@deploy2002: dani and brennen: Backport for Deploy pilot survey on metawiki (T349854) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:25 brennen@deploy2002: Started scap: Backport for Deploy pilot survey on metawiki (T349854)
  • 20:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
  • 20:20 brennen@deploy2002: Finished scap: Backport for "Soft-launch" iOS-compatible HLS video transcodes (T68722) (duration: 08m 29s)
  • 20:19 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
  • 20:15 brennen@deploy2002: brennen and brion: Continuing with sync
  • 20:14 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
  • 20:13 brennen@deploy2002: brennen and brion: Backport for "Soft-launch" iOS-compatible HLS video transcodes (T68722) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:12 brennen@deploy2002: Started scap: Backport for "Soft-launch" iOS-compatible HLS video transcodes (T68722)
  • 20:11 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
  • 20:03 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bookworm
  • 19:59 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 19:59 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt-wdqs1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:43 taavi@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt-wdqs1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:41 taavi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 19:38 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2004.wikimedia.org with OS bookworm
  • 19:30 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 19:29 taavi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt-wdqs1001
  • 19:29 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt-wdqs1001
  • 19:28 taavi@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt-wdqs1001
  • 19:28 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt-wdqs1001
  • 19:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2004.wikimedia.org with reason: host reimage
  • 19:05 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2004.wikimedia.org with reason: host reimage
  • 18:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns2004.wikimedia.org with OS bookworm
  • 18:07 dancy@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.2 refs T348355
  • 17:53 sukhe: sudo cumin -b1 -s300 'A:dns-rec and not A:codfw' 'systemctl restart pdns-recursor.service'
  • 17:36 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:36 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudvirt-wdqs1001 - taavi@cumin1001"
  • 17:35 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudvirt-wdqs1001 - taavi@cumin1001"
  • 17:32 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 17:19 stevemunene@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 17:17 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:01 sukhe: sudo cumin -b1 -s30 'A:dns-rec and not A:codfw' 'systemctl restart haproxy.service'
  • 16:18 stevemunene@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 16:05 hnowlan@deploy2002: Finished deploy [restbase/deploy@c461bad]: Adding fonwiki T347940 (duration: 16m 53s)
  • 16:04 sukhe: sudo cumin -b1 -s300 'A:dns-rec and A:edges' 'systemctl restart ntp.service'
  • 15:48 hnowlan@deploy2002: Started deploy [restbase/deploy@c461bad]: Adding fonwiki T347940
  • 15:42 sukhe: sudo cumin -b1 -s600 'A:dns-rec and (A:eqiad or A:codfw)' 'systemctl restart ntp.service'
  • 15:42 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough and A:wikidough
  • 15:35 jgiannelos@deploy2002: Finished deploy [restbase/deploy@4c14785]: (no justification provided) (duration: 13m 21s)
  • 15:30 XioNoX: test add BGP session between ssw1-e1-eqiad and lsw1-e8-eqiad
  • 15:22 jgiannelos@deploy2002: Started deploy [restbase/deploy@4c14785]: (no justification provided)
  • 15:15 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:12 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
  • 15:09 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
  • 14:53 Lucas_WMDE: UTC afternoon backport+config window (belatedly) done
  • 14:52 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for CX3 Build 0.2.0+20231026 (T348563 T308836) (duration: 14m 01s)
  • 14:49 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
  • 14:46 lucaswerkmeister-wmde@deploy2002: kartik and lucaswerkmeister-wmde: Continuing with sync
  • 14:42 jgiannelos@deploy2002: Finished deploy [restbase/deploy@ff46322]: (no justification provided) (duration: 01m 38s)
  • 14:40 jgiannelos@deploy2002: Started deploy [restbase/deploy@ff46322]: (no justification provided)
  • 14:39 lucaswerkmeister-wmde@deploy2002: kartik and lucaswerkmeister-wmde: Backport for CX3 Build 0.2.0+20231026 (T348563 T308836) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:38 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for CX3 Build 0.2.0+20231026 (T348563 T308836)
  • 14:36 filippo@deploy2002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 14:36 filippo@deploy2002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 14:36 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
  • 14:36 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
  • 14:36 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
  • 14:35 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
  • 14:33 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bullseye
  • 14:23 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Remove broken QUnit test (T349485) (duration: 06m 53s)
  • 14:20 ejegg: donorwiki upgraded from 894eacce to f7407053
  • 14:17 lucaswerkmeister-wmde@deploy2002: abi and lucaswerkmeister-wmde: Continuing with sync
  • 14:17 lucaswerkmeister-wmde@deploy2002: abi and lucaswerkmeister-wmde: Backport for Remove broken QUnit test (T349485) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:16 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Remove broken QUnit test (T349485)
  • 14:14 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/similar-users: apply
  • 14:14 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/similar-users: apply
  • 14:14 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/similar-users: apply
  • 14:09 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/similar-users: apply
  • 14:09 jayme@deploy2002: helmfile [staging] START helmfile.d/services/similar-users: apply
  • 13:56 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for cirrus: disable canary events for update & error streams (duration: 07m 19s)
  • 13:51 lucaswerkmeister-wmde@deploy2002: dcausse and lucaswerkmeister-wmde: Continuing with sync
  • 13:50 lucaswerkmeister-wmde@deploy2002: dcausse and lucaswerkmeister-wmde: Backport for cirrus: disable canary events for update & error streams synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:49 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for cirrus: disable canary events for update & error streams
  • 13:46 moritzm: installing cpio security updates
  • 13:46 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for CX3 Build 0.2.0+20231026 (T348563 T308836) (duration: 14m 48s)
  • 13:40 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kartik: Continuing with sync
  • 13:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kartik: Backport for CX3 Build 0.2.0+20231026 (T348563 T308836) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:32 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
  • 13:31 moritzm: installing curl security updates on buster
  • 13:31 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for CX3 Build 0.2.0+20231026 (T348563 T308836)
  • 13:30 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Add throttle rule for Edit-a-Thon on 2023-11-03 (T349234) (duration: 06m 43s)
  • 13:27 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-eqiad
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
  • 13:25 lucaswerkmeister-wmde@deploy2002: zoranzoki21 and lucaswerkmeister-wmde: Continuing with sync
  • 13:24 lucaswerkmeister-wmde@deploy2002: zoranzoki21 and lucaswerkmeister-wmde: Backport for Add throttle rule for Edit-a-Thon on 2023-11-03 (T349234) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:23 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Add throttle rule for Edit-a-Thon on 2023-11-03 (T349234)
  • 13:21 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-codfw
  • 13:21 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Enable block feature for AbuseFilter on srwiki (T349727) (duration: 10m 23s)
  • 13:20 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 13:20 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 13:15 lucaswerkmeister-wmde@deploy2002: zoranzoki21 and lucaswerkmeister-wmde: Continuing with sync
  • 13:15 moritzm: installing poppler security updates
  • 13:11 lucaswerkmeister-wmde@deploy2002: zoranzoki21 and lucaswerkmeister-wmde: Backport for Enable block feature for AbuseFilter on srwiki (T349727) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:10 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Enable block feature for AbuseFilter on srwiki (T349727)
  • 13:04 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 12:27 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we set up the new WMDE Airflow instance
  • 12:26 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we set up the new WMDE Airflow instance
  • 11:04 kevinbazira@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 11:03 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:58 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:51 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:51 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:51 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:40 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:30 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 10:30 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 10:25 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 10:25 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 10:20 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 10:20 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 10:10 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 10:10 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 09:29 dcausse: erratum (replace wdqs1009 with wdqs2009 in the above msg): depooling and restarting blazegraph on wdqs2009 (stuck since 2023-10-12)
  • 09:28 dcausse: depooling and restarting blazegraph on wdqs1009 (stuck since 2023-10-12)
  • 09:23 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1009.eqiad.wmnet with OS bullseye
  • 09:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 09:14 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 09:06 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1009.eqiad.wmnet with reason: host reimage
  • 09:03 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1009.eqiad.wmnet with reason: host reimage
  • 08:50 brouberol@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1009.eqiad.wmnet with OS bullseye
  • 08:49 urbanecm: mwmaint2002: `foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=1second --verbose --use-job-queue` (testing T344428; after enabling backend on all Wikipedias)
  • 08:48 urbanecm@deploy2002: Finished scap: Backport for Growth: Enable new Impact backend everywhere (T344143) (duration: 09m 29s)
  • 08:43 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 08:40 urbanecm@deploy2002: urbanecm: Backport for Growth: Enable new Impact backend everywhere (T344143) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:40 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 08:40 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1008.eqiad.wmnet with OS bullseye
  • 08:39 urbanecm@deploy2002: Started scap: Backport for Growth: Enable new Impact backend everywhere (T344143)
  • 08:32 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 08:32 urbanecm@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 08:31 urbanecm@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 08:29 urbanecm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 08:28 urbanecm@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 08:28 urbanecm@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 08:27 urbanecm@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 08:24 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1008.eqiad.wmnet with reason: host reimage
  • 08:21 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1008.eqiad.wmnet with reason: host reimage
  • 08:07 brouberol@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1008.eqiad.wmnet with OS bullseye
  • 08:02 godog: restart prometheus k8s k8s-aux - T343529
  • 07:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15133
  • 07:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15133
  • 07:36 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 07:32 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 07:31 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 07:23 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 07:21 apergos: UTC morning backport and config window closed
  • 07:19 kartik@deploy2002: Finished scap: Backport for testwiki: Enable Section translation on some Wikipedias with potential to be supported with MinT (T345267) (duration: 13m 11s)
  • 07:13 kartik@deploy2002: kartik: Continuing with sync
  • 07:08 kartik@deploy2002: kartik: Backport for testwiki: Enable Section translation on some Wikipedias with potential to be supported with MinT (T345267) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:06 kartik@deploy2002: Started scap: Backport for testwiki: Enable Section translation on some Wikipedias with potential to be supported with MinT (T345267)
  • 06:52 moritzm: installing openssl security updates
  • 06:40 _joe_: rebuilding the base httpd image for mediawiki to pick up glogger changes
  • 04:31 cstone: civicrm upgraded from 16175067 to 70e0b88d
  • 01:35 cstone: payments-wiki upgraded from 382a5a70 to f7407053

2023-10-25

  • 22:28 jforrester@deploy2002: Finished scap: Backport for diff: Fix LinkRenderer method call (T349726) (duration: 07m 21s)
  • 22:22 jforrester@deploy2002: jforrester and umherirrender: Continuing with sync
  • 22:22 jforrester@deploy2002: jforrester and umherirrender: Backport for diff: Fix LinkRenderer method call (T349726) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:20 jforrester@deploy2002: Started scap: Backport for diff: Fix LinkRenderer method call (T349726)
  • 21:01 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:00 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:00 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 20:59 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 20:58 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:57 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:25 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:23 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:20 ejegg: payments-wiki upgraded from 7575f0e6 to 382a5a70
  • 20:11 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:10 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1006.wikimedia.org with OS bookworm
  • 19:57 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase1018.eqiad.wmnet
  • 19:57 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:57 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase1018.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 19:56 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase1018.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 19:50 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 19:44 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:44 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1006.wikimedia.org with reason: host reimage
  • 19:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1006.wikimedia.org with reason: host reimage
  • 19:40 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase1018.eqiad.wmnet
  • 19:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase1017.eqiad.wmnet
  • 19:36 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:36 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase1017.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 19:35 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase1017.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 19:33 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 19:27 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns1006.wikimedia.org with OS bookworm
  • 19:25 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase1017.eqiad.wmnet
  • 19:20 sukhe: sukhe@cumin2002:~$ sudo cumin 'A:dns-rec' "enable-puppet 'wait before enabling'"
  • 19:19 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase1016.eqiad.wmnet
  • 19:19 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:19 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 19:18 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 19:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:16 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 19:16 cmooney@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: check if it makes vlan1054 records - cmooney@cumin1001"
  • 19:14 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: check if it makes vlan1054 records - cmooney@cumin1001"
  • 19:12 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:33 dancy@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.2 refs T348355 (duration: 05m 52s)
  • 18:32 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:32 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:28 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.2 refs T348355
  • 18:17 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 18:11 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase1016.eqiad.wmnet
  • 18:04 ejegg: fundraising civicrm upgraded from 6cfae26a to 16175067
  • 17:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1005.wikimedia.org with OS bookworm
  • 17:21 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:20 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 17:15 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:15 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 17:10 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:09 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 17:04 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1005.wikimedia.org with reason: host reimage
  • 17:04 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 17:02 ottomata: temporarily increasing log level to trace for eventgate-logging-external in eqiad canary release only - T347477
  • 16:59 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1005.wikimedia.org with reason: host reimage
  • 16:47 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:46 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:46 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:45 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns1005.wikimedia.org with OS bookworm
  • 16:45 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:44 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:44 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:07 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1004.wikimedia.org with OS bookworm
  • 14:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1004.wikimedia.org with reason: host reimage
  • 14:39 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1004.wikimedia.org with reason: host reimage
  • 14:30 jforrester@deploy2002: sync-world aborted: Backport for Allow logged out users to run FunctionEvaluator widget (T301670 T349055 T349057) (duration: 55m 10s)
  • 14:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1004.wikimedia.org with OS bookworm
  • 14:27 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns1004.wikimedia.org with OS bookworm
  • 14:22 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1100.eqiad.wmnet with OS bullseye
  • 14:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1007.eqiad.wmnet with OS bullseye
  • 14:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1004.wikimedia.org with OS bookworm
  • 14:02 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
  • 14:02 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host deploy1002
  • 14:02 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host deploy1002
  • 13:59 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
  • 13:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1007.eqiad.wmnet with reason: host reimage
  • 13:54 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-tool1010
  • 13:54 jforrester@deploy2002: jforrester: Continuing with sync
  • 13:53 jforrester@deploy2002: jforrester: Backport for Allow logged out users to run FunctionEvaluator widget (T301670 T349055 T349057) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:52 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-tool1010
  • 13:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 15 hosts with reason: not pooled, reimaging in progress
  • 13:51 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1007.eqiad.wmnet with reason: host reimage
  • 13:51 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 15 hosts with reason: not pooled, reimaging in progress
  • 13:42 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1100.eqiad.wmnet with OS bullseye
  • 13:36 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 13:35 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:35 jforrester@deploy2002: Started scap: Backport for Allow logged out users to run FunctionEvaluator widget (T301670 T349055 T349057)
  • 13:29 jforrester@deploy2002: Finished scap: Backport for Remove no-op $wgHiddenPrefs[] = 'prefershttps' (duration: 06m 54s)
  • 13:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on an-tool1010.eqiad.wmnet with reason: Moving an-tool1010
  • 13:25 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on an-tool1010.eqiad.wmnet with reason: Moving an-tool1010
  • 13:24 jforrester@deploy2002: matmarex and jforrester: Continuing with sync
  • 13:24 jforrester@deploy2002: matmarex and jforrester: Backport for Remove no-op $wgHiddenPrefs[] = 'prefershttps' synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:22 jforrester@deploy2002: Started scap: Backport for Remove no-op $wgHiddenPrefs[] = 'prefershttps'
  • 13:21 jforrester@deploy2002: Finished scap: Backport for [wikifunctions] Allow logged-out users to run approved functions (T349055) (duration: 07m 59s)
  • 13:16 jforrester@deploy2002: jforrester: Continuing with sync
  • 13:14 jforrester@deploy2002: jforrester: Backport for [wikifunctions] Allow logged-out users to run approved functions (T349055) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:13 jforrester@deploy2002: Started scap: Backport for [wikifunctions] Allow logged-out users to run approved functions (T349055)
  • 13:11 jforrester@deploy2002: Finished scap: Backport for ExtensionDistributor: Add REL1_41 as the development snapshot (T346929) (duration: 07m 01s)
  • 13:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1017.eqiad.wmnet
  • 13:06 jforrester@deploy2002: jforrester: Continuing with sync
  • 13:05 jforrester@deploy2002: jforrester: Backport for ExtensionDistributor: Add REL1_41 as the development snapshot (T346929) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:04 jforrester@deploy2002: Started scap: Backport for ExtensionDistributor: Add REL1_41 as the development snapshot (T346929)
  • 13:01 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 12:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1017.eqiad.wmnet
  • 10:56 urbanecm: mwmaint2002: foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=1second --verbose --use-job-queue (T344428; all wikis, higher file limit)
  • 10:24 urbanecm: mwmaint2002: foreachwikiindblist /srv/mediawiki/dblists/growth-biggest.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=1second --verbose --use-job-queue (T344428; with higher file limit)
  • 10:02 taavi: import kubernetes 1.23 packages for debian bookworm T284656
  • 09:53 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1007.eqiad.wmnet with OS bullseye
  • 09:50 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 09:48 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 09:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P53053 and previous config saved to /var/cache/conftool/dbconfig/20231025-090648-arnaudb.json
  • 08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 90%: Maint over', diff saved to https://phabricator.wikimedia.org/P53052 and previous config saved to /var/cache/conftool/dbconfig/20231025-085143-arnaudb.json
  • 08:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 80%: Maint over', diff saved to https://phabricator.wikimedia.org/P53051 and previous config saved to /var/cache/conftool/dbconfig/20231025-083638-arnaudb.json
  • 08:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 70%: Maint over', diff saved to https://phabricator.wikimedia.org/P53050 and previous config saved to /var/cache/conftool/dbconfig/20231025-082133-arnaudb.json
  • 08:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 60%: Maint over', diff saved to https://phabricator.wikimedia.org/P53049 and previous config saved to /var/cache/conftool/dbconfig/20231025-080628-arnaudb.json
  • 07:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P53048 and previous config saved to /var/cache/conftool/dbconfig/20231025-075123-arnaudb.json
  • 07:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 40%: Maint over', diff saved to https://phabricator.wikimedia.org/P53047 and previous config saved to /var/cache/conftool/dbconfig/20231025-073618-arnaudb.json
  • 07:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 30%: Maint over', diff saved to https://phabricator.wikimedia.org/P53046 and previous config saved to /var/cache/conftool/dbconfig/20231025-072113-arnaudb.json
  • 07:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 20%: Maint over', diff saved to https://phabricator.wikimedia.org/P53045 and previous config saved to /var/cache/conftool/dbconfig/20231025-070608-arnaudb.json
  • 06:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P53044 and previous config saved to /var/cache/conftool/dbconfig/20231025-065103-arnaudb.json
  • 06:50 arnaudb: repooling db1231

2023-10-24

  • 21:58 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:58 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:09 sukhe: running authdns-update for CR 968354
  • 21:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS bookworm
  • 21:06 jdrewniak@deploy2002: Finished scap: Backport for Disable Parsoid internal REST API everywhere except on Parsoid cluster (T334980) (duration: 12m 39s)
  • 21:00 jdrewniak@deploy2002: jdrewniak and cscott: Continuing with sync
  • 20:54 jdrewniak@deploy2002: jdrewniak and cscott: Backport for Disable Parsoid internal REST API everywhere except on Parsoid cluster (T334980) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:53 jdrewniak@deploy2002: Started scap: Backport for Disable Parsoid internal REST API everywhere except on Parsoid cluster (T334980)
  • 20:49 jdrewniak@deploy2002: Finished scap: Backport for Enable Vector readability survey on select wikis (T349232), Follow-up to 74b5834: Add language prefix to Readability survey (T349232), Follow-up to 74b5834: Add language prefix to Readability survey (T349232) (duration: 06m 57s)
  • 20:44 jdrewniak@deploy2002: jdrewniak: Continuing with sync
  • 20:44 jdrewniak@deploy2002: jdrewniak: Backport for Enable Vector readability survey on select wikis (T349232), Follow-up to 74b5834: Add language prefix to Readability survey (T349232), Follow-up to 74b5834: Add language prefix to Readability survey (T349232) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:42 jdrewniak@deploy2002: Started scap: Backport for Enable Vector readability survey on select wikis (T349232), Follow-up to 74b5834: Add language prefix to Readability survey (T349232), Follow-up to 74b5834: Add language prefix to Readability survey (T349232)
  • 20:24 jdrewniak@deploy2002: Finished scap: Backport for Update comment about EditAttemptStep instruments, CentralAuth: Clarify why we don't use second-level domain for some wikis (T257852), Remove unused VisualEditor config settings (T344757 T344759), [noop] Explain more thoroughly how the '-' prefix works (duration: 07m 21s)
  • 20:18 jdrewniak@deploy2002: tgr and matmarex and jdrewniak: Continuing with sync
  • 20:18 jdrewniak@deploy2002: tgr and matmarex and jdrewniak: Backport for Update comment about EditAttemptStep instruments, CentralAuth: Clarify why we don't use second-level domain for some wikis (T257852), Remove unused VisualEditor config settings (T344757 T344759), [noop] Explain more thoroughly how the '-' prefix works synced to the testservers (htt
  • 20:16 jdrewniak@deploy2002: Started scap: Backport for Update comment about EditAttemptStep instruments, CentralAuth: Clarify why we don't use second-level domain for some wikis (T257852), Remove unused VisualEditor config settings (T344757 T344759), [noop] Explain more thoroughly how the '-' prefix works
  • 20:14 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 20:10 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 19:57 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@c585842]: T346373: Update mjolnir to use python 3.10 (duration: 00m 28s)
  • 19:56 ebernhardson@deploy2002: Started deploy [airflow-dags/search@c585842]: T346373: Update mjolnir to use python 3.10
  • 19:49 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 19:47 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 19:47 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 19:45 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 19:45 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 19:43 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 19:43 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 19:23 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS bookworm
  • 19:00 andrew@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 19:00 andrew@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:59 andrew@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:59 andrew@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:55 andrew@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:55 andrew@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 18:54 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:54 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5003.wikimedia.org with OS bookworm
  • 18:50 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 18:50 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:50 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 18:48 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:48 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 18:48 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:47 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 18:47 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:47 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 18:42 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:42 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:42 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 18:42 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 18:41 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:41 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase2012.codfw.wmnet
  • 18:41 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:41 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase2012.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 18:41 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 18:39 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase2012.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 18:39 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:38 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 18:37 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 18:31 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase2012.codfw.wmnet
  • 18:24 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:23 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 18:18 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:18 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 18:16 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:15 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 18:13 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:13 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.2 refs T348355
  • 18:13 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 18:03 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 18:00 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:50 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 17:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5003.wikimedia.org with reason: host reimage
  • 17:46 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5003.wikimedia.org with reason: host reimage
  • 17:41 ejegg: fundraising civicrm upgraded from 8e8ffec0 to 6cfae26a
  • 16:59 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns5003.wikimedia.org with OS bookworm
  • 16:46 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@cc56357]: Deploying latest DAGs to analytics Airflow instance (duration: 01m 55s)
  • 16:44 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@cc56357]: Deploying latest DAGs to analytics Airflow instance
  • 15:48 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1100.eqiad.wmnet with OS bullseye
  • 15:32 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
  • 15:26 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
  • 15:22 godog: clean up overlapping blocks from thanos for instance 'cloud'
  • 15:11 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1100.eqiad.wmnet with OS bullseye
  • 15:10 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1100.eqiad.wmnet with OS bullseye
  • 14:59 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1100.eqiad.wmnet with OS bullseye
  • 14:58 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1100.eqiad.wmnet with OS bullseye
  • 14:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1016.eqiad.wmnet
  • 14:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1016.eqiad.wmnet
  • 14:48 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1100.eqiad.wmnet with OS bullseye
  • 14:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Adding db1227 depooled', diff saved to https://phabricator.wikimedia.org/P53041 and previous config saved to /var/cache/conftool/dbconfig/20231024-143204-arnaudb.json
  • 14:01 TheresNoTime: close backport window
  • 14:00 samtar@deploy2002: Finished scap: Backport for Fix typo (undefined event) (T349271) (duration: 09m 26s)
  • 13:55 samtar@deploy2002: samtar and cparle: Continuing with sync
  • 13:52 samtar@deploy2002: samtar and cparle: Backport for Fix typo (undefined event) (T349271) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:51 samtar@deploy2002: Started scap: Backport for Fix typo (undefined event) (T349271)
  • 13:43 samtar@deploy2002: Finished scap: Backport for Add stream config for iOS schema (T347122) (duration: 07m 52s)
  • 13:38 samtar@deploy2002: samtar and tsev: Continuing with sync
  • 13:37 samtar@deploy2002: samtar and tsev: Backport for Add stream config for iOS schema (T347122) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:36 samtar@deploy2002: Started scap: Backport for Add stream config for iOS schema (T347122)
  • 13:34 samtar@deploy2002: Finished scap: Backport for cirrus: add wgCirrusSearchUseEventBusBridge and enable it on testwiki (T325565) (duration: 06m 55s)
  • 13:31 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 13:30 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 13:30 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 13:30 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 13:30 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 13:29 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 13:28 samtar@deploy2002: samtar and dcausse: Continuing with sync
  • 13:28 samtar@deploy2002: samtar and dcausse: Backport for cirrus: add wgCirrusSearchUseEventBusBridge and enable it on testwiki (T325565) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:27 samtar@deploy2002: Started scap: Backport for cirrus: add wgCirrusSearchUseEventBusBridge and enable it on testwiki (T325565)
  • 13:25 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 13:25 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 13:24 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 13:24 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 13:24 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 13:23 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 13:23 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 13:22 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 13:22 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 13:22 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:22 samtar@deploy2002: Finished scap: Backport for cirrus: add the mediawiki.cirrussearch.page_rerender.v1 stream (T325565) (duration: 07m 45s)
  • 13:17 samtar@deploy2002: samtar and dcausse: Continuing with sync
  • 13:15 samtar@deploy2002: samtar and dcausse: Backport for cirrus: add the mediawiki.cirrussearch.page_rerender.v1 stream (T325565) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:14 samtar@deploy2002: Started scap: Backport for cirrus: add the mediawiki.cirrussearch.page_rerender.v1 stream (T325565)
  • 13:10 samtar@deploy2002: Finished scap: Backport for Increase Lua memory limit to 100MB on Wiktionary only (T165935) (duration: 07m 51s)
  • 13:05 samtar@deploy2002: samtar and tstarling: Continuing with sync
  • 13:04 samtar@deploy2002: samtar and tstarling: Backport for Increase Lua memory limit to 100MB on Wiktionary only (T165935) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:03 samtar@deploy2002: Started scap: Backport for Increase Lua memory limit to 100MB on Wiktionary only (T165935)
  • 12:41 jbond: migrate idp_test to puppet7
  • 11:17 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:17 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:16 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:15 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:13 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 11:12 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 11:12 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 11:12 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 11:11 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 11:11 jiji@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 11:09 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:08 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:08 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:08 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:08 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:07 jiji@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:05 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 11:05 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 11:04 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 11:04 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 10:59 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 10:59 jiji@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 10:58 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 10:57 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 10:57 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 10:57 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 10:57 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 10:56 jiji@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 10:54 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 10:53 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 10:47 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 10:46 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 10:44 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 10:43 jiji@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 10:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on an-test-client1002.eqiad.wmnet with reason: Cold booting with ganeti to increase RAM
  • 10:42 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on an-test-client1002.eqiad.wmnet with reason: Cold booting with ganeti to increase RAM
  • 10:42 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:41 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:40 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:39 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:27 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply
  • 10:27 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:26 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: apply
  • 10:26 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 10:15 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
  • 10:14 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
  • 10:10 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
  • 10:10 jiji@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
  • 10:08 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:07 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 10:04 jnuche@deploy2002: Pruned MediaWiki: 1.41.0-wmf.30 (duration: 02m 08s)
  • 10:02 jnuche@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.2 refs T348355 (duration: 25m 27s)
  • 09:49 kevinbazira@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 09:48 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 09:45 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 09:43 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P53039 and previous config saved to /var/cache/conftool/dbconfig/20231024-094329-arnaudb.json
  • 09:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 09:39 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 09:36 jnuche@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.2 refs T348355
  • 09:28 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 90%: Maint over', diff saved to https://phabricator.wikimedia.org/P53038 and previous config saved to /var/cache/conftool/dbconfig/20231024-092824-arnaudb.json
  • 09:16 vgutierrez: upload golang-github-florianl-go-tc to apt.wm.o (bookworm) - T348837
  • 09:13 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 80%: Maint over', diff saved to https://phabricator.wikimedia.org/P53037 and previous config saved to /var/cache/conftool/dbconfig/20231024-091319-arnaudb.json
  • 09:11 taavi: restart ferm on deploy1002 T349587
  • 09:04 taavi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host deploy1002
  • 09:03 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host deploy1002
  • 08:58 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 70%: Maint over', diff saved to https://phabricator.wikimedia.org/P53036 and previous config saved to /var/cache/conftool/dbconfig/20231024-085815-arnaudb.json
  • 08:43 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 60%: Maint over', diff saved to https://phabricator.wikimedia.org/P53035 and previous config saved to /var/cache/conftool/dbconfig/20231024-084310-arnaudb.json
  • 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1127.eqiad.wmnet onto db1227.eqiad.wmnet
  • 08:28 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P53034 and previous config saved to /var/cache/conftool/dbconfig/20231024-082805-arnaudb.json
  • 08:13 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 40%: Maint over', diff saved to https://phabricator.wikimedia.org/P53033 and previous config saved to /var/cache/conftool/dbconfig/20231024-081300-arnaudb.json
  • 07:57 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 30%: Maint over', diff saved to https://phabricator.wikimedia.org/P53032 and previous config saved to /var/cache/conftool/dbconfig/20231024-075755-arnaudb.json
  • 07:42 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 20%: Maint over', diff saved to https://phabricator.wikimedia.org/P53031 and previous config saved to /var/cache/conftool/dbconfig/20231024-074250-arnaudb.json
  • 07:27 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P53030 and previous config saved to /var/cache/conftool/dbconfig/20231024-072745-arnaudb.json
  • 07:27 arnaudb: repool db2109
  • 07:08 marostegui@cumin1001: START - Cookbook sre.mysql.clone of db1127.eqiad.wmnet onto db1227.eqiad.wmnet
  • 06:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: provisioning db1227 - T344036
  • 06:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: provisioning db1227 - T344036
  • 06:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: provisioning db1227 - T344036
  • 06:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: provisioning db1227 - T344036
  • 06:54 godog: +50G to prometheus/analytics in eqiad
  • 06:45 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:45 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 from pc1015 - marostegui@cumin1001"
  • 06:44 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 from pc1015 - marostegui@cumin1001"
  • 06:42 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e8-eqiad
  • 06:33 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e8-eqiad
  • 06:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15435
  • 06:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15435
  • 05:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 39180
  • 05:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 39180
  • 03:51 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.2 refs T348355 (duration: 47m 53s)
  • 03:03 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.2 refs T348355

2023-10-23

  • 23:05 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 23:05 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 22:58 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 22:58 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 22:55 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 22:54 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm
  • 21:05 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:04 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 20:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 20:18 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm
  • 19:50 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:49 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:45 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns6002.wikimedia.org with OS bookworm
  • 18:33 bking@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:32 bking@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:31 herron: sretest1001:~/tmp/backfill$ promtool tsdb create-blocks-from rules --start 1672531200 --end 1698080718 --url http://prometheus.svc.eqiad.wmnet/ops/ logstash-requests.yaml T349521
  • 18:19 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:18 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:14 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:13 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns6002.wikimedia.org with reason: host reimage
  • 18:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns6002.wikimedia.org with reason: host reimage
  • 18:00 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:00 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 17:59 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 17:59 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 17:57 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:56 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns6002.wikimedia.org with OS bookworm
  • 17:41 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:40 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:26 ejegg: fundraising python tools upgraded from e56ae8ae to 9e84c689
  • 17:25 ejegg: standalone (IPN listener) SmashPig upgraded from e27dfbce to c5b12dc3
  • 16:57 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:56 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:47 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:46 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'New host being setup', diff saved to https://phabricator.wikimedia.org/P53029 and previous config saved to /var/cache/conftool/dbconfig/20231023-160926-marostegui.json
  • 16:08 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:08 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:52 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 15:51 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 15:05 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:05 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:56 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:55 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:55 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:55 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:55 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:55 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:55 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:54 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:53 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1021']
  • 14:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P53028 and previous config saved to /var/cache/conftool/dbconfig/20231023-145101-arnaudb.json
  • 14:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Provision db1227 depooled as a candidate master for s7', diff saved to https://phabricator.wikimedia.org/P53027 and previous config saved to /var/cache/conftool/dbconfig/20231023-145011-arnaudb.json
  • 14:48 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 14:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: provisioning db1227 - T344036
  • 14:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: provisioning db1227 - T344036
  • 14:47 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: provisioning db1227 - T344036
  • 14:47 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: provisioning db1227 - T344036
  • 14:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1021']
  • 14:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1021']
  • 14:41 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1021']
  • 14:30 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 14:26 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:26 jayme: switched mw-api-int (mw-on-k8s) to certmanager certificates - T300033
  • 14:26 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:25 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:24 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:14 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 14:14 jayme: switched mw-api-ext (mw-on-k8s) to certmanager certificates - T300033
  • 14:13 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 14:13 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:12 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:06 jayme@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:06 jayme@deploy2002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:06 jayme@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 14:06 jayme@deploy2002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 14:06 jayme: switched mw-jobrunner (mw-on-k8s) to certmanager certificates - T300033
  • 14:05 jayme@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:05 jayme@deploy2002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:05 jayme@deploy2002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 14:05 jayme@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 13:53 urbanecm@deploy2002: Finished scap: Backport for Stop writing to $wgCentralAuthCookieDomain in 'EnterMobileMode' hook (duration: 15m 50s)
  • 13:52 moritzm: installing batik security updates
  • 13:48 urbanecm@deploy2002: urbanecm and matmarex: Continuing with sync
  • 13:38 urbanecm@deploy2002: urbanecm and matmarex: Backport for Stop writing to $wgCentralAuthCookieDomain in 'EnterMobileMode' hook synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:37 urbanecm@deploy2002: Started scap: Backport for Stop writing to $wgCentralAuthCookieDomain in 'EnterMobileMode' hook
  • 13:37 urbanecm@deploy2002: Finished scap: Backport for New stream for Android Patroller tasks feature (T348816) (duration: 06m 54s)
  • 13:31 urbanecm@deploy2002: urbanecm and sharvaniharan: Continuing with sync
  • 13:31 urbanecm@deploy2002: urbanecm and sharvaniharan: Backport for New stream for Android Patroller tasks feature (T348816) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:30 urbanecm@deploy2002: Started scap: Backport for New stream for Android Patroller tasks feature (T348816)
  • 13:29 urbanecm@deploy2002: Finished scap: Backport for Remove 'currentProto'/'finalProto'/'proto' business (T348852), Remove unused $wgIncludeLegacyJavaScript, Remove $wgApiFrameOptions override for enwiki and zhwiki (T131183) (duration: 11m 56s)
  • 13:23 urbanecm@deploy2002: matmarex and urbanecm: Continuing with sync
  • 13:18 urbanecm@deploy2002: matmarex and urbanecm: Backport for Remove 'currentProto'/'finalProto'/'proto' business (T348852), Remove unused $wgIncludeLegacyJavaScript, Remove $wgApiFrameOptions override for enwiki and zhwiki (T131183) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:17 urbanecm@deploy2002: Started scap: Backport for Remove 'currentProto'/'finalProto'/'proto' business (T348852), Remove unused $wgIncludeLegacyJavaScript, Remove $wgApiFrameOptions override for enwiki and zhwiki (T131183)
  • 13:16 urbanecm@deploy2002: Finished scap: Backport for wikidatawiki: Switch property for determining Lexeme language code (T348923) (duration: 12m 50s)
  • 13:11 urbanecm@deploy2002: migr and urbanecm: Continuing with sync
  • 13:05 urbanecm@deploy2002: migr and urbanecm: Backport for wikidatawiki: Switch property for determining Lexeme language code (T348923) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:04 moritzm: installing libxpm security updates on buster
  • 13:04 urbanecm@deploy2002: Started scap: Backport for wikidatawiki: Switch property for determining Lexeme language code (T348923)
  • 12:41 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 12:40 jayme: switched mw-web (mw-on-k8s) to certmanager certificates - T300033
  • 12:40 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 12:40 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:39 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:33 moritzm: installing libx11 security updates
  • 12:16 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1131.eqiad.wmnet onto db1231.eqiad.wmnet
  • 11:49 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@054e07d] (releasing): (no justification provided) (duration: 00m 42s)
  • 11:49 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@054e07d] (releasing): (no justification provided)
  • 11:49 moritzm: added Balthazar to pwstore
  • 11:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Server not yet in production use
  • 11:33 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Server not yet in production use
  • 10:51 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-jumbo1001.eqiad.wmnet
  • 10:51 brouberol@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:51 brouberol@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
  • 10:51 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1131.eqiad.wmnet onto db1231.eqiad.wmnet
  • 10:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depool db1131 T344036', diff saved to https://phabricator.wikimedia.org/P53025 and previous config saved to /var/cache/conftool/dbconfig/20231023-105036-arnaudb.json
  • 10:50 brouberol@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
  • 10:41 jayme: switched mw-debug (mw-on-k8s) to certmanager certificates - T300033
  • 10:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1002.eqiad.wmnet
  • 10:40 brouberol@cumin1001: START - Cookbook sre.dns.netbox
  • 10:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: provisioning - T344036
  • 10:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: provisioning - T344036
  • 10:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: provisioning - T344036
  • 10:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: provisioning - T344036
  • 10:36 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-client1002.eqiad.wmnet
  • 10:35 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:34 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:34 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:34 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:32 brouberol@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1001.eqiad.wmnet
  • 10:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Provision db1231 depooled as a candidate master for s6', diff saved to https://phabricator.wikimedia.org/P53024 and previous config saved to /var/cache/conftool/dbconfig/20231023-103202-arnaudb.json
  • 10:31 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:29 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:29 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test - jbond@cumin1001"
  • 10:28 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test - jbond@cumin1001"
  • 10:26 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 10:26 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:26 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test - jbond@cumin1001"
  • 10:25 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test - jbond@cumin1001"
  • 10:23 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 10:20 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:19 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 10:13 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-jumbo1002.eqiad.wmnet
  • 10:13 brouberol@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:13 brouberol@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
  • 10:12 brouberol@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
  • 10:11 taavi: reprepro: drop thirdparty/kubeadm-k8s-1-22 component
  • 10:10 brouberol@cumin1001: START - Cookbook sre.dns.netbox
  • 10:04 brouberol@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1002.eqiad.wmnet
  • 10:02 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:02 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:57 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-jumbo1003.eqiad.wmnet
  • 09:57 brouberol@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:55 brouberol@cumin1001: START - Cookbook sre.dns.netbox
  • 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-f8/ssw links - ayounsi@cumin1001"
  • 09:54 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-f8/ssw links - ayounsi@cumin1001"
  • 09:51 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 09:50 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 09:49 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 09:49 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 09:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 09:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 09:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 09:47 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:37 brouberol@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: kafka-jumbo1004.eqiad.wmnet
  • 09:37 brouberol@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: kafka-jumbo1004.eqiad.wmnet
  • 09:36 brouberol@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001 - brouberol@cumin1001 - T336044"
  • 09:35 brouberol@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001 - brouberol@cumin1001 - T336044"
  • 09:32 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 09:32 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:31 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 09:31 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:28 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:21 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 09:21 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 09:19 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:18 elukey@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:18 elukey@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:17 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:13 brouberol@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
  • 09:00 brouberol@cumin1001: START - Cookbook sre.dns.netbox
  • 08:55 brouberol@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1004.eqiad.wmnet
  • 08:52 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-jumbo1005.eqiad.wmnet
  • 08:52 brouberol@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:52 brouberol@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
  • 08:51 brouberol@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
  • 08:38 brouberol@cumin1001: START - Cookbook sre.dns.netbox
  • 08:33 brouberol@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1005.eqiad.wmnet
  • 08:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1003.eqiad.wmnet
  • 08:24 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-jumbo1006.eqiad.wmnet
  • 08:24 brouberol@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:24 brouberol@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1006.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
  • 08:21 brouberol@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1006.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
  • 08:19 brouberol@cumin1001: START - Cookbook sre.dns.netbox
  • 08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1003.eqiad.wmnet
  • 08:14 brouberol@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1006.eqiad.wmnet
  • 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1004.eqiad.wmnet
  • 08:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1004.eqiad.wmnet
  • 08:01 moritzm: installing Linux kernel updates for Buster 5.10 backport
  • 07:42 taavi: mwscript purgeList.php enwiki <<< "https://en.wikipedia.org/static/images/project-logos/knwiktionary.png" (and for 1.5x and 2x variants)
  • 07:36 hashar: Upgrading CI Jenkins # T349282
  • 07:26 taavi@deploy2002: Finished scap: Backport for knwiktionary: update logo (T349036), dewiktionary: add tagline (T348978), hiwikisource: Adjust width-height ratio of logo to fix display issue (T310961) (duration: 16m 59s)
  • 07:22 elukey@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 07:22 elukey@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 07:21 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 07:20 taavi@deploy2002: taavi and anzx: Continuing with sync
  • 07:17 taavi@deploy2002: taavi and anzx: Backport for knwiktionary: update logo (T349036), dewiktionary: add tagline (T348978), hiwikisource: Adjust width-height ratio of logo to fix display issue (T310961) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:09 taavi@deploy2002: Started scap: Backport for knwiktionary: update logo (T349036), dewiktionary: add tagline (T348978), hiwikisource: Adjust width-height ratio of logo to fix display issue (T310961)

2023-10-21

  • 00:10 krinkle@deploy2002: Synchronized wmf-config/logging.php: (no justification provided) (duration: 06m 03s)

2023-10-20

  • 22:47 cstone: civicrm upgraded from ca081c11 to 8e8ffec0
  • 21:39 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:38 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:33 ejegg: fundraising civicrm upgraded from 1263a91b to ca081c11
  • 21:06 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:06 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:21 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:20 ejegg: fundraising civicrm upgraded from e57425a9 to 1263a91b
  • 20:20 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:19 bvibber: brion running requeueTranscodes.php on mwmaint2002 for audio and video transcode backfill, will use some jobqueue cpu but should be nicely throttled
  • 20:05 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:05 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:46 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:44 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:35 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:35 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:08 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:07 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:07 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:06 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:06 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 19:05 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:05 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 19:05 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:57 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:56 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:43 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:42 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:42 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:41 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:36 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:36 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:59 topranks: Disabling BGP from asw1-by27-esams to cr1-esams to move BGP peers to new group T349125
  • 15:55 topranks: Disabling BGP from asw1-by27-esams to cr2-esams to move BGP peers to new group T349125
  • 15:47 topranks: Disabling BGP from asw1-bw27-esams to cr2-esams to move BGP peers to new group T349125
  • 15:39 topranks: Disabling BGP from asw1-bw27-esams to cr1-esams to move BGP peers to new group T349125
  • 15:37 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@fd88cfa]: Update kafka hosts mjolnir communicates with (duration: 00m 27s)
  • 15:36 ebernhardson@deploy2002: Started deploy [airflow-dags/search@fd88cfa]: Update kafka hosts mjolnir communicates with
  • 15:26 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 12 hosts with reason: changing bgp config on esams switches
  • 15:25 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 12 hosts with reason: changing bgp config on esams switches
  • 15:18 topranks: Disabling BGP from asw1-b13-drmrs to cr1-drmrs to move BGP peers to new group T349125
  • 15:16 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:15 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:13 topranks: Disabling BGP from asw1-b13-drmrs to cr2-drmrs to move BGP peers to new group T349125
  • 15:09 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 12 hosts with reason: changing bgp config on drmrs switches
  • 15:09 topranks: Disabling BGP from asw1-b12-drmrs to cr2-drmrs to move BGP peers to new group T349125
  • 15:08 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 12 hosts with reason: changing bgp config on drmrs switches
  • 14:57 topranks: Disabling BGP from asw1-b12-drmrs to cr1-drmrs to move BGP peers to new group T349125
  • 14:49 ejegg: payments-wiki upgraded from 87cda414 to 7575f0e6
  • 14:33 topranks: Disabling BGP from ssw1-f1-eqiad to cr2-eqiad to move BGP peers to new group T349125
  • away: fundraising civicrm upgraded from f11ad380 to e57425a9
  • 13:19 topranks: Disabling BGP from ssw1-e1-eqiad to cr1-eqiad to move BGP peers to new group T349125
  • 12:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bullseye
  • 11:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
  • 11:52 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
  • 11:42 jynus: refactoring tables @ db1164[bbackups] T349360
  • 11:37 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
  • 11:36 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be1003.eqiad.wmnet with OS bullseye
  • 10:46 kevinbazira@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:46 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:39 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:19 godog: powercycle titan1001
  • 10:13 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:13 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-f8/ssw links - ayounsi@cumin1001"
  • 10:12 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-f8/ssw links - ayounsi@cumin1001"
  • 10:04 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:58 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 08:45 brouberol@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts kafka-jumbo1006.eqiad.wmnet
  • 08:43 brouberol@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1006.eqiad.wmnet
  • 07:43 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 07:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-f8/ssw links - ayounsi@cumin1001"
  • 07:26 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-f8/ssw links - ayounsi@cumin1001"
  • 07:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on etherpad1003.eqiad.wmnet with reason: Reboot to use new CPU and memory config
  • 07:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on etherpad1003.eqiad.wmnet with reason: Reboot to use new CPU and memory config
  • 07:22 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:21 jelto: increase etherpad1003 CPU and memory (1CPU,1GB -> 2CPU,2GB) - T348386
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1119 from dbctl T349272', diff saved to https://phabricator.wikimedia.org/P53021 and previous config saved to /var/cache/conftool/dbconfig/20231020-061822-marostegui.json
  • 03:15 tstarling@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Enable source maps everywhere T47514 (duration: 06m 26s)
  • 03:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye

2023-10-19

  • 22:37 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:32 hmonroy@deploy2002: Finished scap: Backport for PhonosButton: use text() instead of append() (T349312) (duration: 06m 48s)
  • 21:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be1003.eqiad.wmnet with OS bullseye
  • 21:27 hmonroy@deploy2002: hmonroy: Continuing with sync
  • 21:27 hmonroy@deploy2002: hmonroy: Backport for PhonosButton: use text() instead of append() (T349312) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:25 hmonroy@deploy2002: Started scap: Backport for PhonosButton: use text() instead of append() (T349312)
  • 21:12 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
  • 20:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be1003.eqiad.wmnet with OS bullseye
  • 20:39 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
  • 20:02 brennen: UTC late backport window: no patches
  • 18:22 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:22 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:09 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.1 refs T348354
  • 17:33 urandom: Decommissioning Cassandra, restbase1018-{a,b,c} — T328490
  • 16:50 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:49 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:17 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:16 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:16 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:15 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:15 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:14 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:42 jgiannelos@deploy2002: Finished deploy [restbase/deploy@a311c5d]: (no justification provided) (duration: 00m 54s)
  • 15:41 jgiannelos@deploy2002: Started deploy [restbase/deploy@a311c5d]: (no justification provided)
  • 15:30 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:30 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:25 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:15 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1006.eqiad.wmnet with reason: host is being decommissioned
  • 15:15 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1006.eqiad.wmnet with reason: host is being decommissioned
  • 15:15 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1005.eqiad.wmnet with reason: host is being decommissioned
  • 15:14 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1005.eqiad.wmnet with reason: host is being decommissioned
  • 15:14 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1004.eqiad.wmnet with reason: host is being decommissioned
  • 15:14 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1004.eqiad.wmnet with reason: host is being decommissioned
  • 15:14 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1003.eqiad.wmnet with reason: host is being decommissioned
  • 15:13 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1003.eqiad.wmnet with reason: host is being decommissioned
  • 15:13 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1002.eqiad.wmnet with reason: host is being decommissioned
  • 15:13 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1002.eqiad.wmnet with reason: host is being decommissioned
  • 15:13 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1001.eqiad.wmnet with reason: host is being decommissioned
  • 15:13 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1001.eqiad.wmnet with reason: host is being decommissioned
  • 15:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudnet1008-dev']
  • 15:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudnet1007-dev']
  • 15:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet1007-dev']
  • 15:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet1008-dev']
  • 15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet1008-dev.eqiad.wmnet']
  • 15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet1007-dev.eqiad.wmnet']
  • 15:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 15:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 15:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 15:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 15:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol1009-dev.eqiad.wmnet']
  • 15:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 15:05 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 15:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 15:04 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 14:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet1007-dev.eqiad.wmnet']
  • 14:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet1008-dev.eqiad.wmnet']
  • 14:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 14:58 elukey: powercycle titan1001
  • 14:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1009-dev.eqiad.wmnet']
  • 14:57 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 14:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev']
  • 14:56 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev']
  • 14:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1009-dev']
  • 14:55 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1009-dev']
  • 14:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev']
  • 14:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1009-dev']
  • 14:55 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1009-dev']
  • 14:55 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev']
  • 14:54 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudnet1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol1009-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:44 elukey: powercycle titan1001
  • 14:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:38 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:35 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 14:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudnet1007-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:32 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudnet1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:31 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1009-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:31 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:31 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:29 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:28 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1009-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1009-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:05 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudnet1007-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:04 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:03 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1009-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:03 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:01 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:01 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcontrol100[8-10]-dev cloudnet100[7-8]-dev - jclark@cumin1001"
  • 14:00 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcontrol100[8-10]-dev cloudnet100[7-8]-dev - jclark@cumin1001"
  • 13:58 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:48 wmde-fisch@deploy2002: Finished scap: Backport for Revert "Revert "Workaround to center search terms label"" (T252346) (duration: 07m 50s)
  • 13:43 wmde-fisch@deploy2002: wmde-fisch: Continuing with sync
  • 13:42 wmde-fisch@deploy2002: wmde-fisch: Backport for Revert "Revert "Workaround to center search terms label"" (T252346) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:41 wmde-fisch@deploy2002: Started scap: Backport for Revert "Revert "Workaround to center search terms label"" (T252346)
  • 13:00 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:00 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: noop - volans@cumin1001"
  • 12:59 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: noop - volans@cumin1001"
  • 12:52 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 12:50 volans@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 12:50 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 12:50 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 12:50 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 11:47 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 11:46 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@6f09297] (releasing): (no justification provided) (duration: 01m 08s)
  • 11:44 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@6f09297] (releasing): (no justification provided)
  • 11:30 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 08:36 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 07:33 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 07:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db2109.codfw.wmnet with reason: db2109 downtime while repooling
  • 07:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db2109.codfw.wmnet with reason: db2109 downtime while repooling
  • 07:17 tgr: UTC morning deploys done
  • 07:16 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 07:13 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 06:57 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 06:34 volans: enabled distributed locking support in spicerack/cookbooks T341973
  • 06:32 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
  • 06:32 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
  • 06:31 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
  • 06:31 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
  • 05:14 tchin@deploy2002: Finished deploy [airflow-dags/analytics@60950f6]: Deploying airflow [data-engineering/airflow-dags@60950f6b] (duration: 01m 12s)
  • 05:12 tchin@deploy2002: Started deploy [airflow-dags/analytics@60950f6]: Deploying airflow [data-engineering/airflow-dags@60950f6b]

2023-10-18

  • 23:58 eileen: civicrm upgraded from 4a5634ed to f11ad380
  • 22:12 eileen: civicrm upgraded from 52202980 to 4a5634ed
  • 21:58 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:54 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:35 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1229.eqiad.wmnet with OS bullseye
  • 21:23 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 21:16 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 21:08 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:08 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1229.eqiad.wmnet with reason: host reimage
  • 20:59 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1229.eqiad.wmnet with reason: host reimage
  • 20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 20:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 20:44 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 20:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 20:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 20:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 19:40 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 19:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 19:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 19:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 19:25 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:25 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:16 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns6001.wikimedia.org with OS bookworm
  • 19:02 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 19:00 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1110.eqiad.wmnet with OS bullseye
  • 19:00 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 19:00 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1104.eqiad.wmnet with OS bullseye
  • 19:00 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 18:45 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns6001.wikimedia.org with reason: host reimage
  • 18:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns6001.wikimedia.org with reason: host reimage
  • 18:36 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:36 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:35 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:35 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:33 brennen@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.1 refs T348354 (duration: 05m 40s)
  • 18:28 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.1 refs T348354
  • 18:20 brennen: train 1.42.0-wmf.1 (T348354): logs clean and no blockers, rolling to group1
  • 18:17 brennen@deploy2002: Finished scap: Backport for Fix Typo in OS Dark Mode field (T346106) (duration: 13m 46s)
  • 18:17 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns6001.wikimedia.org with OS bookworm
  • 18:12 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 18:12 brennen@deploy2002: brennen and jdlrobson: Continuing with sync
  • 18:05 brennen@deploy2002: brennen and jdlrobson: Backport for Fix Typo in OS Dark Mode field (T346106) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:03 brennen@deploy2002: Started scap: Backport for Fix Typo in OS Dark Mode field (T346106)
  • 17:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1110.eqiad.wmnet with reason: host reimage
  • 17:52 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:52 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:51 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1110.eqiad.wmnet with reason: host reimage
  • 17:47 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:46 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:44 sukhe: running authdns-update for CR 966573
  • 17:43 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:43 tchin@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:42 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 17:34 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 17:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1110.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1110.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1110
  • 17:28 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1110
  • 17:27 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:26 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 17:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1104.eqiad.wmnet with reason: host reimage
  • 17:24 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1110']
  • 17:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
  • 17:22 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1104.eqiad.wmnet with reason: host reimage
  • 17:13 XioNoX: restart turnilo to pick up UI change
  • 17:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1110.eqiad.wmnet with OS bullseye
  • 17:07 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
  • 17:07 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
  • 17:05 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1104.eqiad.wmnet with OS bullseye
  • 17:04 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1104']
  • 17:04 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1104']
  • 17:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1100.eqiad.wmnet with OS bullseye
  • 17:04 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 17:03 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 17:01 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1101.eqiad.wmnet with OS bullseye
  • 17:01 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 17:00 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:56 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1102.eqiad.wmnet with OS bullseye
  • 16:56 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:54 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:46 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
  • 16:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1101.eqiad.wmnet with reason: host reimage
  • 16:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1102.eqiad.wmnet with reason: host reimage
  • 16:40 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1101.eqiad.wmnet with reason: host reimage
  • 16:39 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
  • 16:37 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1102.eqiad.wmnet with reason: host reimage
  • 16:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1010.eqiad.wmnet with OS bullseye
  • 16:30 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:30 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:29 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1103.eqiad.wmnet with OS bullseye
  • 16:28 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:28 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:28 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:26 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:25 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1102.eqiad.wmnet with OS bullseye
  • 16:24 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:24 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
  • 16:23 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1101.eqiad.wmnet with OS bullseye
  • 16:22 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1100.eqiad.wmnet with OS bullseye
  • 16:22 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1100']
  • 16:20 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
  • 16:20 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1101']
  • 16:20 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1101']
  • 16:19 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 16:19 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1110']
  • 16:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
  • 16:18 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 16:18 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:18 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cp1102 - jclark@cumin1001"
  • 16:17 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:17 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:17 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cp1102 - jclark@cumin1001"
  • 16:17 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:16 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:15 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:15 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 16:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 16:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp1110']
  • 16:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
  • 16:14 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1102
  • 16:13 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1102
  • 16:11 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1010.eqiad.wmnet with reason: host reimage
  • 16:10 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1110']
  • 16:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
  • 16:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
  • 16:08 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1010.eqiad.wmnet with reason: host reimage
  • 16:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
  • 16:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 16:07 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1102
  • 16:07 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1105.eqiad.wmnet with OS bullseye
  • 16:07 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:06 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:06 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1102
  • 16:05 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
  • 16:05 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1102']
  • 16:04 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 16:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1106.eqiad.wmnet with OS bullseye
  • 16:02 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:00 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:57 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1110.eqiad.wmnet with OS bullseye
  • 15:53 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:52 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:51 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
  • 15:51 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 15:50 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1107.eqiad.wmnet with OS bullseye
  • 15:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 15:50 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1010.eqiad.wmnet with OS bullseye
  • 15:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1104']
  • 15:49 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1105.eqiad.wmnet with reason: host reimage
  • 15:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1104']
  • 15:47 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1010.eqiad.wmnet with OS bullseye
  • 15:46 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1105.eqiad.wmnet with reason: host reimage
  • 15:46 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1111.eqiad.wmnet with OS bullseye
  • 15:46 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1106.eqiad.wmnet with reason: host reimage
  • 15:43 inflatador: bking@deploy2002 destroy dse-k8s-services instance of rdf-streaming-updater T349095
  • 15:40 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1106.eqiad.wmnet with reason: host reimage
  • 15:40 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:32 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
  • 15:29 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
  • 15:29 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1105']
  • 15:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
  • 15:28 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:28 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:28 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
  • 15:26 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1106.eqiad.wmnet with OS bullseye
  • 15:25 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1106']
  • 15:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1111.eqiad.wmnet with reason: host reimage
  • 15:20 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1111.eqiad.wmnet with reason: host reimage
  • 15:19 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1106']
  • 15:13 dancy@deploy2002: Finished deploy [releng/jenkins-deploy@2cf7af2] (releasing): (no justification provided) (duration: 00m 44s)
  • 15:12 dancy@deploy2002: Started deploy [releng/jenkins-deploy@2cf7af2] (releasing): (no justification provided)
  • 15:10 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 15:09 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1107']
  • 15:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1107']
  • 15:07 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1114.eqiad.wmnet with OS bullseye
  • 15:07 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:06 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:04 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 15:03 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1110']
  • 15:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS bullseye
  • 15:02 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1111']
  • 15:02 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1010.eqiad.wmnet with OS bullseye
  • 15:02 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 15:01 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1111']
  • 15:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 15:01 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:00 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1111']
  • 14:59 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1010.eqiad.wmnet with OS bullseye
  • 14:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 14:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1111']
  • 14:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 14:59 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1111
  • 14:58 elukey: powercycle titan1001 (no mgmt console / tty available, no host metrics, no ssh)
  • 14:57 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1111
  • 14:57 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
  • 14:57 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
  • 14:57 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1111']
  • 14:57 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 14:56 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1111']
  • 14:56 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 14:56 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
  • 14:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1108.eqiad.wmnet with OS bullseye
  • 14:51 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
  • 14:46 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
  • 14:44 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1010.eqiad.wmnet with OS bullseye
  • 14:40 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:31 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 14:25 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1114']
  • 14:24 ejegg: fundraising civicrm upgraded from d8fe92e3 to 52202980
  • 14:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1108.eqiad.wmnet with reason: host reimage
  • 14:20 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1108.eqiad.wmnet with reason: host reimage
  • 14:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1114']
  • 14:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
  • 13:58 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 13:23 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
  • 13:23 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
  • 13:14 volans: uploaded spicerack_8.0.2 to apt.wikimedia.org bullseye-wikimedia
  • 13:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 13:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 13:06 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 13:06 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 13:05 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 13:05 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 13:04 sukhe: running authdns-update for CR 966243
  • 13:04 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 13:04 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 13:03 arnaudb@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P53008 and previous config saved to /var/cache/conftool/dbconfig/20231018-130343-arnaudb.json
  • 13:03 arnaudb@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P53007 and previous config saved to /var/cache/conftool/dbconfig/20231018-130325-arnaudb.json
  • 12:59 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 12:59 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:52 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 12:51 jbond: upload puppet_7.23.0-1~debu11u1 (bullseye backport)
  • 12:48 arnaudb@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P53006 and previous config saved to /var/cache/conftool/dbconfig/20231018-124838-arnaudb.json
  • 12:48 arnaudb@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P53005 and previous config saved to /var/cache/conftool/dbconfig/20231018-124820-arnaudb.json
  • 12:44 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 12:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 12:44 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 12:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 12:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2109.codfw.wmnet with reason: db2109 downtime while repooling
  • 12:39 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2109.codfw.wmnet with reason: db2109 downtime while repooling
  • 12:38 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:37 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P53004 and previous config saved to /var/cache/conftool/dbconfig/20231018-123333-arnaudb.json
  • 12:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P53003 and previous config saved to /var/cache/conftool/dbconfig/20231018-123315-arnaudb.json
  • 12:18 arnaudb@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P53002 and previous config saved to /var/cache/conftool/dbconfig/20231018-121828-arnaudb.json
  • 12:18 arnaudb@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P53001 and previous config saved to /var/cache/conftool/dbconfig/20231018-121811-arnaudb.json
  • 12:17 arnaudb: repool db2161 and db1126
  • 11:51 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1009.eqiad.wmnet
  • 11:44 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1009.eqiad.wmnet
  • 11:43 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
  • 11:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 11:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 11:29 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 11:29 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 11:23 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 11:21 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 11:20 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 11:16 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 11:16 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 11:14 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
  • 11:12 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
  • 11:11 ladsgroup@deploy2002: Finished scap: Backport for Set s6 and s8 to write both for pagelinks migration (T345732) (duration: 10m 10s)
  • 11:08 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 11:05 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 11:02 ladsgroup@deploy2002: ladsgroup: Backport for Set s6 and s8 to write both for pagelinks migration (T345732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:01 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
  • 11:01 ladsgroup@deploy2002: Started scap: Backport for Set s6 and s8 to write both for pagelinks migration (T345732)
  • 10:40 volans: re-enabled puppet on the cumin hosts. installed spicerack 8.0.1 on the cumin hosts
  • 10:37 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1007.eqiad.wmnet
  • 10:32 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
  • 10:28 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:19 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
  • 10:16 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
  • 10:09 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 10:07 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
  • 10:03 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 09:54 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 09:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on stat1009.eqiad.wmnet with reason: Extending downtime for stat1009
  • 09:52 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on stat1009.eqiad.wmnet with reason: Extending downtime for stat1009
  • 09:48 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
  • 09:47 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
  • 09:25 volans: uploaded spicerack_8.0.1 to apt.wikimedia.org bullseye-wikimedia
  • 09:23 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 09:23 jynus: aborting backup of es1022, es1025 (there was already another backup running)
  • 09:23 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
  • 09:22 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 09:21 jynus: starting new backup of es1022, es1025 (new clusters only)
  • 09:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1006.eqiad.wmnet
  • 09:20 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 09:19 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 09:17 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 09:17 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 09:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on stat1009.eqiad.wmnet with reason: Moving /home to /srv/home on stat1009 and rebooting
  • 09:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on stat1009.eqiad.wmnet with reason: Moving /home to /srv/home on stat1009 and rebooting
  • 09:14 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1007.eqiad.wmnet
  • 09:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1006.eqiad.wmnet
  • 09:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1004.eqiad.wmnet
  • 09:10 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
  • 09:06 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
  • 09:05 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1004.eqiad.wmnet
  • 09:02 aqu@deploy2002: Finished deploy [airflow-dags/analytics@c17c91c]: Fix following yesterday weekly train deploy - Second try [airflow-dags@c17c91ce] (duration: 00m 06s)
  • 09:02 aqu@deploy2002: Started deploy [airflow-dags/analytics@c17c91c]: Fix following yesterday weekly train deploy - Second try [airflow-dags@c17c91ce]
  • 09:01 aqu@deploy2002: deploy aborted: Fix following yesterday weekly train deploy [airflow-dags@c17c91ce] (duration: 01m 10s)
  • 09:00 aqu@deploy2002: Started deploy [airflow-dags/analytics@c17c91c]: Fix following yesterday weekly train deploy [airflow-dags@c17c91ce]
  • 08:54 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
  • 08:51 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 08:40 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 08:18 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
  • 08:14 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
  • 08:08 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
  • 08:06 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
  • 08:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-e8/ssw links - ayounsi@cumin1001"
  • 08:02 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-e8/ssw links - ayounsi@cumin1001"
  • 07:54 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 07:47 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:46 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2132.codfw.wmnet with OS bookworm
  • 07:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2132.codfw.wmnet with reason: host reimage
  • 07:37 volans: temporarily disabled puppet on the A:cumin hosts to deploy and test spicerack v8.0.0
  • 07:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2132.codfw.wmnet with reason: host reimage
  • 07:28 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
  • 07:28 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
  • 07:28 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
  • 07:28 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
  • 07:27 filippo@deploy2002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 07:27 filippo@deploy2002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2132.codfw.wmnet with OS bookworm
  • 07:06 aqu@deploy2002: Finished deploy [airflow-dags/analytics@5dcce3b]: Add missing MR in yesterday weekly train (run 2) [airflow-dags@5dcce3bd] (duration: 00m 07s)
  • 07:05 aqu@deploy2002: Started deploy [airflow-dags/analytics@5dcce3b]: Add missing MR in yesterday weekly train (run 2) [airflow-dags@5dcce3bd]
  • 07:05 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 06s)
  • 07:05 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
  • 07:04 aqu@deploy2002: deploy aborted: Add missing MR in yesterday weekly train [airflow-dags@5dcce3bd] (duration: 03m 52s)
  • 07:00 aqu@deploy2002: Started deploy [airflow-dags/analytics@5dcce3b]: Add missing MR in yesterday weekly train [airflow-dags@5dcce3bd]
  • 07:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2160.codfw.wmnet with OS bookworm
  • 06:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2160.codfw.wmnet with reason: host reimage
  • 06:38 XioNoX: push pfw policies - T349101
  • 06:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2160.codfw.wmnet with reason: host reimage
  • 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2160.codfw.wmnet with OS bookworm
  • 06:08 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2160.codfw.wmnet with OS bookworm
  • 05:57 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2160.codfw.wmnet with OS bookworm
  • 01:22 eileen: civicrm upgraded from da11d010 to d8fe92e3

2023-10-17

  • 22:03 herron: pyrra.wm.o upgraded to 0.7.1 T302995
  • 21:32 catrope@deploy2002: backport Cancelled
  • 21:10 inflatador: bking@cumin1001 repool wdqs eqiad after rdf-streaming-updater fix
  • 21:05 catrope@deploy2002: Finished scap: Backport for Add language prefix to Readability survey (T347208) (duration: 13m 03s)
  • 21:00 catrope@deploy2002: catrope and jdrewniak: Continuing with sync
  • 20:53 catrope@deploy2002: catrope and jdrewniak: Backport for Add language prefix to Readability survey (T347208) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:52 inflatador: bking@cumin1001 depool wdqs eqiad due to rdf-streaming-updater failure
  • 20:52 catrope@deploy2002: Started scap: Backport for Add language prefix to Readability survey (T347208)
  • 20:36 volans: uploaded spicerack_8.0.0 to apt.wikimedia.org bullseye-wikimedia
  • 20:36 eevans@deploy2002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 20:36 eevans@deploy2002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 20:35 eevans@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 20:34 eevans@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 20:31 eevans@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 20:31 eevans@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 20:29 catrope@deploy2002: Finished scap: Backport for Fixes incorrect Hebrew logo and applies gotwiki (T341253 T341251) (duration: 09m 59s)
  • 20:27 eevans@deploy2002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 20:26 eevans@deploy2002: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 20:24 eevans@deploy2002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
  • 20:24 eevans@deploy2002: helmfile [eqiad] START helmfile.d/services/echostore: apply
  • 20:24 catrope@deploy2002: jdlrobson and catrope: Continuing with sync
  • 20:21 eevans@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 20:21 eevans@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 20:20 catrope@deploy2002: jdlrobson and catrope: Backport for Fixes incorrect Hebrew logo and applies gotwiki (T341253 T341251) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:19 catrope@deploy2002: Started scap: Backport for Fixes incorrect Hebrew logo and applies gotwiki (T341253 T341251)
  • 20:16 catrope@deploy2002: Finished scap: Backport for Wordmark for blk wiktionary and got wikipedia (T341253 T341257) (duration: 11m 17s)
  • 20:11 catrope@deploy2002: catrope and jdlrobson: Continuing with sync
  • 20:06 catrope@deploy2002: catrope and jdlrobson: Backport for Wordmark for blk wiktionary and got wikipedia (T341253 T341257) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:05 catrope@deploy2002: Started scap: Backport for Wordmark for blk wiktionary and got wikipedia (T341253 T341257)
  • 18:46 hashar@deploy2002: Finished scap: Backport for logging: reorder wmgMonologProcessors entries (T349086) (duration: 08m 14s)
  • 18:43 hashar@deploy2002: hashar: Continuing with sync
  • 18:39 hashar@deploy2002: hashar: Backport for logging: reorder wmgMonologProcessors entries (T349086) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:38 hashar@deploy2002: Started scap: Backport for logging: reorder wmgMonologProcessors entries (T349086)
  • 18:25 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.1 refs T348354
  • 18:18 brennen: train 1.42.0-wmf.1 (T348354): blockers resolved, rolling to group0
  • 18:16 brennen@deploy2002: Finished scap: Backport for Pass full content to Parsoid for redirect pages (T349087) (duration: 07m 42s)
  • 18:11 brennen@deploy2002: brennen: Continuing with sync
  • 18:09 brennen@deploy2002: brennen: Backport for Pass full content to Parsoid for redirect pages (T349087) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:08 brennen@deploy2002: Started scap: Backport for Pass full content to Parsoid for redirect pages (T349087)
  • 17:05 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we set up the new WMDE Airflow instance
  • 17:05 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we set up the new WMDE Airflow instance
  • 16:22 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:22 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:50 sukhe: running authdns-update for CR 966564
  • 15:09 brennen@deploy2002: Finished deploy [phabricator/deployment@745d703]: deploy to phab1004 for T349038 (duration: 00m 57s)
  • 15:08 brennen@deploy2002: Started deploy [phabricator/deployment@745d703]: deploy to phab1004 for T349038
  • 15:07 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 15:07 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 15:07 brennen@deploy2002: Finished deploy [phabricator/deployment@745d703]: test deploy to phab2002 for T349038 (duration: 00m 33s)
  • 15:06 brennen@deploy2002: Started deploy [phabricator/deployment@745d703]: test deploy to phab2002 for T349038
  • 15:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator maintenance
  • 15:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator maintenance
  • 15:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator maintenance
  • 15:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator maintenance
  • 15:03 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 15:02 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:59 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:58 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:28 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:28 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:24 denisse@deploy2002: Finished deploy [performance/navtiming@2e17c67]: (no justification provided) (duration: 00m 05s)
  • 14:24 denisse@deploy2002: Started deploy [performance/navtiming@2e17c67]: (no justification provided)
  • 14:11 jdrewniak@deploy2002: Finished scap: Backport for ParserOutputAccess: Fix local cache when page is edited within the process (T349033) (duration: 15m 56s)
  • 14:05 jdrewniak@deploy2002: jdrewniak: Continuing with sync
  • 14:03 tchin@deploy2002: Finished deploy [airflow-dags/analytics_test@be05071]: Regular analytics weekly train (duration: 00m 06s)
  • 14:03 tchin@deploy2002: Started deploy [airflow-dags/analytics_test@be05071]: Regular analytics weekly train
  • 14:01 tchin@deploy2002: Finished deploy [airflow-dags/analytics@fae5764]: (no justification provided) (duration: 01m 22s)
  • 13:59 tchin@deploy2002: Started deploy [airflow-dags/analytics@fae5764]: (no justification provided)
  • 13:56 jdrewniak@deploy2002: jdrewniak: Backport for ParserOutputAccess: Fix local cache when page is edited within the process (T349033) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:55 jdrewniak@deploy2002: Started scap: Backport for ParserOutputAccess: Fix local cache when page is edited within the process (T349033)
  • 13:52 tchin@deploy2002: Finished deploy [analytics/refinery@0d09fbd] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0d09fbdc] (duration: 02m 59s)
  • 13:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1225.eqiad.wmnet with reason: db1225 downtime for restoration
  • 13:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1225.eqiad.wmnet with reason: db1225 downtime for restoration
  • 13:49 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 13:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 13:49 tchin@deploy2002: Started deploy [analytics/refinery@0d09fbd] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0d09fbdc]
  • 13:49 tchin@deploy2002: Finished deploy [analytics/refinery@0d09fbd] (thin): Regular analytics weekly train THIN [analytics/refinery@0d09fbdc] (duration: 00m 07s)
  • 13:49 tchin@deploy2002: Started deploy [analytics/refinery@0d09fbd] (thin): Regular analytics weekly train THIN [analytics/refinery@0d09fbdc]
  • 13:48 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 13:48 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 13:48 tchin@deploy2002: Finished deploy [analytics/refinery@0d09fbd]: Regular analytics weekly train [analytics/refinery@0d09fbdc] (duration: 07m 24s)
  • 13:47 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2160.codfw.wmnet with OS bookworm
  • 13:46 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 13:46 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:40 tchin@deploy2002: Started deploy [analytics/refinery@0d09fbd]: Regular analytics weekly train [analytics/refinery@0d09fbdc]
  • 13:40 jdrewniak@deploy2002: Finished scap: Backport for Enable Vector readability survey on select wikis (T347208) (duration: 09m 50s)
  • 13:34 jdrewniak@deploy2002: jdrewniak: Continuing with sync
  • 13:32 jdrewniak@deploy2002: jdrewniak: Backport for Enable Vector readability survey on select wikis (T347208) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:30 jdrewniak@deploy2002: Started scap: Backport for Enable Vector readability survey on select wikis (T347208)
  • 13:26 jdrewniak@deploy2002: Backport cancelled.
  • 13:15 jdrewniak@deploy2002: Backport cancelled.
  • 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2160.codfw.wmnet with OS bookworm
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1119 T339185', diff saved to https://phabricator.wikimedia.org/P52995 and previous config saved to /var/cache/conftool/dbconfig/20231017-124916-root.json
  • 12:28 urandom: Starting Cassandra decommission(s) of restbase1017 —
  • 11:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T343198)', diff saved to https://phabricator.wikimedia.org/P52994 and previous config saved to /var/cache/conftool/dbconfig/20231017-115217-arnaudb.json
  • 11:39 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Depool db1126 T349077', diff saved to https://phabricator.wikimedia.org/P52993 and previous config saved to /var/cache/conftool/dbconfig/20231017-113809-arnaudb.json
  • 11:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P52992 and previous config saved to /var/cache/conftool/dbconfig/20231017-113711-arnaudb.json
  • 11:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Set db1126 with weight 275 T349077', diff saved to https://phabricator.wikimedia.org/P52991 and previous config saved to /var/cache/conftool/dbconfig/20231017-113432-arnaudb.json
  • 11:29 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:27 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P52990 and previous config saved to /var/cache/conftool/dbconfig/20231017-112204-arnaudb.json
  • 11:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Promote db1209 to s8 primary T349077', diff saved to https://phabricator.wikimedia.org/P52989 and previous config saved to /var/cache/conftool/dbconfig/20231017-111720-arnaudb.json
  • 11:12 arnaudb: Starting s8 eqiad failover from db1126 to db1209 - T349077
  • 11:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T343198)', diff saved to https://phabricator.wikimedia.org/P52988 and previous config saved to /var/cache/conftool/dbconfig/20231017-110658-arnaudb.json
  • 11:00 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:59 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Set db1209 with weight 0 T349077', diff saved to https://phabricator.wikimedia.org/P52987 and previous config saved to /var/cache/conftool/dbconfig/20231017-104839-arnaudb.json
  • 10:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349077
  • 10:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349077
  • 10:28 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:28 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:59 hashar: Deleted operations-puppet-catalog-compiler Jenkins job to replace it with a new job that lets one pick the Puppet version(s) to compile against | T236373
  • 09:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 09:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 09:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet
  • 09:58 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet
  • 09:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1001.eqiad.wmnet
  • 09:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet with reason: Rebooting Airflow instances for T344671
  • 09:47 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet with reason: Rebooting Airflow instances for T344671
  • 09:42 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host an-airflow1007.eqiad.wmnet
  • 09:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1001.eqiad.wmnet
  • 09:36 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@b010dae]: (no justification provided) (duration: 00m 46s)
  • 09:35 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@b010dae]: (no justification provided)
  • 09:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1007.eqiad.wmnet
  • 09:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1002.eqiad.wmnet
  • 09:28 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1002.eqiad.wmnet
  • 09:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1005.eqiad.wmnet
  • 09:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 09:26 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 09:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1005.eqiad.wmnet
  • 09:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1004.eqiad.wmnet
  • 09:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 09:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 09:20 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1004.eqiad.wmnet
  • 09:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1006.eqiad.wmnet
  • 09:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 09:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 09:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1006.eqiad.wmnet
  • 09:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet with reason: Rebooting Airflow instances for T344671
  • 09:12 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet with reason: Rebooting Airflow instances for T344671
  • 08:38 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 08:35 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 08:32 XioNoX: push pfw policies - T348576
  • 07:26 hashar@deploy2002: Finished deploy [gerrit/gerrit@578be93]: wm-checks-api: filter out Zuul start messages | T348920 (duration: 00m 07s)
  • 07:26 hashar@deploy2002: Started deploy [gerrit/gerrit@578be93]: wm-checks-api: filter out Zuul start messages | T348920
  • 07:23 hashar@deploy2002: Finished deploy [gerrit/gerrit@1153a16]: wm-checks-api: filter out Zuul start messages | T348920 (duration: 00m 05s)
  • 07:22 hashar@deploy2002: Started deploy [gerrit/gerrit@1153a16]: wm-checks-api: filter out Zuul start messages | T348920
  • 06:06 isaranto@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2161 T349053', diff saved to https://phabricator.wikimedia.org/P52986 and previous config saved to /var/cache/conftool/dbconfig/20231017-060214-root.json
  • 06:06 isaranto@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 06:02 isaranto@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2165 to s8 primary and set section read-write T349053', diff saved to https://phabricator.wikimedia.org/P52985 and previous config saved to /var/cache/conftool/dbconfig/20231017-060047-root.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s8 codfw as read-only for maintenance - T349053', diff saved to https://phabricator.wikimedia.org/P52984 and previous config saved to /var/cache/conftool/dbconfig/20231017-060021-root.json
  • 06:00 marostegui: Starting s8 codfw failover from db2161 to db2165 - T349053
  • 05:59 kart_: Update MinT to 2023-10-16-101614-production (T333969, T336683, T348097)
  • 05:36 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 05:31 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 05:29 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 05:19 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 05:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349053
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2165 with weight 0 T349053', diff saved to https://phabricator.wikimedia.org/P52983 and previous config saved to /var/cache/conftool/dbconfig/20231017-051723-root.json
  • 05:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349053
  • 03:55 mwpresync@deploy2002: Pruned MediaWiki: 1.41.0-wmf.29 (duration: 02m 15s)
  • 03:53 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.1 refs T348354 (duration: 50m 15s)
  • 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.1 refs T348354
  • 02:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T343198)', diff saved to https://phabricator.wikimedia.org/P52982 and previous config saved to /var/cache/conftool/dbconfig/20231017-021040-arnaudb.json
  • 02:10 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 02:10 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 02:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T343198)', diff saved to https://phabricator.wikimedia.org/P52981 and previous config saved to /var/cache/conftool/dbconfig/20231017-021018-arnaudb.json
  • 01:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P52980 and previous config saved to /var/cache/conftool/dbconfig/20231017-015511-arnaudb.json
  • 01:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P52979 and previous config saved to /var/cache/conftool/dbconfig/20231017-014005-arnaudb.json
  • 01:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T343198)', diff saved to https://phabricator.wikimedia.org/P52978 and previous config saved to /var/cache/conftool/dbconfig/20231017-012459-arnaudb.json

2023-10-16

  • 22:04 maryum: deployed security patch for T347742
  • 21:53 maryum: deployed security patch for T347708
  • 21:40 maryum: deployed security patch for T348343
  • 21:04 sbassett: deployed security mitigation for T348828
  • 20:55 cjming: end of UTC late backport window
  • 20:53 cjming@deploy2002: Finished scap: Backport for wordmarks/taglines for Wiktionary projects (T341257) (duration: 07m 17s)
  • 20:47 cjming@deploy2002: jdlrobson and cjming: Continuing with sync
  • 20:46 cjming@deploy2002: jdlrobson and cjming: Backport for wordmarks/taglines for Wiktionary projects (T341257) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:45 cjming@deploy2002: Started scap: Backport for wordmarks/taglines for Wiktionary projects (T341257)
  • 20:44 cjming@deploy2002: Finished scap: Backport for Update logos for remaining Wikisource projects (T343753) (duration: 07m 50s)
  • 20:39 cjming@deploy2002: jdlrobson and cjming: Continuing with sync
  • 20:37 cjming@deploy2002: jdlrobson and cjming: Backport for Update logos for remaining Wikisource projects (T343753) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:36 cjming@deploy2002: Started scap: Backport for Update logos for remaining Wikisource projects (T343753)
  • 20:35 cjming@deploy2002: Finished scap: Backport for Fixes Thai Wikinews wordmark and sewikimedia (T348757 T347534) (duration: 07m 08s)
  • 20:30 cjming@deploy2002: cjming and jdlrobson: Continuing with sync
  • 20:29 cjming@deploy2002: cjming and jdlrobson: Backport for Fixes Thai Wikinews wordmark and sewikimedia (T348757 T347534) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:28 cjming@deploy2002: Started scap: Backport for Fixes Thai Wikinews wordmark and sewikimedia (T348757 T347534)
  • 20:26 cjming@deploy2002: Finished scap: Backport for Merge ReplyWidget[Plain/Visual] modules (T348834) (duration: 07m 23s)
  • 20:21 cjming@deploy2002: kemayo and cjming: Continuing with sync
  • 20:20 cjming@deploy2002: kemayo and cjming: Backport for Merge ReplyWidget[Plain/Visual] modules (T348834) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:19 cjming@deploy2002: Started scap: Backport for Merge ReplyWidget[Plain/Visual] modules (T348834)
  • 20:18 cjming@deploy2002: Finished scap: Backport for Enable display of Client Hints data on all wikis (T341110 T337942) (duration: 08m 17s)
  • 20:13 cjming@deploy2002: dreamyjazz and cjming: Continuing with sync
  • 20:11 cjming@deploy2002: dreamyjazz and cjming: Backport for Enable display of Client Hints data on all wikis (T341110 T337942) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:10 cjming@deploy2002: Started scap: Backport for Enable display of Client Hints data on all wikis (T341110 T337942)
  • 19:55 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:55 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:42 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:42 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:30 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:30 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:27 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:27 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:23 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts aqs1010.eqiad.wmnet
  • 19:23 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts aqs1010.eqiad.wmnet
  • 19:20 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:20 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:17 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts aqs1010.eqiad.wmnet
  • 19:13 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:12 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:09 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts aqs1010.eqiad.wmnet
  • 18:51 sukhe: exiqgrep -i -r <redacted> | xargs exim -Mrm
  • 18:41 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:27 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 18:27 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 18:27 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 18:20 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:19 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:06 ejegg: fundraising python tools upgraded from 7c6a28e0 to e56ae8ae
  • 17:59 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we set up the new WMDE Airflow instance
  • 17:59 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we set up the new WMDE Airflow instance
  • 17:55 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:55 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:41 denisse: Upgrading navtiming on the webperf hosts in the beta cluster
  • 17:14 ejegg: fundraising python tools upgraded from 0c17296c to 7c6a28e0
  • 16:48 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 16:46 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 16:43 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:42 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:25 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:23 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T343198)', diff saved to https://phabricator.wikimedia.org/P52975 and previous config saved to /var/cache/conftool/dbconfig/20231016-161829-arnaudb.json
  • 16:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 16:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 16:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T343198)', diff saved to https://phabricator.wikimedia.org/P52974 and previous config saved to /var/cache/conftool/dbconfig/20231016-161806-arnaudb.json
  • 16:10 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 16:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P52973 and previous config saved to /var/cache/conftool/dbconfig/20231016-160300-arnaudb.json
  • 15:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P52972 and previous config saved to /var/cache/conftool/dbconfig/20231016-154754-arnaudb.json
  • 15:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T343198)', diff saved to https://phabricator.wikimedia.org/P52971 and previous config saved to /var/cache/conftool/dbconfig/20231016-153247-arnaudb.json
  • 15:10 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for sessionstore2001.codfw.wmnet
  • 15:10 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for sessionstore2001.codfw.wmnet
  • 15:08 sukhe: running authdns-update
  • 15:03 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 14:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4003.wikimedia.org with OS bookworm
  • 14:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sessionstore2001.codfw.wmnet with reason: Moving host — T348142
  • 14:54 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on sessionstore2001.codfw.wmnet with reason: Moving host — T348142
  • 14:42 ejegg: Standalone (IPN listener) SmashPig upgraded from 211284b9 to e27dfbce
  • 14:35 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:34 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:34 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:33 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:33 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:33 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:30 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:30 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 14:26 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 14:25 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 14:23 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:22 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:22 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:21 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:20 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:20 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:18 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:17 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:17 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:16 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:16 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:15 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:10 ladsgroup@deploy2002: Finished scap: Backport for Disable DoubleWiki extension everywhere (T344544) (duration: 08m 09s)
  • 14:05 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 14:03 ladsgroup@deploy2002: ladsgroup: Backport for Disable DoubleWiki extension everywhere (T344544) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bookworm
  • 14:02 ladsgroup@deploy2002: Started scap: Backport for Disable DoubleWiki extension everywhere (T344544)
  • 13:53 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 13:52 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 13:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 13:52 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 13:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 13:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 13:42 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:41 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:40 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:39 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:39 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:38 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:37 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:36 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
  • 13:36 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
  • 13:36 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
  • 13:36 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
  • 13:36 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 13:35 jayme@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 13:34 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:34 TheresNoTime: close UTC afternoon backport window
  • 13:34 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:34 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:34 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:34 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:33 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 13:33 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:33 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:33 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:33 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 13:33 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:33 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:33 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:32 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:30 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:30 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:30 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:30 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:14 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:14 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:12 samtar@deploy2002: Finished scap: Backport for fix incubatorwiki wordmark (T348577), update throttle rule for UIUC Wikipedia edit-a-thon November 13, 2023 and remove old throttle rules (T346043) (duration: 08m 08s)
  • 13:07 samtar@deploy2002: samtar and anzx: Continuing with sync
  • 13:05 samtar@deploy2002: samtar and anzx: Backport for fix incubatorwiki wordmark (T348577), update throttle rule for UIUC Wikipedia edit-a-thon November 13, 2023 and remove old throttle rules (T346043) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:05 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:04 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:04 samtar@deploy2002: Started scap: Backport for fix incubatorwiki wordmark (T348577), update throttle rule for UIUC Wikipedia edit-a-thon November 13, 2023 and remove old throttle rules (T346043)
  • 12:35 ladsgroup@deploy2002: Finished scap: Backport for Switch ES cluster to cluster28 and cluster29 (T342685) (duration: 18m 52s)
  • 12:29 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 12:17 ladsgroup@deploy2002: ladsgroup: Backport for Switch ES cluster to cluster28 and cluster29 (T342685) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:16 ladsgroup@deploy2002: Started scap: Backport for Switch ES cluster to cluster28 and cluster29 (T342685)
  • 11:15 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 11:12 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 11:10 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 11:07 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 11:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1001.eqiad.wmnet
  • 10:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-coord1001.eqiad.wmnet
  • 10:18 ladsgroup@deploy2002: Finished scap: Backport for Change default of pagelinks to write both (T345732) (duration: 07m 44s)
  • 10:12 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 10:11 ladsgroup@deploy2002: ladsgroup: Backport for Change default of pagelinks to write both (T345732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:10 ladsgroup@deploy2002: Started scap: Backport for Change default of pagelinks to write both (T345732)
  • 10:06 ladsgroup@deploy2002: Finished scap: Backport for Enable pagelinks migration WRITE BOTH on some more wikis (T345732) (duration: 09m 19s)
  • 10:01 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 09:58 ladsgroup@deploy2002: ladsgroup: Backport for Enable pagelinks migration WRITE BOTH on some more wikis (T345732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:57 ladsgroup@deploy2002: Started scap: Backport for Enable pagelinks migration WRITE BOTH on some more wikis (T345732)
  • 09:52 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:52 phuedx@deploy2002: Finished scap: Backport for Revert "Introduce Web Accessibility Features and Submodule" (duration: 10m 04s)
  • 09:52 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:51 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:47 phuedx@deploy2002: phuedx: Continuing with sync
  • 09:43 phuedx@deploy2002: phuedx: Backport for Revert "Introduce Web Accessibility Features and Submodule" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:42 phuedx@deploy2002: Started scap: Backport for Revert "Introduce Web Accessibility Features and Submodule"
  • 09:38 phuedx@deploy2002: backport Cancelled
  • 09:00 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host an-test-master1001.eqiad.wmnet
  • 08:56 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 08:52 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 08:51 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 08:48 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 08:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1001.eqiad.wmnet
  • 08:44 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 08:44 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 08:44 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 08:43 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 08:43 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 08:42 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 08:41 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 08:40 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 08:40 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 08:39 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 08:38 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 08:38 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 08:36 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 08:35 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 08:35 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 08:35 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 08:35 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:34 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 08:34 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:34 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 08:34 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:10 hashar@deploy2002: Finished scap: Backport for Don't try to lock to serialize m3u8 file writes (T348689 T348667 T348375 T348753) (duration: 27m 04s)
  • 07:58 hashar@deploy2002: jforrester and hashar: Continuing with sync
  • 07:57 hashar@deploy2002: jforrester and hashar: Backport for Don't try to lock to serialize m3u8 file writes (T348689 T348667 T348375 T348753) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:55 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 07:54 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 07:54 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 07:53 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 07:43 hashar@deploy2002: Started scap: Backport for Don't try to lock to serialize m3u8 file writes (T348689 T348667 T348375 T348753)
  • 07:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T343198)', diff saved to https://phabricator.wikimedia.org/P52968 and previous config saved to /var/cache/conftool/dbconfig/20231016-073731-arnaudb.json
  • 07:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 07:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 07:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 07:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 07:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T343198)', diff saved to https://phabricator.wikimedia.org/P52967 and previous config saved to /var/cache/conftool/dbconfig/20231016-073653-arnaudb.json
  • 07:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P52966 and previous config saved to /var/cache/conftool/dbconfig/20231016-072147-arnaudb.json
  • 07:17 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 07:17 elukey@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 07:15 aqu@deploy2002: Finished deploy [analytics/refinery@1baf3be] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1baf3be2] (duration: 02m 51s)
  • 07:12 aqu@deploy2002: Started deploy [analytics/refinery@1baf3be] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1baf3be2]
  • 07:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P52965 and previous config saved to /var/cache/conftool/dbconfig/20231016-070640-arnaudb.json
  • 06:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T343198)', diff saved to https://phabricator.wikimedia.org/P52964 and previous config saved to /var/cache/conftool/dbconfig/20231016-065134-arnaudb.json
  • 05:41 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 05:41 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 05:40 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 05:40 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 05:39 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 05:38 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 05:36 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 05:35 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 05:34 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 05:33 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 05:33 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 05:33 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 05:32 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 05:32 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 05:32 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 05:32 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 05:31 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 05:31 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .

2023-10-15

  • 22:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T343198)', diff saved to https://phabricator.wikimedia.org/P52963 and previous config saved to /var/cache/conftool/dbconfig/20231015-222435-arnaudb.json
  • 22:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 22:24 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 22:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T343198)', diff saved to https://phabricator.wikimedia.org/P52962 and previous config saved to /var/cache/conftool/dbconfig/20231015-222414-arnaudb.json
  • 22:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P52961 and previous config saved to /var/cache/conftool/dbconfig/20231015-220907-arnaudb.json
  • 21:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P52960 and previous config saved to /var/cache/conftool/dbconfig/20231015-215401-arnaudb.json
  • 21:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T343198)', diff saved to https://phabricator.wikimedia.org/P52959 and previous config saved to /var/cache/conftool/dbconfig/20231015-213855-arnaudb.json
  • 19:10 urandom: starting Cassandra decommission of restbase1016-b — T328490
  • 14:35 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:32 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:31 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:31 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 14:31 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:30 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 14:30 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 13:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T343198)', diff saved to https://phabricator.wikimedia.org/P52958 and previous config saved to /var/cache/conftool/dbconfig/20231015-130027-arnaudb.json
  • 13:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 13:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 13:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T343198)', diff saved to https://phabricator.wikimedia.org/P52957 and previous config saved to /var/cache/conftool/dbconfig/20231015-130005-arnaudb.json
  • 12:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P52956 and previous config saved to /var/cache/conftool/dbconfig/20231015-124459-arnaudb.json
  • 12:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P52955 and previous config saved to /var/cache/conftool/dbconfig/20231015-122953-arnaudb.json
  • 12:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T343198)', diff saved to https://phabricator.wikimedia.org/P52954 and previous config saved to /var/cache/conftool/dbconfig/20231015-121446-arnaudb.json
  • 11:03 hashar@deploy2002: Finished deploy [integration/docroot@096f637]: (no justification provided) (duration: 00m 05s)
  • 11:03 hashar@deploy2002: Started deploy [integration/docroot@096f637]: (no justification provided)
  • 03:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T343198)', diff saved to https://phabricator.wikimedia.org/P52953 and previous config saved to /var/cache/conftool/dbconfig/20231015-035420-arnaudb.json
  • 03:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 03:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 03:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T343198)', diff saved to https://phabricator.wikimedia.org/P52952 and previous config saved to /var/cache/conftool/dbconfig/20231015-035347-arnaudb.json
  • 03:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P52951 and previous config saved to /var/cache/conftool/dbconfig/20231015-033841-arnaudb.json
  • 03:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P52950 and previous config saved to /var/cache/conftool/dbconfig/20231015-032335-arnaudb.json
  • 03:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T343198)', diff saved to https://phabricator.wikimedia.org/P52949 and previous config saved to /var/cache/conftool/dbconfig/20231015-030828-arnaudb.json

2023-10-14

  • 18:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T343198)', diff saved to https://phabricator.wikimedia.org/P52948 and previous config saved to /var/cache/conftool/dbconfig/20231014-184517-arnaudb.json
  • 18:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 18:45 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 18:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T343198)', diff saved to https://phabricator.wikimedia.org/P52947 and previous config saved to /var/cache/conftool/dbconfig/20231014-184455-arnaudb.json
  • 18:30 urandom: starting Cassandra decommission of restbase1016-a — T328490
  • 18:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P52946 and previous config saved to /var/cache/conftool/dbconfig/20231014-182949-arnaudb.json
  • 18:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P52945 and previous config saved to /var/cache/conftool/dbconfig/20231014-181442-arnaudb.json
  • 17:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T343198)', diff saved to https://phabricator.wikimedia.org/P52944 and previous config saved to /var/cache/conftool/dbconfig/20231014-175936-arnaudb.json
  • 17:34 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:34 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 17:33 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:33 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:32 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 17:32 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 09:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T343198)', diff saved to https://phabricator.wikimedia.org/P52943 and previous config saved to /var/cache/conftool/dbconfig/20231014-091542-arnaudb.json
  • 09:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 02:29 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 02:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 02:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 02:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T343198)', diff saved to https://phabricator.wikimedia.org/P52942 and previous config saved to /var/cache/conftool/dbconfig/20231014-022208-arnaudb.json
  • 02:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P52941 and previous config saved to /var/cache/conftool/dbconfig/20231014-020701-arnaudb.json
  • 01:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P52940 and previous config saved to /var/cache/conftool/dbconfig/20231014-015154-arnaudb.json
  • 01:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T343198)', diff saved to https://phabricator.wikimedia.org/P52939 and previous config saved to /var/cache/conftool/dbconfig/20231014-013648-arnaudb.json
  • 00:04 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)

2023-10-13

  • 23:56 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:21 ejegg: fundraising civicrm upgraded from c5f54d97 to e71ccffb
  • 21:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1107.eqiad.wmnet with OS bullseye
  • 21:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1106.eqiad.wmnet with OS bullseye
  • 21:29 hashar@deploy2002: Finished deploy [integration/docroot@096f637]: Expand Purtle doc card (duration: 00m 05s)
  • 21:29 hashar@deploy2002: Started deploy [integration/docroot@096f637]: Expand Purtle doc card
  • 21:29 hashar@deploy2002: Finished deploy [integration/docroot@504d455]: Fix php-session-serializer tagline (duration: 00m 06s)
  • 21:28 hashar@deploy2002: Started deploy [integration/docroot@504d455]: Fix php-session-serializer tagline
  • 20:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1111.eqiad.wmnet with OS bullseye
  • 20:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1114.eqiad.wmnet with OS bullseye
  • 20:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1113.eqiad.wmnet with OS bullseye
  • 20:29 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:26 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:24 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1108']
  • 20:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1108']
  • 20:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1108.eqiad.wmnet with OS bullseye
  • 20:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1107.eqiad.wmnet with OS bullseye
  • 20:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
  • 20:12 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 20:12 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1106.eqiad.wmnet with OS bullseye
  • 20:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1112.eqiad.wmnet with OS bullseye
  • 20:11 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:10 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1113.eqiad.wmnet with reason: host reimage
  • 20:07 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
  • 20:07 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:06 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1113.eqiad.wmnet with reason: host reimage
  • 20:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1109.eqiad.wmnet with OS bullseye
  • 20:04 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:03 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 19:57 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS bullseye
  • 19:56 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
  • 19:56 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 19:55 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1102
  • 19:54 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1102
  • 19:53 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
  • 19:53 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 19:52 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 19:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1112.eqiad.wmnet with reason: host reimage
  • 19:48 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1113.eqiad.wmnet with OS bullseye
  • 19:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1109.eqiad.wmnet with reason: host reimage
  • 19:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 19:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1102']
  • 19:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 19:39 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp1113']
  • 19:38 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1102
  • 19:37 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1102
  • 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1102
  • 19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1102
  • 19:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
  • 19:28 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
  • 19:27 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1112']
  • 19:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
  • 19:25 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
  • 19:24 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 19:24 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1102']
  • 19:24 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 19:24 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
  • 19:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp1114']
  • 19:23 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1112']
  • 19:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
  • 19:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1112']
  • 19:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
  • 19:22 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1113']
  • 19:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 19:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1110']
  • 19:20 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
  • 19:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1111']
  • 19:19 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 19:18 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1112']
  • 19:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
  • 19:17 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1113']
  • 19:17 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1114']
  • 19:17 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1114']
  • 19:14 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@520fa55]: (no justification provided) (duration: 00m 23s)
  • 19:14 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1113']
  • 19:14 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@520fa55]: (no justification provided)
  • 19:14 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1112']
  • 19:08 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
  • 19:08 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1112']
  • 19:07 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
  • 19:07 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1111']
  • 19:04 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 19:03 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1110']
  • 19:00 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
  • 18:58 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1109']
  • 18:52 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1109']
  • 18:06 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1108']
  • 18:06 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1104']
  • 18:03 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1115.eqiad.wmnet with OS bullseye
  • 18:03 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 18:02 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1107']
  • 18:00 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1108']
  • 18:00 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@520fa55]: (no justification provided) (duration: 00m 59s)
  • 17:59 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@520fa55]: (no justification provided)
  • 17:56 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1107']
  • 17:55 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1106']
  • 17:55 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1106']
  • 17:54 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1105']
  • 17:53 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
  • 17:53 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1104']
  • 17:52 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1103']
  • 17:50 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1101']
  • 17:48 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1100']
  • 17:46 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1103']
  • 17:46 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 17:46 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
  • 17:45 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 17:44 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1101']
  • 17:42 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
  • 17:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1115.eqiad.wmnet with reason: host reimage
  • 17:26 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1105']
  • 17:26 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@520fa55]: (no justification provided) (duration: 01m 01s)
  • 17:25 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1115.eqiad.wmnet with reason: host reimage
  • 17:25 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@520fa55]: (no justification provided)
  • 17:16 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
  • 17:16 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1105']
  • 17:15 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
  • 17:14 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1105']
  • 17:10 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 17:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1115']
  • 17:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1115']
  • 16:58 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
  • 16:57 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:49 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:43 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:43 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:42 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1105']
  • 16:42 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
  • 16:41 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1105']
  • 16:41 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
  • 16:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1008.wikimedia.org with OS bullseye
  • 16:29 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T343198)', diff saved to https://phabricator.wikimedia.org/P52936 and previous config saved to /var/cache/conftool/dbconfig/20231013-162902-arnaudb.json
  • 16:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 16:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 16:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T343198)', diff saved to https://phabricator.wikimedia.org/P52935 and previous config saved to /var/cache/conftool/dbconfig/20231013-162840-arnaudb.json
  • 16:24 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1114']
  • 16:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P52934 and previous config saved to /var/cache/conftool/dbconfig/20231013-161333-arnaudb.json
  • 16:12 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1114']
  • 16:11 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1113']
  • 16:10 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1115']
  • 16:06 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1114.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:00 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1115']
  • 15:59 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1113']
  • 15:59 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1115.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P52933 and previous config saved to /var/cache/conftool/dbconfig/20231013-155827-arnaudb.json
  • 15:55 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1111']
  • 15:55 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1113.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:55 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 15:54 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1112']
  • 15:45 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1111']
  • 15:45 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1110']
  • 15:44 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
  • 15:44 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1112']
  • 15:44 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
  • 15:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T343198)', diff saved to https://phabricator.wikimedia.org/P52932 and previous config saved to /var/cache/conftool/dbconfig/20231013-154321-arnaudb.json
  • 15:43 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1112.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:41 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1115.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:41 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1114.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:40 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1115
  • 15:40 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1114
  • 15:39 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1115
  • 15:39 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1114
  • 15:37 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1113.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:35 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 15:35 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1111.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:35 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
  • 15:33 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1109']
  • 15:32 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1113
  • 15:32 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1110.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:32 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1108']
  • 15:32 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1107']
  • 15:31 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1113
  • 15:25 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1112.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:23 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1109']
  • 15:23 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1106']
  • 15:22 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1108']
  • 15:21 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1107']
  • 15:20 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1109.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:19 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:19 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1108.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:18 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:16 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1112
  • 15:16 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:15 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1112
  • 15:15 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1111.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:12 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1106']
  • 15:12 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1106.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:10 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:10 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1111
  • 15:08 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1110.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:07 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1111
  • 15:07 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1110
  • 15:06 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1110
  • 15:02 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1109.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:01 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1109
  • 14:59 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1109
  • 14:58 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1108.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:56 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1108
  • 14:55 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:55 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1107
  • 14:54 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1108
  • 14:53 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1107
  • 14:51 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1106.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:51 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1106
  • 14:51 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1106
  • 14:43 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:39 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2003.codfw.wmnet with OS bookworm
  • 14:30 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1106
  • 14:30 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1106
  • 14:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1008.wikimedia.org with reason: host reimage
  • 14:28 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1106.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:26 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1008.wikimedia.org with reason: host reimage
  • 14:21 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be2003.codfw.wmnet with reason: host reimage
  • 14:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:18 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2003.codfw.wmnet with reason: host reimage
  • 14:17 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:17 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1105
  • 14:17 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1105
  • 14:12 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1008.wikimedia.org with OS bullseye
  • 14:06 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 14:03 sukhe: remove redundant 208.80.154.238/32 dev from /e/n/i on A:dns-rec and A:eqiad (superseded by label lo:anycast): T348041
  • 13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 13:20 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bookworm
  • 13:07 mvernon@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be2003.codfw.wmnet with OS bookworm
  • 13:04 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bookworm
  • 13:04 mvernon@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be2003.codfw.wmnet with OS bookworm
  • 13:04 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 12:53 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@9a8cfd2]: (no justification provided) (duration: 00m 39s)
  • 12:52 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@9a8cfd2]: (no justification provided)
  • 12:52 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@9a8cfd2]: (no justification provided) (duration: 01m 26s)
  • 12:50 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@9a8cfd2]: (no justification provided)
  • 12:47 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bookworm
  • 11:53 urandom: starting decommission of restbase2012-c — T328490
  • 11:07 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 10:29 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 09:10 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 08:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15133
  • 07:54 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 06:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15133
  • 06:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 150552
  • 06:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 150552
  • 06:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T343198)', diff saved to https://phabricator.wikimedia.org/P52925 and previous config saved to /var/cache/conftool/dbconfig/20231013-064400-arnaudb.json
  • 06:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 06:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 06:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T343198)', diff saved to https://phabricator.wikimedia.org/P52924 and previous config saved to /var/cache/conftool/dbconfig/20231013-064328-arnaudb.json
  • 06:43 moritzm: installing Linux 5.10.197 updates from Bullseye point release (no reboots, just installing the new kernels)
  • 06:39 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync
  • 06:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: setup in progress
  • 06:38 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: setup in progress
  • 06:38 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: sync
  • 06:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on debmonitor2002.codfw.wmnet with reason: setup in progress
  • 06:38 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on debmonitor2002.codfw.wmnet with reason: setup in progress
  • 06:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P52923 and previous config saved to /var/cache/conftool/dbconfig/20231013-062821-arnaudb.json
  • 06:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P52922 and previous config saved to /var/cache/conftool/dbconfig/20231013-061315-arnaudb.json
  • 05:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T343198)', diff saved to https://phabricator.wikimedia.org/P52921 and previous config saved to /var/cache/conftool/dbconfig/20231013-055809-arnaudb.json
  • 03:20 TimStarling: on non-CentralAuth wikis, created the loginnotify_seen_net table T346989
  • 03:08 TimStarling: on x1 wikishared, created loginnotify_seen_net table T346989
  • 01:11 cstone: payments-wiki upgraded from aa5cd24d to 7f4da789

2023-10-12

  • 21:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.wikimedia.org with OS bullseye
  • 21:27 thcipriani@deploy2002: Finished scap: Backport for Set UseParserMigration true in wmf-config (T333179) (duration: 15m 20s)
  • 21:22 thcipriani@deploy2002: sbailey and thcipriani: Continuing with sync
  • 21:13 thcipriani@deploy2002: sbailey and thcipriani: Backport for Set UseParserMigration true in wmf-config (T333179) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:12 thcipriani@deploy2002: Started scap: Backport for Set UseParserMigration true in wmf-config (T333179)
  • 21:10 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 21:10 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 21:10 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 21:10 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 21:09 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 21:09 thcipriani: mwmaint2002:foreachwikiindblist 'group2 & s6' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230613000000 | tee /tmp/persistentRevisionThreadItems-s6.log
  • 21:09 thcipriani: mwmaint2002:foreachwikiindblist 'group2 & s7' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230613000000 | tee /tmp/persistentRevisionThreadItems-s7.log
  • 21:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T343198)', diff saved to https://phabricator.wikimedia.org/P52920 and previous config saved to /var/cache/conftool/dbconfig/20231012-210646-arnaudb.json
  • 21:06 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 21:06 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 21:06 thcipriani@deploy2002: Finished scap: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on s7 group2 (T315353) (duration: 07m 55s)
  • 21:00 thcipriani@deploy2002: thcipriani and matmarex: Continuing with sync
  • 20:59 thcipriani@deploy2002: thcipriani and matmarex: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on s7 group2 (T315353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:58 thcipriani@deploy2002: Started scap: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on s7 group2 (T315353)
  • 20:50 dr0ptp4kt@deploy2002: Finished scap: Backport for Revert "Growth: Enable Welcome survey user research for enwiki" (T342353) (duration: 08m 32s)
  • 20:45 dr0ptp4kt@deploy2002: dr0ptp4kt and urbanecm: Continuing with sync
  • 20:43 dr0ptp4kt@deploy2002: dr0ptp4kt and urbanecm: Backport for Revert "Growth: Enable Welcome survey user research for enwiki" (T342353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:41 dr0ptp4kt@deploy2002: Started scap: Backport for Revert "Growth: Enable Welcome survey user research for enwiki" (T342353)
  • 20:38 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1008.wikimedia.org with OS bullseye
  • 20:37 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1008
  • 20:37 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1008
  • 20:26 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 20:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.wikimedia.org with OS bullseye
  • 20:26 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 20:25 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:22 dr0ptp4kt@deploy2002: Finished scap: Backport for Allow FundraiseUp scripts in Donatewiki CSP (T345379) (duration: 07m 40s)
  • 20:21 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:17 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:17 dr0ptp4kt@deploy2002: dr0ptp4kt and ejegg: Continuing with sync
  • 20:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 20:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:16 dr0ptp4kt@deploy2002: dr0ptp4kt and ejegg: Backport for Allow FundraiseUp scripts in Donatewiki CSP (T345379) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 20:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:15 dr0ptp4kt@deploy2002: Started scap: Backport for Allow FundraiseUp scripts in Donatewiki CSP (T345379)
  • 20:10 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 20:10 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:10 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 20:10 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 20:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 20:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:06 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 20:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 20:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:05 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 20:05 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.wikimedia.org with reason: host reimage
  • 20:04 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 20:04 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.wikimedia.org with reason: host reimage
  • 20:00 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.wikimedia.org with reason: host reimage
  • 19:59 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.wikimedia.org with reason: host reimage
  • 19:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 19:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:58 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:57 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 19:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 19:56 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 19:55 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1008.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:47 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1008
  • 19:47 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1008
  • 19:46 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1008.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:46 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 19:45 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 19:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1009.wikimedia.org with OS bullseye
  • 19:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 19:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 19:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 19:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1009']
  • 19:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:38 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1010']
  • 19:38 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 19:37 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:36 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 19:36 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 19:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:34 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 19:34 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:33 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1009']
  • 19:31 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1010']
  • 19:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.wikimedia.org with OS bullseye
  • 19:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1009.wikimedia.org with OS bullseye
  • 19:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 19:02 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: new graph split hosts T347505
  • 19:01 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: new graph split hosts T347505
  • 17:57 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1106.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:53 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
  • 17:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1008.wikimedia.org with OS bullseye
  • 17:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1009.wikimedia.org with OS bullseye
  • 17:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 17:37 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010
  • 17:36 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010
  • 17:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009
  • 17:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.wikimedia.org with OS bullseye
  • 17:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 17:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1009.wikimedia.org with OS bullseye
  • 17:34 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009
  • 17:33 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1008
  • 17:32 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:32 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:31 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1008
  • 17:27 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1106
  • 17:26 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1106
  • 17:23 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:22 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1105
  • 17:21 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1105
  • 17:19 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 17:13 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 17:13 pt1979@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
  • 17:12 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1104']
  • 17:12 pt1979@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
  • 16:57 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
  • 16:55 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 16:55 pt1979@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
  • 16:53 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
  • 16:50 pt1979@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
  • 16:41 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 16:35 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 16:33 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1103']
  • 16:31 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 16:28 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 16:28 pt1979@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
  • 16:27 pt1979@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
  • 16:27 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1104']
  • 16:26 sukhe: enable puppet on A:dns-rec and force agent run: T348041
  • 16:25 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:24 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:22 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1103']
  • 16:19 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 16:19 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:17 sukhe: disable puppet on A:dns-rec to roll out CR: 965187 T348041
  • 16:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1008.wikimedia.org with OS bullseye
  • 16:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 16:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1009.wikimedia.org with OS bullseye
  • 16:14 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cp1101 - jclark@cumin1001"
  • 16:13 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cp1101 - jclark@cumin1001"
  • 16:12 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
  • 16:11 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 16:09 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
  • 16:09 moritzm: installing batik security updates
  • 16:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: setup in progress
  • 16:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: setup in progress
  • 16:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
  • 15:57 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:56 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 15:56 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
  • 15:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2002.codfw.wmnet
  • 15:48 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1102.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:48 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2002.codfw.wmnet
  • 15:46 moritzm: restart FPM on mediawiki canaries to pick up new libxpm
  • 15:44 moritzm: installing libxpm security updates
  • 15:42 Lucas_WMDE: (mostly?) Finished scap: Backport for specials: Use correct title in NewPagesPager (T348665) (duration: 07m 13s) – scap failed in the purgeMessageBlobStore step (php-fpm-restarts finished)
  • 15:35 lucaswerkmeister-wmde@deploy2002: jforrester and lucaswerkmeister-wmde: Continuing with sync
  • 15:34 lucaswerkmeister-wmde@deploy2002: jforrester and lucaswerkmeister-wmde: Backport for specials: Use correct title in NewPagesPager (T348665) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:33 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for specials: Use correct title in NewPagesPager (T348665)
  • 15:31 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1007.wikimedia.org with OS bullseye
  • 15:31 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:30 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16591
  • 15:16 lucaswerkmeister-wmde@deploy2002: Backport cancelled.
  • 15:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1007.wikimedia.org with reason: host reimage
  • 15:11 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1007.wikimedia.org with reason: host reimage
  • 15:08 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 15:04 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bookworm
  • 15:00 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2003.codfw.wmnet
  • 15:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16591
  • 14:57 sukhe: stopping gdnsd on dns2006 to simulate bird prefix withdrawal
  • 14:57 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.wikimedia.org with OS bullseye
  • 14:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.wikimedia.org with OS bullseye
  • 14:56 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.wikimedia.org with OS bullseye
  • 14:53 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2003.codfw.wmnet
  • 14:52 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
  • 14:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35008
  • 14:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35008
  • 14:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12200
  • 14:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12200
  • 14:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28458
  • 14:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 28458
  • 14:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 400474
  • 14:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 400474
  • 14:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 398196
  • 14:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 398196
  • 14:47 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:47 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudelastic1007 - jclark@cumin1001"
  • 14:46 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudelastic1007 - jclark@cumin1001"
  • 14:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3267
  • 14:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3267
  • 14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 30132
  • 14:45 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
  • 14:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 30132
  • 14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15703
  • 14:44 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15703
  • 14:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 25542
  • 14:42 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25542
  • 14:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
  • 14:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15435
  • 14:38 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15435
  • 14:37 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46562
  • 14:35 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
  • 14:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46562
  • 14:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6412
  • 14:34 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6412
  • 14:33 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: sync
  • 14:33 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: sync
  • 14:32 sukhe: completed restarts of pdns-recursor in doh* and dns*
  • 14:30 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 14:23 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 14:17 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: sync
  • 14:16 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: sync
  • 14:16 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: sync
  • 14:15 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: sync
  • 14:12 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bookworm
  • 14:11 urbanecm: mwmaint2002: stop previous instance of `refreshLinkRecommendations` maintenance job (T348719)
  • 14:07 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 14:04 sukhe: sudo cumin -b1 -s120 'A:dns-rec and not P{dns6002*}' 'systemctl restart pdns-recursor.service'
  • 14:03 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.30 refs T347081
  • 14:00 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough and A:wikidough
  • 13:50 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:50 bking@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:50 bking@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:49 bking@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:43 sukhe: remove old ns2 IP 91.198.174.239/32 from /e/n/i on A:dns-rec: T329219
  • 13:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 54994
  • 13:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 54994
  • 13:35 sukhe: remove redundant 208.80.153.231/32 from /e/n/i on A:dns-rec and A:codfw (superseded by label lo:anycast): T348041
  • 13:34 kartik@deploy2002: Finished scap: Backport for Add Akan language (T333765) (duration: 09m 39s)
  • 13:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 139901
  • 13:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 139901
  • 13:28 kartik@deploy2002: kartik and srishakatux: Continuing with sync
  • 13:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host archiva1002.wikimedia.org
  • 13:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15133
  • 13:25 kartik@deploy2002: kartik and srishakatux: Backport for Add Akan language (T333765) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:25 kartik@deploy2002: Started scap: Backport for Add Akan language (T333765)
  • 13:24 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
  • 13:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15133
  • 13:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host archiva1002.wikimedia.org
  • 13:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 40317
  • 13:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 40317
  • 13:18 hashar@deploy2002: Finished scap: Backport for LinkRecommendationUpdater: Update $linkRecommendationTaskType declaration (T348719) (duration: 06m 51s)
  • 13:13 hashar@deploy2002: phuedx and hashar: Continuing with sync
  • 13:13 hashar@deploy2002: phuedx and hashar: Backport for LinkRecommendationUpdater: Update $linkRecommendationTaskType declaration (T348719) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:11 hashar@deploy2002: Started scap: Backport for LinkRecommendationUpdater: Update $linkRecommendationTaskType declaration (T348719)
  • 12:26 jayme: re-enable puppet on A:cp - T347544
  • 12:18 jayme: disable puppet on A:cp - T347544
  • 12:16 jayme: disable puppet on A:cp-text - T347544
  • 11:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 11:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 11:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:37 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 11:36 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 11:34 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 11:33 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: testing
  • 11:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: testing
  • 11:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 11:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 11:21 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 11:20 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 10:52 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: sync
  • 10:51 elukey@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: sync
  • 10:50 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: sync
  • 10:49 elukey@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: sync
  • 10:26 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: sync
  • 10:26 elukey@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: sync
  • 10:26 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: sync
  • 10:15 elukey@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: sync
  • 10:13 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: sync
  • 10:03 elukey@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: sync
  • 09:40 fabfur: repooling cp4040 (depooled for T347837 and forgot)
  • 09:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1002.eqiad.wmnet
  • 09:31 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1002.eqiad.wmnet
  • 09:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-master1002.eqiad.wmnet
  • 09:31 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for an-master1002.eqiad.wmnet
  • 09:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on an-master1002.eqiad.wmnet with reason: Rebooting misbehaving an-master1002
  • 09:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on an-master1002.eqiad.wmnet with reason: Rebooting misbehaving an-master1002
  • 08:53 hashar@deploy2002: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.41.0-wmf.30" # T347081
  • 08:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 56099
  • 08:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 56099
  • 08:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 38195
  • 08:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 38195
  • 08:40 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 38195
  • 08:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 38195
  • 08:38 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 08:38 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 08:38 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 08:38 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 08:35 godog: add 200G to prometheus/ops in eqiad
  • 08:28 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.30 refs T347081
  • 08:15 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 06:59 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Arturo Borrero Gonzalez out of all services on: 2156 hosts
  • 06:58 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Arturo Borrero Gonzalez out of all services on: 2156 hosts
  • 06:46 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 00:09 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1063.eqiad.wmnet with OS bullseye

2023-10-11

  • 23:23 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 23:22 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 23:09 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 23:05 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be2003.codfw.wmnet with OS bullseye
  • 22:47 eileen: civicrm upgraded from f2f1e23e to ceaeaa19
  • 22:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 22:18 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:18 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:15 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:15 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir1001.eqiad.wmnet with OS bookworm
  • 21:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1001.eqiad.wmnet with reason: host reimage
  • 21:47 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir1001.eqiad.wmnet with reason: host reimage
  • 21:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for apifeatureusage2001.codfw.wmnet,apifeatureusage1001.eqiad.wmnet
  • 21:30 ryankemper@cumin1001: START - Cookbook sre.hosts.remove-downtime for apifeatureusage2001.codfw.wmnet,apifeatureusage1001.eqiad.wmnet
  • 21:30 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir1001.eqiad.wmnet with OS bookworm
  • 21:26 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir1002.eqiad.wmnet with OS bookworm
  • 21:20 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on apifeatureusage2001.codfw.wmnet with reason: reboot T348418
  • 21:20 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on apifeatureusage2001.codfw.wmnet with reason: reboot T348418
  • 21:11 ryankemper: T348418 Rebooting `apifeatureusage1001.eqiad.wmnet`
  • 21:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1002.eqiad.wmnet with reason: host reimage
  • 21:07 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir1002.eqiad.wmnet with reason: host reimage
  • 21:06 taavi@deploy2002: Finished scap: Backport for Set WRITE_NEW for CA wikis on OATHAuth multiple devices (T242031) (duration: 10m 33s)
  • 21:01 taavi@deploy2002: taavi: Continuing with sync
  • 20:57 taavi@deploy2002: taavi: Backport for Set WRITE_NEW for CA wikis on OATHAuth multiple devices (T242031) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:55 taavi@deploy2002: Started scap: Backport for Set WRITE_NEW for CA wikis on OATHAuth multiple devices (T242031)
  • 20:54 cstone: payments-wiki upgraded from d6ad0376 to aa5cd24d
  • 20:54 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir1002.eqiad.wmnet with OS bookworm
  • 20:45 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:45 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:44 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:43 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir2001.codfw.wmnet with OS bookworm
  • 20:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir2001.codfw.wmnet with reason: host reimage
  • 20:22 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir2001.codfw.wmnet with reason: host reimage
  • 20:19 samtar@deploy2002: Finished scap: Backport for Remove override to allow mobile edit notices to display on all wikis (T316178) (duration: 08m 18s)
  • 20:14 samtar@deploy2002: kemayo and samtar: Continuing with sync
  • 20:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 20:13 samtar@deploy2002: kemayo and samtar: Backport for Remove override to allow mobile edit notices to display on all wikis (T316178) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:12 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bullseye
  • 20:11 samtar@deploy2002: Started scap: Backport for Remove override to allow mobile edit notices to display on all wikis (T316178)
  • 20:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 20:09 samtar@deploy2002: Finished scap: Backport for Enable Edit Check on initial partner wikis (T347908) (duration: 07m 32s)
  • 20:07 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:07 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:04 bking@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 20:04 bking@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 20:04 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir2001.codfw.wmnet with OS bookworm
  • 20:04 samtar@deploy2002: samtar and kemayo: Continuing with sync
  • 20:04 bking@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 20:04 bking@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 20:03 samtar@deploy2002: samtar and kemayo: Backport for Enable Edit Check on initial partner wikis (T347908) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:03 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 20:03 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 20:03 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 20:02 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 20:02 samtar@deploy2002: Started scap: Backport for Enable Edit Check on initial partner wikis (T347908)
  • 20:00 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1104']
  • 20:00 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1104']
  • 19:54 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir2002.codfw.wmnet with OS bookworm
  • 19:52 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:44 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1102.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir2002.codfw.wmnet with reason: host reimage
  • 19:37 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir2002.codfw.wmnet with reason: host reimage
  • 19:12 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir2002.codfw.wmnet with OS bookworm
  • 19:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir3003.esams.wmnet with OS bookworm
  • 19:08 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1101']
  • 19:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T343198)', diff saved to https://phabricator.wikimedia.org/P52914 and previous config saved to /var/cache/conftool/dbconfig/20231011-190408-arnaudb.json
  • 18:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host stat1011.eqiad.wmnet with OS bullseye
  • 18:49 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 18:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P52913 and previous config saved to /var/cache/conftool/dbconfig/20231011-184902-arnaudb.json
  • 18:48 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 18:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir3003.esams.wmnet with reason: host reimage
  • 18:43 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir3003.esams.wmnet with reason: host reimage
  • 18:36 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:36 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:36 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:36 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:35 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:35 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P52911 and previous config saved to /var/cache/conftool/dbconfig/20231011-183355-arnaudb.json
  • 18:33 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:33 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:33 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:32 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:31 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:31 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:25 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:24 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:24 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:23 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:23 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stat1011.eqiad.wmnet with reason: host reimage
  • 18:22 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:21 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:21 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:19 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on stat1011.eqiad.wmnet with reason: host reimage
  • 18:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T343198)', diff saved to https://phabricator.wikimedia.org/P52910 and previous config saved to /var/cache/conftool/dbconfig/20231011-181849-arnaudb.json
  • 18:18 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir3003.esams.wmnet with OS bookworm
  • 18:08 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:07 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:07 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:07 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:05 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:04 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:56 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir3004.esams.wmnet with OS bookworm
  • 17:55 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:47 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:47 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir3004.esams.wmnet with reason: host reimage
  • 17:28 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir3004.esams.wmnet with reason: host reimage
  • 17:27 sukhe: repool cp2030 for service=cdn
  • 17:03 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir3004.esams.wmnet with OS bookworm
  • 16:57 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:57 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host stat1011.eqiad.wmnet with OS bullseye
  • 16:48 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host stat1011.eqiad.wmnet with OS bullseye
  • 16:48 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host stat1011.eqiad.wmnet with OS bullseye
  • 16:47 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['stat1011']
  • 16:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['stat1011']
  • 16:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host stat1011.eqiad.wmnet with OS bullseye
  • 16:44 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host stat1011.eqiad.wmnet with OS bullseye
  • 16:43 taavi@deploy2002: Finished scap: Backport for Don't double-escape link contents (T348669) (duration: 07m 35s)
  • 16:38 taavi@deploy2002: taavi: Continuing with sync
  • 16:37 taavi@deploy2002: taavi: Backport for Don't double-escape link contents (T348669) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:36 taavi@deploy2002: Started scap: Backport for Don't double-escape link contents (T348669)
  • 16:29 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir5001.eqsin.wmnet with OS bookworm
  • 15:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir5001.eqsin.wmnet with reason: host reimage
  • 15:54 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir5001.eqsin.wmnet with reason: host reimage
  • 15:53 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mw-wikifunctions.discovery.wmnet on codfw recursors
  • 15:53 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache mw-wikifunctions.discovery.wmnet on codfw recursors
  • 15:53 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mw-wikifunctions.discovery.wmnet on eqiad recursors
  • 15:53 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache mw-wikifunctions.discovery.wmnet on eqiad recursors
  • 15:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host stat1011.eqiad.wmnet with OS bullseye
  • 15:52 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host stat1011.eqiad.wmnet with OS bullseye
  • 15:25 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 15:25 vgutierrez: depool ncredir5001
  • 15:23 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:22 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 15:22 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 15:21 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 15:20 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:20 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 15:18 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir5001.eqsin.wmnet with OS bookworm
  • 15:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on apt1002.wikimedia.org with reason: setup in progress
  • 15:04 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on apt1002.wikimedia.org with reason: setup in progress
  • 14:55 jayme: restarting pybal on lvs1019 and lvs2013
  • 14:52 jayme: restarting pybal on lvs1020 and lvs2014
  • 14:49 jayme: running puppet on 'O:lvs::balancer'
  • 14:45 jayme: disabling puppet on 'P{O:lvs::balancer} and (A:codfw or A:eqiad)'
  • 14:28 claime: Running authdns-update - T348631
  • 14:25 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:25 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:25 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:25 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:24 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:23 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:22 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1101']
  • 14:21 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:21 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:21 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:18 moritzm: installing curl security updates on bullseye/bookworm
  • 14:17 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:15 jayme@deploy2002: Finished scap: (no justification provided) (duration: 02m 15s)
  • 14:13 jayme@deploy2002: Started scap: (no justification provided)
  • 14:07 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:06 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Edit check: Simplify "experience" config to "maximumEditcount" (duration: 07m 13s)
  • 14:05 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kemayo: Continuing with sync
  • 14:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kemayo: Backport for Edit check: Simplify "experience" config to "maximumEditcount" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:58 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Edit check: Simplify "experience" config to "maximumEditcount"
  • 13:58 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 13:58 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 13:50 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 13:45 elukey: restart kube-apiserver on ml-serve-ctrl1002
  • 13:42 elukey: restart kube-apiserver on ml-serve-ctrl1001 as attempt to clear a weird golang/protobuf issue while retrieving secrets
  • 13:40 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 13:40 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:39 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:39 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:38 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:38 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:37 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:37 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:37 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 150552
  • 13:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 150552
  • 13:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38628
  • 13:36 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38628
  • 13:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40317
  • 13:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 40317
  • 13:34 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 13:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38195
  • 13:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38195
  • 13:28 sukhe: disable puppet on P:bird::anycast: T348041
  • 13:28 sukhe: disable puppet on P:bird::anycast
  • 13:27 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
  • 13:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9031
  • 13:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9031
  • 13:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6368
  • 13:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6368
  • 13:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2497
  • 13:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2497
  • 13:24 urandom: starting decommission of restbase2012-a — T328490
  • 13:24 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 13:23 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 13:16 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
  • 13:16 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
  • 13:16 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
  • 13:15 jbond@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
  • 13:14 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 13:14 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 13:02 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:59 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
  • 12:56 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:55 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:53 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:53 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:52 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:52 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:51 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:38 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:38 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Cleanup decommissioned services apple-search and graphoid - cgoubert@cumin1001"
  • 12:37 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Cleanup decommissioned services apple-search and graphoid - cgoubert@cumin1001"
  • 12:34 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 12:34 cgoubert@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:33 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 12:16 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:16 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ORES svc records - elukey@cumin1001"
  • 12:15 elukey@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ORES svc records - elukey@cumin1001"
  • 12:12 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 12:00 kart_: Updated cxserver to 2023-10-11-114410-production (T341478, T347939)
  • 12:00 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 11:59 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 11:58 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 11:57 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 11:55 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 11:54 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:28 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on druid1011.eqiad.wmnet with reason: Downtime as we setup the host to join the druid and zookeper cluster
  • 11:27 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on druid1011.eqiad.wmnet with reason: Downtime as we setup the host to join the druid and zookeper cluster
  • 11:12 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:12 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T343198)', diff saved to https://phabricator.wikimedia.org/P52901 and previous config saved to /var/cache/conftool/dbconfig/20231011-110127-arnaudb.json
  • 11:01 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 11:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 11:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T343198)', diff saved to https://phabricator.wikimedia.org/P52900 and previous config saved to /var/cache/conftool/dbconfig/20231011-110105-arnaudb.json
  • 10:52 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:52 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P52899 and previous config saved to /var/cache/conftool/dbconfig/20231011-104558-arnaudb.json
  • 10:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P52898 and previous config saved to /var/cache/conftool/dbconfig/20231011-103052-arnaudb.json
  • 10:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T343198)', diff saved to https://phabricator.wikimedia.org/P52897 and previous config saved to /var/cache/conftool/dbconfig/20231011-101545-arnaudb.json
  • 10:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 09:52 moritzm: rebuilding RAID after disk replacement T348429
  • 09:52 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 09:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 09:34 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 09:31 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1001.eqiad.wmnet with OS bullseye
  • 09:23 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:23 jayme@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add VIPs for mw-wikifunction - jayme@cumin1001"
  • 09:23 jayme@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add VIPs for mw-wikifunction - jayme@cumin1001"
  • 09:19 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 09:15 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 08:53 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 08:44 hashar@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.30 refs T347081 (duration: 06m 00s)
  • 08:38 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.30 refs T347081
  • 08:00 hashar@deploy2002: Synchronized php-1.41.0-wmf.30/skins/Vector: Backports for Vector styling issues T348572 T348530 (duration: 06m 16s)
  • 07:35 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 07:35 sgimeno@deploy2002: Finished scap: Backport for GrowthExperiments: enable AddLink backend 15th round of wikis (T308141) (duration: 07m 45s)
  • 07:29 sgimeno@deploy2002: sgimeno: Continuing with sync
  • 07:28 sgimeno@deploy2002: sgimeno: Backport for GrowthExperiments: enable AddLink backend 15th round of wikis (T308141) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:27 sgimeno@deploy2002: Started scap: Backport for GrowthExperiments: enable AddLink backend 15th round of wikis (T308141)
  • 07:24 sgimeno@deploy2002: Finished scap: Backport for GrowthExperiments: enable AddLink frontend 14th round of wikis (T308139) (duration: 09m 05s)
  • 07:19 sgimeno@deploy2002: sgimeno: Continuing with sync
  • 07:17 sgimeno@deploy2002: sgimeno: Backport for GrowthExperiments: enable AddLink frontend 14th round of wikis (T308139) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:15 sgimeno@deploy2002: Started scap: Backport for GrowthExperiments: enable AddLink frontend 14th round of wikis (T308139)
  • 05:46 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:45 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:45 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:45 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:44 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:24 kart_: Updated cxserver to 2023-10-11-045323-production (T341478, T344982, T338432, T347939)
  • 05:21 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:21 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:19 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:18 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:11 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:10 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 03:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T343198)', diff saved to https://phabricator.wikimedia.org/P52896 and previous config saved to /var/cache/conftool/dbconfig/20231011-030054-arnaudb.json
  • 03:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 03:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 03:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T343198)', diff saved to https://phabricator.wikimedia.org/P52895 and previous config saved to /var/cache/conftool/dbconfig/20231011-030032-arnaudb.json
  • 02:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P52894 and previous config saved to /var/cache/conftool/dbconfig/20231011-024526-arnaudb.json
  • 02:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P52893 and previous config saved to /var/cache/conftool/dbconfig/20231011-023019-arnaudb.json
  • 02:18 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T343198)', diff saved to https://phabricator.wikimedia.org/P52892 and previous config saved to /var/cache/conftool/dbconfig/20231011-021513-arnaudb.json
  • 02:03 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:02 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1104
  • 02:01 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1104

2023-10-10

  • 22:45 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ncredir5001.eqsin.wmnet with OS bookworm
  • 22:41 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 22:40 cstone: SmashPig upgraded from a78a91d9 to 211284b9
  • 22:13 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 21:45 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f6-eqiad
  • 21:43 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f6-eqiad
  • 21:34 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir5001.eqsin.wmnet with OS bookworm
  • 21:33 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ncredir5001.eqsin.wmnet with OS bookworm
  • 20:48 taavi@deploy2002: Finished scap: Backport for Set READ_NEW for CA wikis on OATHAuth multiple devices (T242031) (duration: 08m 24s)
  • 20:43 taavi@deploy2002: taavi: Continuing with sync
  • 20:41 taavi@deploy2002: taavi: Backport for Set READ_NEW for CA wikis on OATHAuth multiple devices (T242031) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:40 taavi@deploy2002: Started scap: Backport for Set READ_NEW for CA wikis on OATHAuth multiple devices (T242031)
  • 20:19 hmonroy@deploy2002: Finished scap: Backport for diffs: add line number headings to inline diffs (T346460) (duration: 30m 26s)
  • 20:17 eileen: civicrm upgraded from 4329014b to f2f1e23e
  • 20:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir5001.eqsin.wmnet with OS bookworm
  • 20:13 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host ncredir5001.eqsin.wmnet with OS bookworm
  • 20:07 hmonroy@deploy2002: musikanimal and hmonroy: Continuing with sync
  • 20:07 hmonroy@deploy2002: musikanimal and hmonroy: Backport for diffs: add line number headings to inline diffs (T346460) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:49 hmonroy@deploy2002: Started scap: Backport for diffs: add line number headings to inline diffs (T346460)
  • 19:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T343198)', diff saved to https://phabricator.wikimedia.org/P52890 and previous config saved to /var/cache/conftool/dbconfig/20231010-194311-arnaudb.json
  • 19:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 19:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 19:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T343198)', diff saved to https://phabricator.wikimedia.org/P52889 and previous config saved to /var/cache/conftool/dbconfig/20231010-194249-arnaudb.json
  • 19:33 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 19:33 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 19:33 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 19:32 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 19:32 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 19:31 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 19:29 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: changing bgp rr config
  • 19:29 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: changing bgp rr config
  • 19:29 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: changing bgp rr config
  • 19:29 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: changing bgp rr config
  • 19:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P52888 and previous config saved to /var/cache/conftool/dbconfig/20231010-192742-arnaudb.json
  • 19:26 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 19:26 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 19:26 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 19:25 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 19:24 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:23 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:22 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:22 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir5001.eqsin.wmnet with OS bookworm
  • 19:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P52887 and previous config saved to /var/cache/conftool/dbconfig/20231010-191236-arnaudb.json
  • 18:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T343198)', diff saved to https://phabricator.wikimedia.org/P52886 and previous config saved to /var/cache/conftool/dbconfig/20231010-185730-arnaudb.json
  • 18:15 bvibber: brion running TimedMediaHandler requeueTranscodes.php batch jobs on mwmaint2002. expect many deletions & new file stores on swift
  • 18:11 ejegg: fundraising python tools upgraded from 2e19cd39 to 0c17296c
  • 18:10 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: changing bgp rr config
  • 18:09 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: changing bgp rr config
  • 18:07 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: changing bgp rr config
  • 18:06 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: changing bgp rr config
  • 18:01 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:59 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:56 topranks: disable BGP RR_CLIENT peerings on lsw1-e1-eqiad
  • 17:52 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f5-eqiad
  • 17:50 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f5-eqiad
  • 17:46 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e6-eqiad
  • 17:44 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e6-eqiad
  • 17:41 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e5-eqiad
  • 17:39 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e5-eqiad
  • 17:23 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f7-eqiad
  • 17:22 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f7-eqiad
  • 17:21 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e7-eqiad
  • 17:21 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e7-eqiad
  • 17:15 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add eqiad new row switches - cmooney@cumin1001"
  • 17:14 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add eqiad new row switches - cmooney@cumin1001"
  • 17:14 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add eqiad new row switches - cmooney@cumin1001"
  • 17:13 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add eqiad new row switches - cmooney@cumin1001"
  • 16:32 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:21 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:18 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:18 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:18 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cp1101 - jclark@cumin1001"
  • 16:17 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cp1101 - jclark@cumin1001"
  • 16:14 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 16:11 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:09 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:06 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:05 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:03 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:03 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:02 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:02 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:00 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:00 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:54 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:46 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 15:34 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1100']
  • 15:23 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:23 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:06 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
  • 14:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 14:10 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 14:06 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 14:06 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 14:05 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:05 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:02 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 13:58 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:57 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 13:57 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:54 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:54 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:52 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 13:52 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:50 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:49 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:48 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:44 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:44 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:42 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:40 urbanecm@deploy2002: Finished scap: Backport for Growth: Enable Welcome survey user research for enwiki (T342353) (duration: 13m 19s)
  • 13:39 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:37 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:36 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:35 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 13:33 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 13:32 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 13:32 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 13:29 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 13:28 urbanecm@deploy2002: urbanecm: Backport for Growth: Enable Welcome survey user research for enwiki (T342353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:27 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 13:26 urbanecm@deploy2002: Started scap: Backport for Growth: Enable Welcome survey user research for enwiki (T342353)
  • 13:26 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:25 urbanecm@deploy2002: Finished scap: Backport for cswiki: Remove engineer group (T348279) (duration: 07m 24s)
  • 13:24 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:24 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:23 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:22 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:22 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:20 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:19 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 13:19 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 13:19 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:19 urbanecm@deploy2002: urbanecm: Backport for cswiki: Remove engineer group (T348279) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:18 urbanecm@deploy2002: Started scap: Backport for cswiki: Remove engineer group (T348279)
  • 13:17 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:17 urbanecm@deploy2002: Finished scap: Backport for growth: Enable section-image recommendations on 10 new wikis (T345940) (duration: 09m 59s)
  • 13:16 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:15 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:11 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 13:08 urbanecm@deploy2002: urbanecm: Backport for growth: Enable section-image recommendations on 10 new wikis (T345940) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:07 urbanecm@deploy2002: Started scap: Backport for growth: Enable section-image recommendations on 10 new wikis (T345940)
  • 13:02 fnegri@cumin1001: START - Cookbook sre.dns.netbox
  • 12:19 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100.eqiad.wmnet']
  • 12:18 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100.eqiad.wmnet']
  • 12:02 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 12:01 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 11:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T343198)', diff saved to https://phabricator.wikimedia.org/P52885 and previous config saved to /var/cache/conftool/dbconfig/20231010-114024-arnaudb.json
  • 11:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 11:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 11:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T343198)', diff saved to https://phabricator.wikimedia.org/P52884 and previous config saved to /var/cache/conftool/dbconfig/20231010-114002-arnaudb.json
  • 11:33 volans: installed spicerack 7.4.1 on the cumin hosts
  • 11:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.mediawiki.restart-appservers (exit_code=0)
  • 11:32 cgoubert@cumin1001: START - Cookbook sre.mediawiki.restart-appservers
  • 11:30 cgoubert@cumin1001: END (PASS) - Cookbook sre.mediawiki.restart-appservers (exit_code=0)
  • 11:29 cgoubert@cumin1001: START - Cookbook sre.mediawiki.restart-appservers
  • 11:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P52883 and previous config saved to /var/cache/conftool/dbconfig/20231010-112456-arnaudb.json
  • 11:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P52882 and previous config saved to /var/cache/conftool/dbconfig/20231010-110950-arnaudb.json
  • 10:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T343198)', diff saved to https://phabricator.wikimedia.org/P52880 and previous config saved to /var/cache/conftool/dbconfig/20231010-105443-arnaudb.json
  • 10:52 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100.eqiad.wmnet']
  • 10:52 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100.eqiad.wmnet']
  • 09:56 ladsgroup@deploy2002: Finished scap: Backport for Set pagelinks migration stage of cebwiki to write both (T345732) (duration: 09m 10s)
  • 09:50 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 09:48 ladsgroup@deploy2002: ladsgroup: Backport for Set pagelinks migration stage of cebwiki to write both (T345732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:47 ladsgroup@deploy2002: Started scap: Backport for Set pagelinks migration stage of cebwiki to write both (T345732)
  • 09:33 volans: uploaded spicerack_7.4.1 to apt.wikimedia.org bullseye-wikimedia
  • 08:35 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.30 refs T347081
  • 08:24 taavi: wikitech-static: cleanup image archive directory: T348503
  • 08:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T343198)', diff saved to https://phabricator.wikimedia.org/P52879 and previous config saved to /var/cache/conftool/dbconfig/20231010-080924-arnaudb.json
  • 08:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 08:09 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 08:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 08:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 08:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T343198)', diff saved to https://phabricator.wikimedia.org/P52878 and previous config saved to /var/cache/conftool/dbconfig/20231010-080847-arnaudb.json
  • 08:00 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 07:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P52877 and previous config saved to /var/cache/conftool/dbconfig/20231010-075340-arnaudb.json
  • 07:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P52876 and previous config saved to /var/cache/conftool/dbconfig/20231010-073834-arnaudb.json
  • 07:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T343198)', diff saved to https://phabricator.wikimedia.org/P52875 and previous config saved to /var/cache/conftool/dbconfig/20231010-072327-arnaudb.json
  • 07:19 kostajh: UTC morning deploys done
  • 07:18 kharlan@deploy2002: Finished scap: Backport for ReportIncident: Set developer mode to false (duration: 10m 17s)
  • 07:12 kharlan@deploy2002: kharlan: Continuing with sync
  • 07:09 kharlan@deploy2002: kharlan: Backport for ReportIncident: Set developer mode to false synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:08 kharlan@deploy2002: Started scap: Backport for ReportIncident: Set developer mode to false
  • 06:42 moritzm: installing qemu security updates on bookworm
  • 03:54 mwpresync@deploy2002: Pruned MediaWiki: 1.41.0-wmf.28 (duration: 02m 08s)
  • 03:52 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.30 refs T347081 (duration: 49m 56s)
  • 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.30 refs T347081

2023-10-09

  • 22:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T343198)', diff saved to https://phabricator.wikimedia.org/P52873 and previous config saved to /var/cache/conftool/dbconfig/20231009-225429-arnaudb.json
  • 22:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 22:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 22:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T343198)', diff saved to https://phabricator.wikimedia.org/P52872 and previous config saved to /var/cache/conftool/dbconfig/20231009-225407-arnaudb.json
  • 22:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P52871 and previous config saved to /var/cache/conftool/dbconfig/20231009-223900-arnaudb.json
  • 22:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P52870 and previous config saved to /var/cache/conftool/dbconfig/20231009-222354-arnaudb.json
  • 22:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T343198)', diff saved to https://phabricator.wikimedia.org/P52869 and previous config saved to /var/cache/conftool/dbconfig/20231009-220848-arnaudb.json
  • 20:42 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1156.eqiad.wmnet
  • 20:34 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1156.eqiad.wmnet
  • 20:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1155.eqiad.wmnet
  • 20:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1155.eqiad.wmnet
  • 20:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1154.eqiad.wmnet
  • 20:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1154.eqiad.wmnet
  • 20:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1153.eqiad.wmnet
  • 20:09 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1153.eqiad.wmnet
  • 20:09 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1152.eqiad.wmnet
  • 20:02 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1152.eqiad.wmnet
  • 20:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1151.eqiad.wmnet
  • 19:54 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1151.eqiad.wmnet
  • 19:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1150.eqiad.wmnet
  • 19:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1150.eqiad.wmnet
  • 19:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1149.eqiad.wmnet
  • 19:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1149.eqiad.wmnet
  • 19:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1148.eqiad.wmnet
  • 19:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1148.eqiad.wmnet
  • 19:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1147.eqiad.wmnet
  • 19:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T343198)', diff saved to https://phabricator.wikimedia.org/P52868 and previous config saved to /var/cache/conftool/dbconfig/20231009-193219-arnaudb.json
  • 19:32 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 19:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 19:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1147.eqiad.wmnet
  • 19:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1146.eqiad.wmnet
  • 19:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1146.eqiad.wmnet
  • 19:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1145.eqiad.wmnet
  • 19:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1145.eqiad.wmnet
  • 19:08 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1144.eqiad.wmnet
  • 19:01 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1144.eqiad.wmnet
  • 19:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1143.eqiad.wmnet
  • 18:55 ladsgroup@deploy2002: Finished scap: Backport for Update interwiki cache (duration: 100m 07s)
  • 18:54 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1143.eqiad.wmnet
  • 18:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1142.eqiad.wmnet
  • 18:49 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 18:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1142.eqiad.wmnet
  • 18:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1141.eqiad.wmnet
  • 18:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1141.eqiad.wmnet
  • 18:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1140.eqiad.wmnet
  • 18:36 mforns@deploy2002: Finished deploy [airflow-dags/analytics@c334eaf]: (no justification provided) (duration: 01m 12s)
  • 18:35 mforns@deploy2002: Started deploy [airflow-dags/analytics@c334eaf]: (no justification provided)
  • 18:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1140.eqiad.wmnet
  • 18:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1139.eqiad.wmnet
  • 18:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1139.eqiad.wmnet
  • 18:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1138.eqiad.wmnet
  • 18:15 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1138.eqiad.wmnet
  • 18:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1137.eqiad.wmnet
  • 18:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1137.eqiad.wmnet
  • 18:08 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1136.eqiad.wmnet
  • 17:58 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1136.eqiad.wmnet
  • 17:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1135.eqiad.wmnet
  • 17:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1135.eqiad.wmnet
  • 17:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1134.eqiad.wmnet
  • 17:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1134.eqiad.wmnet
  • 17:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1133.eqiad.wmnet
  • 17:35 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1133.eqiad.wmnet
  • 17:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1132.eqiad.wmnet
  • 17:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1132.eqiad.wmnet
  • 17:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1131.eqiad.wmnet
  • 17:24 ladsgroup@deploy2002: ladsgroup: Backport for Update interwiki cache synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:20 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1131.eqiad.wmnet
  • 17:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1130.eqiad.wmnet
  • 17:15 ladsgroup@deploy2002: Started scap: Backport for Update interwiki cache
  • 17:11 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1130.eqiad.wmnet
  • 17:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1129.eqiad.wmnet
  • 17:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1129.eqiad.wmnet
  • 17:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1128.eqiad.wmnet
  • 16:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1128.eqiad.wmnet
  • 16:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1127.eqiad.wmnet
  • 16:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1127.eqiad.wmnet
  • 16:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1126.eqiad.wmnet
  • 16:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1126.eqiad.wmnet
  • 16:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1125.eqiad.wmnet
  • 16:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1125.eqiad.wmnet
  • 16:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1124.eqiad.wmnet
  • 16:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1124.eqiad.wmnet
  • 16:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1123.eqiad.wmnet
  • 16:18 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1123.eqiad.wmnet
  • 16:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1122.eqiad.wmnet
  • 16:11 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1122.eqiad.wmnet
  • 16:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1121.eqiad.wmnet
  • 16:11 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:11 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:03 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1121.eqiad.wmnet
  • 16:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1120.eqiad.wmnet
  • 15:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1120.eqiad.wmnet
  • 15:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1119.eqiad.wmnet
  • 15:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1119.eqiad.wmnet
  • 15:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1118.eqiad.wmnet
  • 15:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1118.eqiad.wmnet
  • 15:42 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1117.eqiad.wmnet
  • 15:34 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1117.eqiad.wmnet
  • 15:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1116.eqiad.wmnet
  • 15:31 moritzm: installing qemu security updates on bookworm
  • 15:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1116.eqiad.wmnet
  • 15:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1115.eqiad.wmnet
  • 15:20 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1115.eqiad.wmnet
  • 15:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1114.eqiad.wmnet
  • 15:12 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1114.eqiad.wmnet
  • 15:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1113.eqiad.wmnet
  • 15:09 volans: installed spicerack 7.4.0 to cumin2002
  • 15:08 moritzm: installing nftables bugfix updates from Bookworm point release
  • 15:02 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1113.eqiad.wmnet
  • 15:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1112.eqiad.wmnet
  • 14:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1112.eqiad.wmnet
  • 14:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1111.eqiad.wmnet
  • 14:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1111.eqiad.wmnet
  • 14:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1110.eqiad.wmnet
  • 14:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1110.eqiad.wmnet
  • 14:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1109.eqiad.wmnet
  • 14:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1109.eqiad.wmnet
  • 14:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1108.eqiad.wmnet
  • 14:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1108.eqiad.wmnet
  • 14:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1107.eqiad.wmnet
  • 14:18 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1107.eqiad.wmnet
  • 14:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1106.eqiad.wmnet
  • 14:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1106.eqiad.wmnet
  • 14:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1105.eqiad.wmnet
  • 14:02 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1105.eqiad.wmnet
  • 14:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1104.eqiad.wmnet
  • 13:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 13:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 13:54 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1104.eqiad.wmnet
  • 13:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1103.eqiad.wmnet
  • 13:48 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:48 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:48 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:47 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:47 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:46 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1103.eqiad.wmnet
  • 13:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1102.eqiad.wmnet
  • 13:46 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:46 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:46 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:43 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1102.eqiad.wmnet
  • 13:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet
  • 13:35 volans: uploaded spicerack_7.4.0 to apt.wikimedia.org bullseye-wikimedia
  • 13:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet
  • 13:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet
  • 13:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet
  • 13:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet
  • 13:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet
  • 13:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet
  • 13:06 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet
  • 13:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet
  • 12:58 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1097.eqiad.wmnet
  • 12:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1095.eqiad.wmnet
  • 12:52 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1095.eqiad.wmnet
  • 12:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1094.eqiad.wmnet
  • 12:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1094.eqiad.wmnet
  • 12:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1093.eqiad.wmnet
  • 12:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1093.eqiad.wmnet
  • 12:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1092.eqiad.wmnet
  • 12:35 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1092.eqiad.wmnet
  • 12:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1091.eqiad.wmnet
  • 12:28 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1091.eqiad.wmnet
  • 12:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1090.eqiad.wmnet
  • 12:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1090.eqiad.wmnet
  • 12:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1089.eqiad.wmnet
  • 12:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1089.eqiad.wmnet
  • 12:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1088.eqiad.wmnet
  • 12:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1088.eqiad.wmnet
  • 12:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1087.eqiad.wmnet
  • 12:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1087.eqiad.wmnet
  • 12:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1086.eqiad.wmnet
  • 11:51 godog: restart k8s-aux in eqiad to pick up new certs - T343529
  • 11:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1086.eqiad.wmnet
  • 11:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1085.eqiad.wmnet
  • 11:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1085.eqiad.wmnet
  • 11:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1084.eqiad.wmnet
  • 11:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1084.eqiad.wmnet
  • 11:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1083.eqiad.wmnet
  • 11:18 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1083.eqiad.wmnet
  • 11:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1082.eqiad.wmnet
  • 11:12 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1082.eqiad.wmnet
  • 11:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1081.eqiad.wmnet
  • 11:06 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1081.eqiad.wmnet
  • 11:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1080.eqiad.wmnet
  • 11:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 11:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 11:00 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1080.eqiad.wmnet
  • 11:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1079.eqiad.wmnet
  • 10:59 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 10:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 10:53 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1079.eqiad.wmnet
  • 10:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1078.eqiad.wmnet
  • 10:50 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:48 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1078.eqiad.wmnet
  • 10:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1077.eqiad.wmnet
  • 10:34 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1077.eqiad.wmnet
  • 10:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1076.eqiad.wmnet
  • 10:29 moritzm: installing Linux 6.1.55 on Bookworm hosts
  • 10:29 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1076.eqiad.wmnet
  • 10:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1075.eqiad.wmnet
  • 10:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1075.eqiad.wmnet
  • 10:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1074.eqiad.wmnet
  • 10:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2031.codfw.wmnet
  • 10:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1074.eqiad.wmnet
  • 10:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1073.eqiad.wmnet
  • 10:10 ladsgroup@deploy2002: Finished scap: Backport for Set virtual domain mapping for url shortener (T330590) (duration: 15m 35s)
  • 10:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2031.codfw.wmnet
  • 10:05 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 10:04 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug1002:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.40.0-wmf.17/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.40.0-wmf.17/cache/l10n/ /srv/mediawiki/php-1.40.0-wmf.17/cache/ /srv/mediawiki/php-1.40.0-wmf.17/ # clean up old l10n cache'
  • 10:03 ladsgroup@deploy2002: ladsgroup: Backport for Set virtual domain mapping for url shortener (T330590) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:55 ladsgroup@deploy2002: Started scap: Backport for Set virtual domain mapping for url shortener (T330590)
  • 09:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1073.eqiad.wmnet
  • 09:49 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host analytics1072.eqiad.wmnet
  • 09:07 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1072.eqiad.wmnet
  • 09:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1071.eqiad.wmnet
  • 09:01 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1071.eqiad.wmnet
  • 09:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1070.eqiad.wmnet
  • 08:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1070.eqiad.wmnet
  • 08:53 moritzm: rebuilt bookworm d-i image for the Bookworm 12.2 point release T348326
  • 08:23 moritzm: rebuilt bullseye d-i image for the Bullseye 11.8 point release T348327
  • 07:06 taavi: kill stuck updateSpecialPages.php process on mwmaint2002 which was trying to re-connect to an unreachable db host
  • 07:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on db2109.codfw.wmnet with reason: investigating db2109
  • 07:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on db2109.codfw.wmnet with reason: investigating db2109

2023-10-08

  • 22:58 ryankemper: [WDQS] Depooled `wdqs1014` while it catches up on a day of lag
  • 22:57 ryankemper: [WDQS] Restarted `wdqs1014`; blazegraph has been deadlocked since `2023-10-07 12:30:00`

2023-10-07

  • 09:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T343198)', diff saved to https://phabricator.wikimedia.org/P52863 and previous config saved to /var/cache/conftool/dbconfig/20231007-092249-arnaudb.json
  • 09:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P52862 and previous config saved to /var/cache/conftool/dbconfig/20231007-090742-arnaudb.json
  • 08:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P52861 and previous config saved to /var/cache/conftool/dbconfig/20231007-085236-arnaudb.json
  • 08:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T343198)', diff saved to https://phabricator.wikimedia.org/P52860 and previous config saved to /var/cache/conftool/dbconfig/20231007-083729-arnaudb.json
  • 02:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1030.eqiad.wmnet
  • 02:33 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1030.eqiad.wmnet

2023-10-06

  • 23:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2054.codfw.wmnet with OS bullseye
  • 23:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2054.codfw.wmnet with reason: host reimage
  • 22:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2054.codfw.wmnet with reason: host reimage
  • 22:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T343198)', diff saved to https://phabricator.wikimedia.org/P52859 and previous config saved to /var/cache/conftool/dbconfig/20231006-224306-arnaudb.json
  • 22:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 22:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 22:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T343198)', diff saved to https://phabricator.wikimedia.org/P52858 and previous config saved to /var/cache/conftool/dbconfig/20231006-224245-arnaudb.json
  • 22:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P52857 and previous config saved to /var/cache/conftool/dbconfig/20231006-222738-arnaudb.json
  • 22:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2054.codfw.wmnet with OS bullseye
  • 22:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P52856 and previous config saved to /var/cache/conftool/dbconfig/20231006-221232-arnaudb.json
  • 21:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T343198)', diff saved to https://phabricator.wikimedia.org/P52855 and previous config saved to /var/cache/conftool/dbconfig/20231006-215725-arnaudb.json
  • 20:45 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:45 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:35 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:34 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:29 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:29 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:11 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:10 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:46 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 19:45 bking@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 19:44 bking@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 19:43 bking@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 19:43 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 19:41 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 19:40 bking@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 19:39 bking@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 18:43 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@3b7df78]: Update rdf-spark-tools to 0.3.135 to fix query mapping job failure (duration: 00m 29s)
  • 18:42 ebernhardson@deploy2002: Started deploy [airflow-dags/search@3b7df78]: Update rdf-spark-tools to 0.3.135 to fix query mapping job failure
  • 18:42 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:32 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:31 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:31 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:30 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1101
  • 18:30 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1101
  • 17:10 pt1979@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100']
  • 17:10 pt1979@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
  • 17:08 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100.eqiad.wmnet']
  • 17:08 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100.eqiad.wmnet']
  • 17:05 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100']
  • 17:05 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
  • 17:03 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100']
  • 17:03 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
  • 17:02 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100']
  • 17:02 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
  • 16:54 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:41 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:37 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:28 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:27 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 16:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 16:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 16:13 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:31 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 14:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 14:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 14:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 14:58 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1067.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:58 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1064.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:58 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1063.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:55 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1067.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:55 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1064.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:55 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1063.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2054.codfw.wmnet with OS bullseye
  • 14:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-master1003.eqiad.wmnet with OS bullseye
  • 14:44 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:42 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1003.eqiad.wmnet with reason: host reimage
  • 14:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:22 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1003.eqiad.wmnet with reason: host reimage
  • 14:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:02 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:02 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-master1004.eqiad.wmnet with OS bullseye
  • 13:55 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 13:53 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 13:52 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2054.codfw.wmnet with OS bullseye
  • 13:38 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 13:35 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:34 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 13:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "ganeti-test2004 - ayounsi@cumin1001"
  • 13:26 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "ganeti-test2004 - ayounsi@cumin1001"
  • 13:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 13:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1004.eqiad.wmnet with reason: host reimage
  • 13:18 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1004.eqiad.wmnet with reason: host reimage
  • 13:17 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:17 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:03 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti-test2004.codfw.wmnet with OS bullseye
  • 13:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 12:29 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:29 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T343198)', diff saved to https://phabricator.wikimedia.org/P52852 and previous config saved to /var/cache/conftool/dbconfig/20231006-122022-arnaudb.json
  • 12:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 12:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 12:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T343198)', diff saved to https://phabricator.wikimedia.org/P52851 and previous config saved to /var/cache/conftool/dbconfig/20231006-122000-arnaudb.json
  • 12:17 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:16 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:15 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:15 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:15 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:14 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:13 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti-test2004.codfw.wmnet with OS bullseye
  • 12:13 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti-test2004.codfw.wmnet with OS bullseye
  • 12:11 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:10 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P52850 and previous config saved to /var/cache/conftool/dbconfig/20231006-120454-arnaudb.json
  • 12:02 moritzm: rebalancing ganeti row D/eqiad
  • 11:55 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti-test2004.codfw.wmnet with OS bullseye
  • 11:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P52848 and previous config saved to /var/cache/conftool/dbconfig/20231006-114947-arnaudb.json
  • 11:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T343198)', diff saved to https://phabricator.wikimedia.org/P52847 and previous config saved to /var/cache/conftool/dbconfig/20231006-113441-arnaudb.json
  • 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 10:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 10:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 10:21 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2023.codfw.wmnet to cluster codfw and group A
  • 10:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2023.codfw.wmnet to cluster codfw and group A
  • 10:13 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 10:13 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host apt1002.wikimedia.org
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apt1002.wikimedia.org with OS bookworm
  • 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apt1002.wikimedia.org with reason: host reimage
  • 09:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on apt1002.wikimedia.org with reason: host reimage
  • 09:42 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host apt1002.wikimedia.org with OS bookworm
  • 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM apt1002.wikimedia.org - jmm@cumin2002"
  • 09:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM apt1002.wikimedia.org - jmm@cumin2002"
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) apt1002.wikimedia.org on all recursors
  • 09:26 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache apt1002.wikimedia.org on all recursors
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt1002.wikimedia.org - jmm@cumin2002"
  • 09:25 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt1002.wikimedia.org - jmm@cumin2002"
  • 09:22 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:22 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host apt1002.wikimedia.org
  • 09:19 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host apt2002.wikimedia.org
  • 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) apt2002.wikimedia.org on all recursors
  • 09:19 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache apt2002.wikimedia.org on all recursors
  • 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM apt2002.wikimedia.org - jmm@cumin2002"
  • 09:18 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM apt2002.wikimedia.org - jmm@cumin2002"
  • 09:11 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) apt2002.wikimedia.org on all recursors
  • 09:11 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache apt2002.wikimedia.org on all recursors
  • 09:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt2002.wikimedia.org - jmm@cumin2002"
  • 09:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt2002.wikimedia.org - jmm@cumin2002"
  • 09:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:05 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host apt2002.wikimedia.org
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 09:03 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 09:03 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 08:43 moritzm: installing vim security updates
  • 08:26 elukey@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 08:24 elukey@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 08:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2023.codfw.wmnet to cluster codfw and group A
  • 08:22 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2023.codfw.wmnet to cluster codfw and group A
  • 08:18 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
  • 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
  • 07:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2023.codfw.wmnet with OS bullseye
  • 07:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2023.codfw.wmnet with reason: host reimage
  • 07:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2023.codfw.wmnet with reason: host reimage
  • 06:53 moritzm: installing bind9 security updates (client side libs/tools only)
  • 06:52 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2023.codfw.wmnet with OS bullseye
  • 02:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T343198)', diff saved to https://phabricator.wikimedia.org/P52843 and previous config saved to /var/cache/conftool/dbconfig/20231006-020509-arnaudb.json
  • 02:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 02:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 02:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T343198)', diff saved to https://phabricator.wikimedia.org/P52842 and previous config saved to /var/cache/conftool/dbconfig/20231006-020447-arnaudb.json
  • 01:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P52841 and previous config saved to /var/cache/conftool/dbconfig/20231006-014941-arnaudb.json
  • 01:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P52840 and previous config saved to /var/cache/conftool/dbconfig/20231006-013434-arnaudb.json
  • 01:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T343198)', diff saved to https://phabricator.wikimedia.org/P52839 and previous config saved to /var/cache/conftool/dbconfig/20231006-011928-arnaudb.json
  • 00:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 00:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 00:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 00:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti-test2004.codfw.wmnet with OS bullseye

2023-10-05

  • 23:22 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 23:22 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-master1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-master1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:02 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 23:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:59 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 22:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:37 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 21:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 21:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 21:17 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2020.codfw.wmnet: Maybe cleanup leaked file descriptors(?) - eevans@cumin1001
  • 21:07 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2020.codfw.wmnet: Maybe cleanup leaked file descriptors(?) - eevans@cumin1001
  • 21:03 thcipriani@deploy2002: Finished scap: Backport for [foundationwiki] Add Endowment, Agenda, Committee, and Memory namespaces (T347762 T347822 T348268), [foundationwiki] Provide 'translationadmin' group with 'edit-legal' right (T346187) (duration: 09m 56s)
  • 20:58 thcipriani@deploy2002: thcipriani and varnent: Continuing with sync
  • 20:57 eileen: civicrm upgraded from 05545fbc to 4329014b
  • 20:55 thcipriani@deploy2002: thcipriani and varnent: Backport for [foundationwiki] Add Endowment, Agenda, Committee, and Memory namespaces (T347762 T347822 T348268), [foundationwiki] Provide 'translationadmin' group with 'edit-legal' right (T346187) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:54 thcipriani@deploy2002: Started scap: Backport for [foundationwiki] Add Endowment, Agenda, Committee, and Memory namespaces (T347762 T347822 T348268), [foundationwiki] Provide 'translationadmin' group with 'edit-legal' right (T346187)
  • 20:49 thcipriani@deploy2002: Finished scap: Backport for [Prototype] Add screen resolution to Typography prototype, [Prototype] Edit project link page on reading prototype (duration: 23m 57s)
  • 20:39 thcipriani@deploy2002: jdrewniak and thcipriani: Continuing with sync
  • 20:37 thcipriani@deploy2002: jdrewniak and thcipriani: Backport for [Prototype] Add screen resolution to Typography prototype, [Prototype] Edit project link page on reading prototype synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:25 thcipriani@deploy2002: Started scap: Backport for [Prototype] Add screen resolution to Typography prototype, [Prototype] Edit project link page on reading prototype
  • 20:22 thcipriani@deploy2002: Finished scap: Backport for Enable Minerva site notice for Nepali Wikipedia (newiki) (T347814) (duration: 08m 57s)
  • 20:16 thcipriani@deploy2002: ammarpad and thcipriani: Continuing with sync
  • 20:14 thcipriani@deploy2002: ammarpad and thcipriani: Backport for Enable Minerva site notice for Nepali Wikipedia (newiki) (T347814) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 20:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 20:13 thcipriani@deploy2002: Started scap: Backport for Enable Minerva site notice for Nepali Wikipedia (newiki) (T347814)
  • 18:51 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2004.codfw.wmnet with OS bullseye
  • 18:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti-test2004']
  • 18:47 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti-test2004']
  • 18:45 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 18:34 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.29 refs T347080
  • 18:17 sukhe: running authdns-update: T347054
  • 18:15 jhuneidi@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.29 refs T347080 (duration: 06m 12s)
  • 18:08 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.29 refs T347080
  • 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti-test2004']
  • 17:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti-test2004']
  • 17:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:25 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:57 bvibber: scaling back batch jobs for T312153 and T312152, will run these in further chunks as the new config rolls out
  • 16:47 bvibber: brion running requeueTranscodes.php on mwmaint2002 for VP9 transcode cleanup for T312153
  • 16:22 volans: installed 7.3.1 on cumin1001
  • 16:19 jbond@cumin2002: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling reboot on A:puppetboard
  • 16:15 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
  • 16:15 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
  • 16:12 dcausse: cleaning up rdf-streaming-updater-staging swift bucket
  • 16:11 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
  • 16:10 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
  • 16:10 jbond@cumin2002: START - Cookbook sre.puppetboard.restart-reboot rolling reboot on A:puppetboard
  • 16:10 jbond@cumin2002: END (ERROR) - Cookbook sre.puppet.renew-cert (exit_code=97) for sretest1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin2002
  • 16:09 jbond@cumin2002: START - Cookbook sre.puppet.renew-cert for sretest1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin2002
  • 16:07 cgoubert@deploy2002: Finished scap: Testing mw-on-k8s deployment for T348228 (duration: 02m 15s)
  • 16:06 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for sretest1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin2002
  • 16:05 cgoubert@deploy2002: Started scap: Testing mw-on-k8s deployment for T348228
  • 16:05 jbond@cumin2002: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling reboot on A:puppetboard
  • 16:05 jbond@cumin2002: START - Cookbook sre.puppet.renew-cert for sretest1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin2002
  • 16:01 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
  • 16:01 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
  • 16:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T343198)', diff saved to https://phabricator.wikimedia.org/P52837 and previous config saved to /var/cache/conftool/dbconfig/20231005-160030-arnaudb.json
  • 16:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 16:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 16:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T343198)', diff saved to https://phabricator.wikimedia.org/P52836 and previous config saved to /var/cache/conftool/dbconfig/20231005-160009-arnaudb.json
  • 15:54 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
  • 15:37 volans: installed 7.3.1 on cumin2002
  • 15:36 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
  • 15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:31 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
  • 15:31 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti-test2004.codfw.wmnet with OS bullseye
  • 15:30 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubemaster2001.codfw.wmnet
  • 15:30 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubemaster2001.codfw.wmnet
  • 15:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P52834 and previous config saved to /var/cache/conftool/dbconfig/20231005-152956-arnaudb.json
  • 15:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2023.codfw.wmnet with reason: reimage to bullseye
  • 15:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2023.codfw.wmnet with reason: reimage to bullseye
  • 15:26 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubemaster2001.codfw.wmnet with reason: Pick up vcpu change
  • 15:25 claime: rebooting kubemaster2001.codfw.wmnet - T348228
  • 15:25 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubemaster2001.codfw.wmnet with reason: Pick up vcpu change
  • 15:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubemaster2002.codfw.wmnet
  • 15:24 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubemaster2002.codfw.wmnet
  • 15:20 claime: rebooting kubemaster2002.codfw.wmnet - T348228
  • 15:20 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubemaster2002.codfw.wmnet with reason: Pick up vcpu change
  • 15:19 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubemaster2002.codfw.wmnet with reason: Pick up vcpu change
  • 15:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubetcd2004.codfw.wmnet
  • 15:16 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubetcd2004.codfw.wmnet
  • 15:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T343198)', diff saved to https://phabricator.wikimedia.org/P52832 and previous config saved to /var/cache/conftool/dbconfig/20231005-151450-arnaudb.json
  • 15:13 claime: rebooting kubetcd2004.codfw.wmnet - T348228
  • 15:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubetcd2004.codfw.wmnet with reason: Pick up vcpu change
  • 15:12 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2023.codfw.wmnet
  • 15:12 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubetcd2004.codfw.wmnet with reason: Pick up vcpu change
  • 15:11 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubetcd2005.codfw.wmnet
  • 15:10 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubetcd2005.codfw.wmnet
  • 15:10 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubetcd2005.codfw.wmnet with reason: Pick up vcpu change
  • 15:09 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubetcd2005.codfw.wmnet with reason: Pick up vcpu change
  • 15:09 claime: rebooting kubetcd2005.codfw.wmnet - T348228
  • 15:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubetcd2006.codfw.wmnet
  • 15:08 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubetcd2006.codfw.wmnet
  • 15:07 claime: rebooting kubetcd2006.codfw.wmnet - T348228
  • 15:07 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubetcd2006.codfw.wmnet with reason: Pick up vcpu change
  • 15:07 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubetcd2006.codfw.wmnet with reason: Pick up vcpu change
  • 15:06 claime: Bumping kubetcd200[4-6].codfw.wmnet vcpu to 2 - T348228
  • 15:04 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubemaster1001.eqiad.wmnet
  • 15:03 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubemaster1001.eqiad.wmnet
  • 15:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
  • 15:03 claime: rebooting kubemaster1001.eqiad.wmnet - T348228
  • 15:03 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubemaster1001.eqiad.wmnet with reason: Pick up vcpu change
  • 14:59 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubemaster1001.eqiad.wmnet with reason: Pick up vcpu change
  • 14:57 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubemaster1002.eqiad.wmnet
  • 14:57 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubemaster1002.eqiad.wmnet
  • 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 14:53 claime: rebooting kubemaster1002.eqiad.wmnet - T348228
  • 14:53 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubemaster1002.eqiad.wmnet with reason: Pick up vcpu change
  • 14:53 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubemaster1002.eqiad.wmnet with reason: Pick up vcpu change
  • 14:52 claime: Bumping kubemaster100[1-2].eqiad.wmnet vcpu to 2, ram to 4G - T348228
  • 14:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubetcd1004.eqiad.wmnet
  • 14:50 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubetcd1004.eqiad.wmnet
  • 14:47 claime: rebooting kubetcd1004.eqiad.wmnet - T348228
  • 14:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubetcd1004.eqiad.wmnet with reason: Pick up vcpu change
  • 14:47 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubetcd1004.eqiad.wmnet with reason: Pick up vcpu change
  • 14:46 claime: rebooted kubetcd1005.eqiad.wmnet - T348228
  • 14:46 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubetcd1005.eqiad.wmnet
  • 14:46 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubetcd1005.eqiad.wmnet
  • 14:44 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubetcd1005.eqiad.wmnet with reason: Pick up vcpu change
  • 14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubetcd1005.eqiad.wmnet with reason: Pick up vcpu change
  • 14:44 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubetcd1006.eqiad.wmnet
  • 14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubetcd1006.eqiad.wmnet
  • 14:41 claime: rebooting kubetcd1006.eqiad.wmnet - T348228
  • 14:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubetcd1006.eqiad.wmnet with reason: Pick up vcpu change
  • 14:41 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubetcd1006.eqiad.wmnet with reason: Pick up vcpu change
  • 14:38 claime: Bumping kubetcd100[4-6].eqiad.wmnet vcpu to 2 - T348228
  • 14:33 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:33 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:29 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:29 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 14:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2004.codfw.wmnet with OS bullseye
  • 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti-test2004']
  • 14:22 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti-test2004']
  • 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:18 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:17 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Revert "Use HookHandlers for core hooks" (T348181) (duration: 08m 50s)
  • 14:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:11 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:09 lucaswerkmeister-wmde@deploy2002: umherirrender and lucaswerkmeister-wmde: Continuing with sync
  • 14:09 lucaswerkmeister-wmde@deploy2002: umherirrender and lucaswerkmeister-wmde: Backport for Revert "Use HookHandlers for core hooks" (T348181) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 14:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 14:08 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Revert "Use HookHandlers for core hooks" (T348181)
  • 14:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 14:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 14:04 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 13:53 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:50 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 13:49 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Drop old VP8 video transcodes, enable HLS on testwiki (T312152 T309823) (duration: 12m 07s)
  • 13:47 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:44 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:42 lucaswerkmeister-wmde@deploy2002: brion and lucaswerkmeister-wmde: Continuing with sync
  • 13:41 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 13:38 lucaswerkmeister-wmde@deploy2002: brion and lucaswerkmeister-wmde: Backport for Drop old VP8 video transcodes, enable HLS on testwiki (T312152 T309823) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:36 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Drop old VP8 video transcodes, enable HLS on testwiki (T312152 T309823)
  • 13:36 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:36 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:36 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:35 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:35 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:35 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:35 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:34 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 13:32 urandom: starting Cassandra rebuild, restbase1030-c — T346803
  • 13:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 13:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 13:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1002.eqiad.wmnet
  • 13:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 13:15 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1002.eqiad.wmnet
  • 13:14 urbanecm@deploy2002: Finished scap: Backport for [Growth] enwiki: Enable mentorship for 50% of new users (T341399) (duration: 10m 08s)
  • 13:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
  • 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host matomo1002.eqiad.wmnet
  • 13:08 claime: respawning two misbehaving thumbor pods in codfw
  • 13:08 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 13:05 urbanecm@deploy2002: urbanecm: Backport for [Growth] enwiki: Enable mentorship for 50% of new users (T341399) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:05 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
  • 13:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
  • 13:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host matomo1002.eqiad.wmnet
  • 13:04 urbanecm@deploy2002: Started scap: Backport for [Growth] enwiki: Enable mentorship for 50% of new users (T341399)
  • 12:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
  • 12:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
  • 12:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
  • 12:51 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:50 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists1001.wikimedia.org
  • 12:42 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 12:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists1001.wikimedia.org
  • 12:38 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 12:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetdb2002.codfw.wmnet,puppetdb1002.eqiad.wmnet
  • 12:27 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:27 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetdb2002.codfw.wmnet,puppetdb1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
  • 12:27 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts puppetboard2002.codfw.wmnet,puppetboard1002.eqiad.wmnet
  • 12:27 jbond@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:26 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetdb2002.codfw.wmnet,puppetdb1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
  • 12:24 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 12:22 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 12:13 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetdb2002.codfw.wmnet,puppetdb1002.eqiad.wmnet
  • 12:10 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard2002.codfw.wmnet,puppetboard1002.eqiad.wmnet
  • 12:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1063']
  • 12:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1063']
  • 12:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1063']
  • 12:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1064']
  • 12:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1064']
  • 12:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1063']
  • 12:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 12:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 12:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 11:57 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudnet2005-dev
  • 11:57 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet2005-dev
  • 11:46 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.failover (exit_code=0) Failover of gitlab from gitlab1004.wikimedia.org to gitlab2002.wikimedia.org
  • 11:36 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 11:36 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 11:24 jelto@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 11:24 jelto@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 11:23 jelto@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 11:23 jelto@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 11:23 jelto@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 11:23 jelto@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 10:23 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts orespoolcounter[2003-2004].codfw.wmnet,orespoolcounter[1003-1004].eqiad.wmnet
  • 10:23 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:23 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: orespoolcounter[2003-2004].codfw.wmnet,orespoolcounter[1003-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 10:21 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: orespoolcounter[2003-2004].codfw.wmnet,orespoolcounter[1003-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 10:16 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 10:09 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts orespoolcounter[2003-2004].codfw.wmnet,orespoolcounter[1003-1004].eqiad.wmnet
  • 10:09 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ores1001.eqiad.wmnet
  • 10:09 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:09 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 10:08 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 10:00 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 09:59 moritzm: installing python2.7 security updates
  • 09:55 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores1001.eqiad.wmnet
  • 09:01 jelto@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab1004.wikimedia.org to gitlab2002.wikimedia.org
  • 07:59 moritzm: installing jetty9 security updates
  • 07:51 godog: bounce vopsbot on alert1001
  • 05:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2165 (T343198)', diff saved to https://phabricator.wikimedia.org/P52831 and previous config saved to /var/cache/conftool/dbconfig/20231005-055637-arnaudb.json
  • 05:56 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 05:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 05:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T343198)', diff saved to https://phabricator.wikimedia.org/P52830 and previous config saved to /var/cache/conftool/dbconfig/20231005-055615-arnaudb.json
  • 05:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P52829 and previous config saved to /var/cache/conftool/dbconfig/20231005-054109-arnaudb.json
  • 05:29 denisse: Deleting old Jenkins builds on pcc-worker1002 to free disk space
  • 05:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P52828 and previous config saved to /var/cache/conftool/dbconfig/20231005-052602-arnaudb.json
  • 05:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T343198)', diff saved to https://phabricator.wikimedia.org/P52827 and previous config saved to /var/cache/conftool/dbconfig/20231005-051056-arnaudb.json
  • 02:50 eileen: civicrm upgraded from 44800fc0 to 05545fbc

2023-10-04

  • 23:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 23:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 23:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 22:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 22:40 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 22:39 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 22:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
  • 22:23 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 22:21 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
  • 22:18 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 22:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2054.codfw.wmnet with OS bullseye
  • 22:11 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 22:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 22:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 22:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 22:05 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 22:02 urandom: starting Cassandra rebuild, restbase1030-b — T346803
  • 22:02 brennen@deploy2002: Finished scap: Backport for Revert "Deprecate TOC mutation in OutputPageParserOutput hook" (T348134) (duration: 09m 13s)
  • 21:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 21:58 volans: uploaded spicerack_7.3.1 to apt.wikimedia.org bullseye-wikimedia
  • 21:56 brennen@deploy2002: brennen and ssastry: Continuing with sync
  • 21:54 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1066.eqiad.wmnet with OS bullseye
  • 21:54 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 21:54 brennen@deploy2002: brennen and ssastry: Backport for Revert "Deprecate TOC mutation in OutputPageParserOutput hook" (T348134) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:53 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 21:53 brennen@deploy2002: Started scap: Backport for Revert "Deprecate TOC mutation in OutputPageParserOutput hook" (T348134)
  • 21:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 21:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 21:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1062.eqiad.wmnet with OS bullseye
  • 21:40 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 21:38 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 21:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
  • 21:34 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
  • 21:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1062.eqiad.wmnet with reason: host reimage
  • 21:20 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1062.eqiad.wmnet with reason: host reimage
  • 21:02 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1062.eqiad.wmnet with OS bullseye
  • 20:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1062.eqiad.wmnet with OS bullseye
  • 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2054.codfw.wmnet with OS bullseye
  • 20:54 urbanecm@deploy2002: Finished scap: Backport for SpecialManageMentors: Skip OOUI initialization when transcluding (T346760), SpecialManageMentors: Skip OOUI initialization when transcluding (T346760), Fix phan for GrowthExperiments (T347571) (duration: 07m 49s)
  • 20:48 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 20:48 urbanecm@deploy2002: urbanecm: Backport for SpecialManageMentors: Skip OOUI initialization when transcluding (T346760), SpecialManageMentors: Skip OOUI initialization when transcluding (T346760), Fix phan for GrowthExperiments (T347571) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:46 urbanecm@deploy2002: Started scap: Backport for SpecialManageMentors: Skip OOUI initialization when transcluding (T346760), SpecialManageMentors: Skip OOUI initialization when transcluding (T346760), Fix phan for GrowthExperiments (T347571)
  • 20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1066.eqiad.wmnet with OS bullseye
  • 20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 20:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1062.eqiad.wmnet with OS bullseye
  • 20:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 20:21 eileen: re-enable process control (more better hopefully) config revision changed from 89231b1b to d66626f6
  • 19:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T343198)', diff saved to https://phabricator.wikimedia.org/P52826 and previous config saved to /var/cache/conftool/dbconfig/20231004-195023-arnaudb.json
  • 19:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 19:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 19:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 19:49 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 19:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T343198)', diff saved to https://phabricator.wikimedia.org/P52825 and previous config saved to /var/cache/conftool/dbconfig/20231004-194946-arnaudb.json
  • 19:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P52824 and previous config saved to /var/cache/conftool/dbconfig/20231004-193439-arnaudb.json
  • 19:33 eileen: config revision changed from 89231b1b to d66626f6
  • 19:30 eileen: civicrm upgraded from 169c3288 to 44800fc0
  • 19:29 eileen: config revision changed from 4ae7bd71 to 89231b1b
  • 19:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P52823 and previous config saved to /var/cache/conftool/dbconfig/20231004-191933-arnaudb.json
  • 19:19 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 19:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T343198)', diff saved to https://phabricator.wikimedia.org/P52822 and previous config saved to /var/cache/conftool/dbconfig/20231004-190427-arnaudb.json
  • 18:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 18:53 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 18:19 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.29 refs T347080
  • 18:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1066.eqiad.wmnet with OS bullseye
  • 18:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1062.eqiad.wmnet with OS bullseye
  • 18:09 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.29 refs T347080
  • 17:52 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 17:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testreduce1002.eqiad.wmnet
  • 17:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 17:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testreduce1002.eqiad.wmnet
  • 17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1066.eqiad.wmnet with OS bullseye
  • 17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1062.eqiad.wmnet with OS bullseye
  • 17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1065']
  • 17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1064']
  • 17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1063']
  • 17:31 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1062']
  • 17:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 17:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 17:22 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 17:22 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 17:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1066.eqiad.wmnet with OS bullseye
  • 17:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 17:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1062.eqiad.wmnet with OS bullseye
  • 16:59 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/963326 (T347837). `purged` daemon will be restarted by puppet in esams in the next 30m
  • 16:54 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 16:49 taavi: taavi@mwmaint2002 ~ $ mwscript extensions/OATHAuth/maintenance/UpdateForMultipleDevicesSupport.php metawiki | tee T242031-sul.log # T242031
  • 16:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 16:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 16:34 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1067']
  • 16:34 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1066']
  • 16:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1067']
  • 16:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1067']
  • 16:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
  • 16:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1067']
  • 16:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
  • 16:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
  • 16:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1067']
  • 16:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1067']
  • 16:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1067']
  • 16:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
  • 16:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1067']
  • 16:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1067']
  • 16:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
  • 16:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1067']
  • 16:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:23 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 16:23 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1066.eqiad.wmnet with OS bullseye
  • 16:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
  • 16:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
  • 16:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1067']
  • 16:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
  • 16:21 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1067']
  • 16:21 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1063']
  • 16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1065']
  • 16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1064']
  • 16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1062']
  • 16:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1065']
  • 16:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1064']
  • 16:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1063']
  • 16:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1062']
  • 16:07 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 16:07 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 16:07 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 16:07 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1062.eqiad.wmnet with OS bullseye
  • 15:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:47 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 15:47 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 15:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:45 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1062.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1065.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1066.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1067.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1064.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1063.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:39 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 15:37 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:32 hashar@deploy2002: Finished deploy [integration/docroot@b3b712f]: (no justification provided) (duration: 00m 06s)
  • 15:32 hashar@deploy2002: Started deploy [integration/docroot@b3b712f]: (no justification provided)
  • 15:21 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 15:17 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 15:13 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1067.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:12 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1066.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:12 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1065.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:12 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1064.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:12 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1063.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:12 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1062.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:08 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:08 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudvirt1062-67 - jclark@cumin1001"
  • 15:07 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudvirt1062-67 - jclark@cumin1001"
  • 15:05 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 15:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 14:59 taavi: revoke a bot password, https://phabricator.wikimedia.org/T348132
  • 14:56 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 14:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 14:39 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ores[2001-2004].codfw.wmnet
  • 14:39 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:39 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores[2001-2004].codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:38 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ores[1002-1009].eqiad.wmnet
  • 14:38 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:38 Lucas_WMDE: spontaneously extended UTC afternoon backport+config window done now
  • 14:37 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 14:36 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores[2001-2004].codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:34 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for prod: Enable wgCampaignEventsEnableEmail in meta and officewiki (T347065) (duration: 18m 26s)
  • 14:25 lucaswerkmeister-wmde@deploy2002: daimona and lucaswerkmeister-wmde: Continuing with sync
  • 14:24 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/963321 (T347837). `purged` daemon will be restarted by puppet in drmrs in the next 30m
  • 14:22 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 14:22 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores[2001-2004].codfw.wmnet
  • 14:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ores2009.codfw.wmnet
  • 14:21 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:21 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:20 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:18 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores[1002-1009].eqiad.wmnet
  • 14:18 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 14:18 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ores2006.codfw.wmnet
  • 14:17 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:17 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2006.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:17 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ores2005.codfw.wmnet
  • 14:17 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:17 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2006.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:16 urandom: starting Cassandra rebuild, restbase1030-a — T346803
  • 14:16 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ores2007.codfw.wmnet
  • 14:16 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:16 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2007.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:15 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 14:14 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 14:14 lucaswerkmeister-wmde@deploy2002: daimona and lucaswerkmeister-wmde: Backport for prod: Enable wgCampaignEventsEnableEmail in meta and officewiki (T347065) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:13 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2007.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:12 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores2009.codfw.wmnet
  • 14:12 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for prod: Enable wgCampaignEventsEnableEmail in meta and officewiki (T347065)
  • 14:10 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 14:10 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores2005.codfw.wmnet
  • 14:09 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ores2008.codfw.wmnet
  • 14:09 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:08 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2008.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:08 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores2006.codfw.wmnet
  • 14:07 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2008.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:05 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores2007.codfw.wmnet
  • 14:04 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 14:00 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for beta: Explicitly assign campaignevents-email-participants to all users (T336939), metawiki: Restrict campaignevents-email-participants right (T336939) (duration: 10m 40s)
  • 13:57 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores2008.codfw.wmnet
  • 13:54 lucaswerkmeister-wmde@deploy2002: daimona and lucaswerkmeister-wmde: Continuing with sync
  • 13:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 13:51 lucaswerkmeister-wmde@deploy2002: daimona and lucaswerkmeister-wmde: Backport for beta: Explicitly assign campaignevents-email-participants to all users (T336939), metawiki: Restrict campaignevents-email-participants right (T336939) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:49 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for beta: Explicitly assign campaignevents-email-participants to all users (T336939), metawiki: Restrict campaignevents-email-participants right (T336939)
  • 13:47 Lucas_WMDE: mwscript namespaceDupes fonwiki --fix # T347939 – 0 pages to fix, 0 resolvable; 0 links to fix, 0 resolvable, 0 deleted
  • 13:46 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for fonwiki: add wgSiteName, wgMetaNamespace and timezone (T347939) (duration: 13m 46s)
  • 13:34 lucaswerkmeister-wmde@deploy2002: anzx and lucaswerkmeister-wmde: Backport for fonwiki: add wgSiteName, wgMetaNamespace and timezone (T347939) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:33 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for fonwiki: add wgSiteName, wgMetaNamespace and timezone (T347939)
  • 13:27 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/963147 (T347837). `purged` daemon will be restarted by puppet in eqiad in the next 30m
  • 13:25 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 13:25 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 13:24 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 13:24 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 13:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 13:23 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 13:20 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for fonwiki: add logos (T347939) (duration: 11m 43s)
  • 13:19 rook@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2006-dev.codfw.wmnet with OS bullseye
  • 13:14 urandom: Cassandra bootstrap, restbase1030-a (`auto_bootstrap: false`) — T346803
  • 13:14 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and anzx: Continuing with sync
  • 13:10 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and anzx: Backport for fonwiki: add logos (T347939) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:09 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for fonwiki: add logos (T347939)
  • 13:03 rook@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
  • 13:00 rook@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
  • 12:56 klausman: powering off orespoolcounter{1004,2003,2004}.{eqiad,codfw}.wmnet (1003 is kept powered-on in case we need access to files from the old install). The machines have a 90d downtime already put in.
  • 12:53 klausman: powering off ores200{2..9}.codfw.wmnet (2001 is kept powered-on in case we need access to files from the old install). The machines have a 90d downtime already put in.
  • 12:51 klausman: powering off ores100{2..9}.eqiad.wmnet (1001 is kept powered-on in case we need access to files from the old install). The machines have a 90d downtime already put in.
  • 12:46 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 90 days, 0:00:00 on 22 hosts with reason: Downtime for graceful shutdown and later decom
  • 12:46 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 90 days, 0:00:00 on 22 hosts with reason: Downtime for graceful shutdown and later decom
  • 12:43 rook@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2006-dev.codfw.wmnet with OS bullseye
  • 11:45 moritzm: installing exim4 security updates
  • 11:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 11:30 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 11:20 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 11:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2004.codfw.wmnet with reason: host reimage
  • 11:14 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2004.codfw.wmnet with reason: host reimage
  • 11:14 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Zsoo out of all services on: 2175 hosts
  • 11:02 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Zsoo out of all services on: 2175 hosts
  • 10:58 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 10:29 filippo@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['thanos-fe2004']
  • 10:29 filippo@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2004']
  • 10:21 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:20 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 10:20 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 10:20 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:20 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 10:20 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:20 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 10:02 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 10:02 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 09:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T343198)', diff saved to https://phabricator.wikimedia.org/P52817 and previous config saved to /var/cache/conftool/dbconfig/20231004-094320-arnaudb.json
  • 09:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 09:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 09:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T343198)', diff saved to https://phabricator.wikimedia.org/P52816 and previous config saved to /var/cache/conftool/dbconfig/20231004-094258-arnaudb.json
  • 09:39 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 09:39 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 09:39 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 09:39 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 09:39 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 09:38 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 09:38 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 09:38 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 09:38 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 09:38 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 09:38 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 09:38 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 09:38 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 09:37 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 09:37 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 09:37 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 09:37 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 09:37 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 09:37 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:37 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:35 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 09:33 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging TsepoThoabala out of all services on: 2175 hosts
  • 09:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P52815 and previous config saved to /var/cache/conftool/dbconfig/20231004-092752-arnaudb.json
  • 09:27 jmm@cumin2002: START - Cookbook sre.idm.logout Logging TsepoThoabala out of all services on: 2175 hosts
  • 09:26 sg912@deploy2002: Finished deploy [airflow-dags/analytics@3b374a9]: (no justification provided) (duration: 00m 45s)
  • 09:25 sg912@deploy2002: Started deploy [airflow-dags/analytics@3b374a9]: (no justification provided)
  • 09:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P52814 and previous config saved to /var/cache/conftool/dbconfig/20231004-091245-arnaudb.json
  • 09:08 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging KMorgan out of all services on: 2175 hosts
  • 09:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging KMorgan out of all services on: 2175 hosts
  • 09:02 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 09:01 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 08:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T343198)', diff saved to https://phabricator.wikimedia.org/P52813 and previous config saved to /var/cache/conftool/dbconfig/20231004-085739-arnaudb.json
  • 08:44 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging EllenR out of all services on: 2175 hosts
  • 08:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging EllenR out of all services on: 2175 hosts
  • 08:19 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 08:14 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2003.codfw.wmnet with OS bullseye
  • 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Eigyan out of all services on: 2176 hosts
  • 08:00 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Eigyan out of all services on: 2176 hosts
  • 07:56 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: host reimage
  • 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: host reimage
  • 07:34 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2003.codfw.wmnet with OS bullseye
  • 07:19 XioNoX: Remove static routes for anycast prefixes - T347494
  • 06:30 moritzm: installing glibc security updates
  • 06:19 Surbhi_: Deployed refinery using scap, then deployed onto hdfs
  • 05:54 sg912@deploy2002: Finished deploy [analytics/refinery@e954b12] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@e954b12a] (duration: 03m 00s)
  • 05:51 sg912@deploy2002: Started deploy [analytics/refinery@e954b12] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@e954b12a]
  • 05:50 sg912@deploy2002: Finished deploy [analytics/refinery@e954b12] (thin): Regular analytics weekly train THIN [analytics/refinery@e954b12a] (duration: 00m 06s)
  • 05:50 sg912@deploy2002: Started deploy [analytics/refinery@e954b12] (thin): Regular analytics weekly train THIN [analytics/refinery@e954b12a]
  • 05:49 sg912@deploy2002: Finished deploy [analytics/refinery@e954b12]: Regular analytics weekly train [analytics/refinery@e954b12a] (duration: 06m 02s)
  • 05:43 sg912@deploy2002: Started deploy [analytics/refinery@e954b12]: Regular analytics weekly train [analytics/refinery@e954b12a]
  • 03:56 kart_: Updated cxserver to 2023-09-28-043003-production (T343450, T347389, T338689)
  • 03:56 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 03:55 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 03:51 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 03:51 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 03:48 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 03:48 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply

2023-10-03

  • 23:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T343198)', diff saved to https://phabricator.wikimedia.org/P52812 and previous config saved to /var/cache/conftool/dbconfig/20231003-234343-arnaudb.json
  • 23:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 23:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 23:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T343198)', diff saved to https://phabricator.wikimedia.org/P52811 and previous config saved to /var/cache/conftool/dbconfig/20231003-234322-arnaudb.json
  • 23:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P52810 and previous config saved to /var/cache/conftool/dbconfig/20231003-232815-arnaudb.json
  • 23:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P52809 and previous config saved to /var/cache/conftool/dbconfig/20231003-231309-arnaudb.json
  • 22:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T343198)', diff saved to https://phabricator.wikimedia.org/P52808 and previous config saved to /var/cache/conftool/dbconfig/20231003-225803-arnaudb.json
  • 22:22 jdrewniak@deploy2002: Finished scap: Backport for Web typography prototype survey (T347208), Correct a recently-added message, [Prototype] Change i18n message (T347208) (duration: 39m 08s)
  • 22:11 jdrewniak@deploy2002: jdrewniak: Continuing with sync
  • 22:01 jdrewniak@deploy2002: jdrewniak: Backport for Web typography prototype survey (T347208), Correct a recently-added message, [Prototype] Change i18n message (T347208) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:43 jdrewniak@deploy2002: Started scap: Backport for Web typography prototype survey (T347208), Correct a recently-added message, [Prototype] Change i18n message (T347208)
  • 21:32 jdrewniak@deploy2002: Finished scap: Backport for Promote several Wikipedias to Vector 2022 as default skin (T347321) (duration: 09m 26s)
  • 21:26 jdrewniak@deploy2002: jdlrobson and jdrewniak: Continuing with sync
  • 21:24 jdrewniak@deploy2002: jdlrobson and jdrewniak: Backport for Promote several Wikipedias to Vector 2022 as default skin (T347321) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:23 jdrewniak@deploy2002: Started scap: Backport for Promote several Wikipedias to Vector 2022 as default skin (T347321)
  • 20:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 20:56 eileen: tools upgraded from 130ca87e to 2e19cd39
  • 20:50 jdrewniak@deploy2002: Finished scap: Backport for Re-enable Extension:ParserMigration on labs (T333179) (duration: 38m 52s)
  • 20:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 20:35 jdrewniak@deploy2002: jdrewniak and sbailey: Continuing with sync
  • 20:34 jdrewniak@deploy2002: jdrewniak and sbailey: Backport for Re-enable Extension:ParserMigration on labs (T333179) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:16 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/963081 (T347837). `purged` daemon will be restarted by puppet in eqsin in the next 30m
  • 20:11 jdrewniak@deploy2002: Started scap: Backport for Re-enable Extension:ParserMigration on labs (T333179)
  • 19:41 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 19:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 19:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 19:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 19:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 19:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 19:15 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 19:15 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 18:48 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 18:25 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.29 refs T347080
  • 18:13 jhuneidi@deploy2002: Pruned MediaWiki: 1.41.0-wmf.27 (duration: 02m 14s)
  • 18:11 jhuneidi@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.29 refs T347080 (duration: 43m 24s)
  • 17:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 17:34 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 17:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 17:28 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1027.eqiad.wmnet
  • 17:28 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1027.eqiad.wmnet
  • 17:27 jhuneidi@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.29 refs T347080
  • 17:17 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1027.eqiad.wmnet with OS bullseye
  • 17:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2054 hosts in codfw - jhancock@cumin2002"
  • 17:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2054 hosts in codfw - jhancock@cumin2002"
  • 17:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 17:02 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:59 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 16:59 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2054 hosts in codfw - jhancock@cumin2002"
  • 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2054 hosts in codfw - jhancock@cumin2002"
  • 16:54 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:52 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1027.eqiad.wmnet with reason: host reimage
  • 16:50 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1027.eqiad.wmnet with reason: host reimage
  • 16:37 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1027.eqiad.wmnet with OS bullseye
  • 16:36 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1027.eqiad.wmnet
  • 16:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1027.eqiad.wmnet
  • 16:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1027.eqiad.wmnet
  • 16:23 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1027.eqiad.wmnet
  • 16:20 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 16:20 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 16:19 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 16:19 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 16:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 16:08 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 16:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1026.eqiad.wmnet with OS bullseye
  • 16:06 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 16:06 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 16:04 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 16:03 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 16:03 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 16:03 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 16:02 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 16:01 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 15:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 15:57 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 15:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 15:51 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 15:49 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 15:49 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 15:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1026.eqiad.wmnet with reason: host reimage
  • 15:37 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 15:37 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1026.eqiad.wmnet with reason: host reimage
  • 15:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:32 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:26 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:26 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:24 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1026.eqiad.wmnet with OS bullseye
  • 15:24 ottomata: mw-page-content-change-enrich - backfill is done, set replicas to 2 in eqiad and codfw
  • 15:23 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:23 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1026.eqiad.wmnet
  • 15:23 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1026.eqiad.wmnet
  • 15:22 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 15:11 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1026.eqiad.wmnet
  • 15:10 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1026.eqiad.wmnet
  • 15:10 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1033.eqiad.wmnet
  • 15:10 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1033.eqiad.wmnet
  • 15:07 brennen@deploy2002: Finished deploy [phabricator/deployment@6f19600]: deploy to phab1004 for T348007 (duration: 00m 44s)
  • 15:06 brennen@deploy2002: Started deploy [phabricator/deployment@6f19600]: deploy to phab1004 for T348007
  • 15:06 brennen@deploy2002: Finished deploy [phabricator/deployment@6f19600]: test deploy to phab2002 for T348007 (duration: 00m 32s)
  • 15:06 brennen@deploy2002: Started deploy [phabricator/deployment@6f19600]: test deploy to phab2002 for T348007
  • 15:05 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: Phabricator deploys
  • 15:05 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1004.eqiad.wmnet with reason: Phabricator deploys
  • 14:55 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-master1004']
  • 14:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.4 - ayounsi@cumin1001
  • 14:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:48 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.4 - ayounsi@cumin1001
  • 14:47 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:47 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:47 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:46 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-master1003']
  • 14:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:45 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:44 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003']
  • 14:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
  • 14:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:42 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['an-master1003']
  • 14:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
  • 14:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
  • 14:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003']
  • 14:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
  • 14:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003']
  • 14:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
  • 14:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003']
  • 14:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
  • 14:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003']
  • 14:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
  • 14:38 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 14:37 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003']
  • 14:37 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 14:37 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 14:37 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 14:37 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:37 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
  • 14:37 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:37 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 14:37 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 14:37 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:36 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:36 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:36 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:36 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:36 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:36 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:36 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:36 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:35 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:35 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:35 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1033.eqiad.wmnet with OS bullseye
  • 14:07 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2002.codfw.wmnet with OS bullseye
  • 14:01 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/963020 (T347837). `purged` daemon will be restarted by puppet in codfw in the next 30m
  • 14:01 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1033.eqiad.wmnet with reason: host reimage
  • 13:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 13:58 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-master1003
  • 13:57 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-master1003
  • 13:57 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1033.eqiad.wmnet with reason: host reimage
  • 13:50 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-master1004
  • 13:50 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:50 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Revert allocation of LVS VIPs for recommendation-api-ng - klausman@cumin1001"
  • 13:49 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-master1004
  • 13:49 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 13:49 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:49 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: host reimage
  • 13:48 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Revert allocation of LVS VIPs for recommendation-api-ng - klausman@cumin1001"
  • 13:48 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 13:46 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: host reimage
  • 13:44 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 13:43 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1033.eqiad.wmnet with OS bullseye
  • 13:43 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 13:43 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1025.eqiad.wmnet
  • 13:42 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1025.eqiad.wmnet
  • 13:41 taavi@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 13:38 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:38 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:38 ottomata: mw-page-content-change-enrich codfw - bump to 1.27.0 and set replicas to 12 while processing backlog - T347676
  • 13:34 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:34 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:34 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1025.eqiad.wmnet with OS bullseye
  • 13:34 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt-wdqs1003.eqiad.wmnet
  • 13:34 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:34 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
  • 13:34 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1026.eqiad.wmnet
  • 13:33 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1026.eqiad.wmnet
  • 13:33 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
  • 13:30 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:30 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:30 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 13:27 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2002.codfw.wmnet with OS bullseye
  • 13:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T343198)', diff saved to https://phabricator.wikimedia.org/P52807 and previous config saved to /var/cache/conftool/dbconfig/20231003-132733-arnaudb.json
  • 13:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 13:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 13:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T343198)', diff saved to https://phabricator.wikimedia.org/P52806 and previous config saved to /var/cache/conftool/dbconfig/20231003-132700-arnaudb.json
  • 13:23 taavi@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1003.eqiad.wmnet
  • 13:23 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt-wdqs1002.eqiad.wmnet
  • 13:23 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:23 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
  • 13:22 samtar@deploy2002: Finished scap: Backport for arwiki: add importsources (T347563), add throttle rules for Ada Lovelace Day October 10, 2023 and fix throttle rule for UIUC Wikipedia edit-a-thon October 13, 2023 (T347719) (duration: 09m 03s)
  • 13:21 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
  • 13:19 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 13:16 samtar@deploy2002: anzx and samtar: Continuing with sync
  • 13:14 samtar@deploy2002: anzx and samtar: Backport for arwiki: add importsources (T347563), add throttle rules for Ada Lovelace Day October 10, 2023 and fix throttle rule for UIUC Wikipedia edit-a-thon October 13, 2023 (T347719) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:13 samtar@deploy2002: Started scap: Backport for arwiki: add importsources (T347563), add throttle rules for Ada Lovelace Day October 10, 2023 and fix throttle rule for UIUC Wikipedia edit-a-thon October 13, 2023 (T347719)
  • 13:12 taavi@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1002.eqiad.wmnet
  • 13:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P52805 and previous config saved to /var/cache/conftool/dbconfig/20231003-131154-arnaudb.json
  • 13:10 samtar@deploy2002: Finished scap: Backport for New donor experience stream for apps event schema (duration: 08m 26s)
  • 13:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1025.eqiad.wmnet with reason: host reimage
  • 13:04 samtar@deploy2002: sharvaniharan and samtar: Continuing with sync
  • 13:03 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1025.eqiad.wmnet with reason: host reimage
  • 13:03 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2001.codfw.wmnet with OS bullseye
  • 13:03 samtar@deploy2002: sharvaniharan and samtar: Backport for New donor experience stream for apps event schema synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:01 samtar@deploy2002: Started scap: Backport for New donor experience stream for apps event schema
  • 12:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P52804 and previous config saved to /var/cache/conftool/dbconfig/20231003-125647-arnaudb.json
  • 12:50 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1025.eqiad.wmnet with OS bullseye
  • 12:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: host reimage
  • 12:42 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: host reimage
  • 12:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T343198)', diff saved to https://phabricator.wikimedia.org/P52803 and previous config saved to /var/cache/conftool/dbconfig/20231003-124141-arnaudb.json
  • 12:23 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2001.codfw.wmnet with OS bullseye
  • 11:54 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/963004 (T347837). `purged` daemon will be restarted by puppet in ulsfo in the next 30m
  • 11:51 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 11:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1004.eqiad.wmnet with reason: host reimage
  • 11:29 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:29 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack - aborrero@cumin1001"
  • 11:29 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1004.eqiad.wmnet with reason: host reimage
  • 11:29 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack - aborrero@cumin1001"
  • 11:26 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:11 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 10:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1003.eqiad.wmnet with OS bullseye
  • 10:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix katran-test.svc.eqiad.wmnet IP allocation - vgutierrez@cumin1001"
  • 10:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: host reimage
  • 10:34 vgutierrez@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix katran-test.svc.eqiad.wmnet IP allocation - vgutierrez@cumin1001"
  • 10:32 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: host reimage
  • 10:32 vgutierrez@cumin1001: START - Cookbook sre.dns.netbox
  • 10:30 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:19 vgutierrez@cumin1001: START - Cookbook sre.dns.netbox
  • 10:15 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1003.eqiad.wmnet with OS bullseye
  • 09:50 claime: Uncordoned kubernetes2010.codfw.wmnet
  • 09:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2010.codfw.wmnet
  • 09:49 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2010.codfw.wmnet
  • 09:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1002.eqiad.wmnet with OS bullseye
  • 09:42 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 09:42 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 09:38 ladsgroup@deploy2002: Finished scap: Creating fonwiki (T347935) (duration: 07m 34s)
  • 09:30 ladsgroup@deploy2002: Started scap: Creating fonwiki (T347935)
  • 09:28 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubernetes2010.codfw.wmnet with reason: BIOS setting change
  • 09:28 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubernetes2010.codfw.wmnet with reason: BIOS setting change
  • 09:27 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: host reimage
  • 09:26 claime: Draining kubernetes2010.codfw.wmnet for reboot to change BIOS setting
  • 09:24 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: host reimage
  • 09:07 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1002.eqiad.wmnet with OS bullseye
  • 09:06 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:06 isaranto@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:05 isaranto@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 08:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ml-staging2001.codfw.wmnet with reason: Check chassis internals for GPU hosting
  • 08:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ml-staging2001.codfw.wmnet with reason: Check chassis internals for GPU hosting
  • 08:26 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1001.eqiad.wmnet with OS bullseye
  • 08:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:15 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 08:14 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:13 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 08:12 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 08:09 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 08:03 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: host reimage
  • 08:01 taavi: taavi@mwmaint2002 ~ $ mwscript resetAuthenticationThrottle.php --wiki=enwiki --signup --ip=155.232.7.202 # T347874
  • 07:59 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: host reimage
  • 07:57 taavi@deploy2002: Finished scap: T347874 and T347069 (duration: 29m 22s)
  • 07:42 taavi@deploy2002: taavi: Continuing with sync
  • 07:42 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1001.eqiad.wmnet with OS bullseye
  • 07:40 taavi@deploy2002: taavi: T347874 and T347069 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:27 taavi@deploy2002: Started scap: T347874 and T347069
  • 07:03 kart_: Updated MinT to 2023-09-28-043052-production (T343450, T341478)
  • 07:03 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 06:59 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 06:56 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 06:51 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 06:45 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 06:42 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 06:42 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 06:42 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 05:52 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on druid1009.eqiad.wmnet with reason: Downtime as we set up the host to join the druid and zookeeper cluster
  • 05:52 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on druid1009.eqiad.wmnet with reason: Downtime as we set up the host to join the druid and zookeeper cluster
  • 04:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 04:20 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 04:20 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 04:13 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 04:12 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 04:11 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 04:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 04:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 04:09 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 04:08 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 04:08 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 04:07 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 04:05 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 04:05 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 04:05 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 03:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T343198)', diff saved to https://phabricator.wikimedia.org/P52802 and previous config saved to /var/cache/conftool/dbconfig/20231003-034640-arnaudb.json
  • 03:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 03:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 02:41 krinkle@deploy2002: Finished scap: (no justification provided) (duration: 07m 34s)
  • 02:33 krinkle@deploy2002: Started scap: (no justification provided)
  • 02:17 krinkle@deploy2002: Synchronized docroot/noc/: (no justification provided) (duration: 08m 03s)
  • 01:48 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1033.eqiad.wmnet
  • 01:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1033.eqiad.wmnet
  • 01:35 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1033.eqiad.wmnet
  • 01:34 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1033.eqiad.wmnet
  • 01:33 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1025.eqiad.wmnet
  • 01:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1025.eqiad.wmnet
  • 01:21 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1025.eqiad.wmnet
  • 01:18 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1025.eqiad.wmnet
  • 01:06 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1025.eqiad.wmnet
  • 01:06 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1025.eqiad.wmnet
  • 00:39 ejegg: fundraising civicrm upgraded from c1b28287 to 995a3d5b
  • 00:38 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1033.eqiad.wmnet
  • 00:29 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1033.eqiad.wmnet
  • 00:28 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1033.eqiad.wmnet
  • 00:28 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1033.eqiad.wmnet
  • 00:28 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1025.eqiad.wmnet
  • 00:28 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1025.eqiad.wmnet

2023-10-02

  • 23:09 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1032.eqiad.wmnet with OS bullseye
  • 22:46 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1032.eqiad.wmnet with reason: host reimage
  • 22:43 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1032.eqiad.wmnet with reason: host reimage
  • 22:30 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1032.eqiad.wmnet with OS bullseye
  • 22:30 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1032.eqiad.wmnet with OS bullseye
  • 22:16 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1032.eqiad.wmnet with OS bullseye
  • 22:09 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1032.eqiad.wmnet
  • 22:01 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host restbase1032.eqiad.wmnet
  • 22:01 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1032.eqiad.wmnet
  • 22:00 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1032.eqiad.wmnet
  • 21:53 maryum: Deployed patch for T347704
  • 21:32 kindrobot: end UTC late backport window
  • 21:28 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1029.eqiad.wmnet
  • 21:28 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1029.eqiad.wmnet
  • 21:23 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1032.eqiad.wmnet
  • 21:23 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1032.eqiad.wmnet
  • 21:22 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1032.eqiad.wmnet
  • 21:22 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1032.eqiad.wmnet
  • 21:21 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1032.eqiad.wmnet
  • 21:21 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1032.eqiad.wmnet
  • 21:21 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1029.eqiad.wmnet with OS bullseye
  • 21:17 kindrobot@deploy2002: Finished scap: Backport for Ignore only site notices (T347645), HookUtils: Fix checking page props (T347878), Fix diff title escaping (T347578), Diff: Add missing .mw-diff-inline-moved selector (duration: 10m 06s)
  • 21:11 kindrobot@deploy2002: kindrobot and matmarex: Continuing with sync
  • 21:09 kindrobot@deploy2002: kindrobot and matmarex: Backport for Ignore only site notices (T347645), HookUtils: Fix checking page props (T347878), Fix diff title escaping (T347578), Diff: Add missing .mw-diff-inline-moved selector synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:07 kindrobot@deploy2002: Started scap: Backport for Ignore only site notices (T347645), HookUtils: Fix checking page props (T347878), Fix diff title escaping (T347578), Diff: Add missing .mw-diff-inline-moved selector
  • 20:59 ottomata: mw-page-content-change-enrich - CORRECTION - increase replicas to 20 to process backlog - T347676
  • 20:58 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:58 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:57 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1029.eqiad.wmnet with reason: host reimage
  • 20:57 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:57 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:56 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:56 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:56 ottomata: mw-page-content-change-enrich - increase replicas to 24 to process backlog - T347676
  • 20:54 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1029.eqiad.wmnet with reason: host reimage
  • 20:42 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:40 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:40 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:37 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:37 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:36 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:35 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:32 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:31 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 20:27 ottomata: mw-page-content-change-enrich - increase replicas to 12 to process backlog - T347676
  • 20:27 kindrobot@deploy2002: Finished scap: Backport for Undeploy Reader Demographics 2 pilot survey (T345951), DiscussionTools: Disable timestamp links in production initially (duration: 08m 49s)
  • 20:27 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 20:21 kindrobot@deploy2002: esanders and dani and kindrobot: Continuing with sync
  • 20:19 kindrobot@deploy2002: esanders and dani and kindrobot: Backport for Undeploy Reader Demographics 2 pilot survey (T345951), DiscussionTools: Disable timestamp links in production initially synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:18 kindrobot@deploy2002: Started scap: Backport for Undeploy Reader Demographics 2 pilot survey (T345951), DiscussionTools: Disable timestamp links in production initially
  • 20:13 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:13 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:12 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:12 eileen: process control revision changed from b370644b to 9760851c
  • 20:12 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:12 eileen: revision changed from b370644b to 9760851c
  • 20:11 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:11 kindrobot@deploy2002: Backport cancelled.
  • 20:01 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:01 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1029.eqiad.wmnet with OS bullseye
  • 19:54 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
  • 19:53 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1029.eqiad.wmnet
  • 19:53 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1029.eqiad.wmnet
  • 19:53 moritzm: installing libvpx security updates
  • 19:41 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1029.eqiad.wmnet
  • 19:40 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1029.eqiad.wmnet
  • 19:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1024.eqiad.wmnet with OS bullseye
  • 19:38 eileen: civicrm upgraded from 7406cdf3 to c1b28287
  • 19:19 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1024.eqiad.wmnet with reason: host reimage
  • 19:16 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1024.eqiad.wmnet with reason: host reimage
  • 19:13 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-master1003']
  • 19:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
  • 19:11 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 19:02 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1024.eqiad.wmnet with OS bullseye
  • 19:02 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1029.eqiad.wmnet
  • 19:02 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1029.eqiad.wmnet
  • 19:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 19:00 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-master1003.eqiad.wmnet']
  • 19:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003.eqiad.wmnet']
  • 19:00 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 19:00 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 18:56 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1024.eqiad.wmnet
  • 18:56 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1024.eqiad.wmnet
  • 18:44 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1024.eqiad.wmnet
  • 18:44 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1024.eqiad.wmnet
  • 18:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1023.eqiad.wmnet
  • 18:42 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1023.eqiad.wmnet
  • 18:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1023.eqiad.wmnet with OS bullseye
  • 18:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1023.eqiad.wmnet with reason: host reimage
  • 18:13 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1023.eqiad.wmnet with reason: host reimage
  • 17:59 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1023.eqiad.wmnet with OS bullseye
  • 17:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1022.eqiad.wmnet
  • 17:59 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1022.eqiad.wmnet
  • 17:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 17:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 17:39 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 17:39 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 17:38 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1022.eqiad.wmnet with OS bullseye
  • 17:30 sukhe: A:dns-rec enable puppet and run agent
  • 17:24 sukhe: sudo cumin "A:dns-rec" "disable-puppet 'merging CR 962648'"
  • 17:18 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:18 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 17:17 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 17:17 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 17:17 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 17:17 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 17:12 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1022.eqiad.wmnet with reason: host reimage
  • 17:09 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1022.eqiad.wmnet with reason: host reimage
  • 17:00 fabfur: upgrade purged package to version 0.21+deb12u1 cp4052 (bookworm) (T347837)
  • 16:56 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1022.eqiad.wmnet with OS bullseye
  • 16:55 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1031.eqiad.wmnet with OS bullseye
  • 16:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T347624, testing new cookbook changes) xfer categories from wdqs2024.codfw.wmnet -> wdqs2025.codfw.wmnet, repooling both afterwards w/ encryption
  • 16:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer (T347624, testing new cookbook changes) xfer categories from wdqs2024.codfw.wmnet -> wdqs2025.codfw.wmnet, repooling both afterwards w/ encryption
  • 16:29 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1031.eqiad.wmnet with reason: host reimage
  • 16:26 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1031.eqiad.wmnet with reason: host reimage
  • 16:13 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1031.eqiad.wmnet with OS bullseye
  • 16:08 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1028.eqiad.wmnet with OS bullseye
  • 16:06 fabfur: importing into bookworm-wikimedia package purged_0.21+deb12u1_amd64 (T347837)
  • 15:44 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 15:43 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 15:43 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1028.eqiad.wmnet with reason: host reimage
  • 15:40 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1028.eqiad.wmnet with reason: host reimage
  • 15:29 sukhe: enable puppet on A:dns-rec and force agent run
  • 15:28 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:28 joal@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:27 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1028.eqiad.wmnet with OS bullseye
  • 15:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1028.eqiad.wmnet
  • 15:27 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1028.eqiad.wmnet
  • 15:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1021.eqiad.wmnet
  • 15:24 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1021.eqiad.wmnet
  • 15:23 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.failover (exit_code=0) Failover of gitlab from gitlab1003.wikimedia.org to gitlab2002.wikimedia.org
  • 15:20 jelto@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
  • 15:20 jelto@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
  • 15:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1021.eqiad.wmnet with OS bullseye
  • 15:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 14:55 elukey: restart kubelet on ml-serve1001 (high latencies registered)
  • 14:51 fabfur: upgrade purged package to version 0.21+deb11u1 on all cp hosts (T347837)
  • 14:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:47 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:47 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding new host ganeti-test2004 - jhancock@cumin2002"
  • 14:46 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding new host ganeti-test2004 - jhancock@cumin2002"
  • 14:44 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:40 stevemunene@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 14:37 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1021.eqiad.wmnet with reason: host reimage
  • 14:34 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1021.eqiad.wmnet with reason: host reimage
  • 14:23 fabfur: importing into bullseye-wikimedia package purged_0.21+deb11u1_amd64 (T347837)
  • 14:20 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1021.eqiad.wmnet with OS bullseye
  • 14:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:18 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1020.eqiad.wmnet with OS bullseye
  • 14:17 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:17 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:15 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:09 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 14:09 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 14:03 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:01 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 13:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1228.eqiad.wmnet with OS bullseye
  • 13:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 13:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS bullseye
  • 13:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 13:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 13:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1020.eqiad.wmnet with reason: host reimage
  • 13:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 13:52 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1020.eqiad.wmnet with reason: host reimage
  • 13:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1228.eqiad.wmnet with reason: host reimage
  • 13:40 taavi@deploy2002: Finished scap: Backport for Add 'testwikis' DB list to MWMultiVersion::DB_LISTS (T341110) (duration: 11m 15s)
  • 13:39 stevemunene@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 13:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage
  • 13:39 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 13:38 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 13:38 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1020.eqiad.wmnet with OS bullseye
  • 13:38 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 13:38 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 13:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1228.eqiad.wmnet with reason: host reimage
  • 13:36 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 13:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage
  • 13:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 13:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1228.eqiad.wmnet with OS bullseye
  • 13:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS bullseye
  • 13:34 taavi@deploy2002: taavi and dreamyjazz: Continuing with sync
  • 13:30 taavi@deploy2002: taavi and dreamyjazz: Backport for Add 'testwikis' DB list to MWMultiVersion::DB_LISTS (T341110) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:29 taavi@deploy2002: Started scap: Backport for Add 'testwikis' DB list to MWMultiVersion::DB_LISTS (T341110)
  • 13:27 taavi@deploy2002: Sync cancelled.
  • 13:19 taavi@deploy2002: taavi and dreamyjazz: Backport for clienthints: Enable display on testwikis and four production wikis (T341110) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:15 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:13 sukhe: disable puppet on A:dns-rec to merge CR 961818
  • 13:11 taavi@deploy2002: Started scap: Backport for clienthints: Enable display on testwikis and four production wikis (T341110)
  • 13:01 jelto@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab1003.wikimedia.org to gitlab2002.wikimedia.org
  • 12:39 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org on all recursors
  • 12:39 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org on all recursors
  • 12:34 aikochou@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 12:31 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org on all recursors
  • 12:31 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org on all recursors
  • 12:29 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:29 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack codfw1dev - aborrero@cumin1001"
  • 12:25 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack codfw1dev - aborrero@cumin1001"
  • 12:22 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 12:18 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 12:12 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 12:12 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 12:09 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 12:04 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 11:56 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 11:55 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 11:55 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 11:55 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 11:55 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 11:54 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 11:51 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 11:49 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:47 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org on all recursors
  • 11:47 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org on all recursors
  • 11:46 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:45 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:42 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:42 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:40 hnowlan@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:38 hnowlan@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:35 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
  • 10:58 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
  • 10:58 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
  • 10:55 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 10:54 jiji@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 10:49 fabfur: swap purged on cp4040 to use UDS instead of TCP for Varnish (T347837)
  • 10:43 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 10:43 jiji@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 10:34 fabfur: depool cp4040 to test new purged version (T347837)
  • 09:48 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add codfw new switches - cmooney@cumin1001"
  • 09:47 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add codfw new switches - cmooney@cumin1001"
  • 09:06 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1006.eqiad.wmnet with OS bullseye
  • 08:34 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1006.eqiad.wmnet with reason: host reimage
  • 08:31 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1006.eqiad.wmnet with reason: host reimage
  • 08:24 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 08:21 taavi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcontrol1006
  • 08:21 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcontrol1006
  • 08:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 08:17 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 07:49 godog: +150G to prometheus@k8s in codfw
  • 07:47 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.eqiad.wmnet with OS bullseye
  • 07:46 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudcontrol1006 - taavi@cumin1001"
  • 07:45 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudcontrol1006 - taavi@cumin1001"
  • 07:37 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 07:37 joal@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 07:36 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:36 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudcontrol1006 - taavi@cumin1001"
  • 07:35 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudcontrol1006 - taavi@cumin1001"
  • 07:32 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:31 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:31 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudcontrol1006 - taavi@cumin1001"
  • 07:30 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudcontrol1006 - taavi@cumin1001"
  • 07:28 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 05:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 05:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .

2023-10-01

  • 01:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T343198)', diff saved to https://phabricator.wikimedia.org/P52799 and previous config saved to /var/cache/conftool/dbconfig/20231001-013851-arnaudb.json
  • 01:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P52798 and previous config saved to /var/cache/conftool/dbconfig/20231001-012344-arnaudb.json
  • 01:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P52797 and previous config saved to /var/cache/conftool/dbconfig/20231001-010838-arnaudb.json
  • 00:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T343198)', diff saved to https://phabricator.wikimedia.org/P52796 and previous config saved to /var/cache/conftool/dbconfig/20231001-005332-arnaudb.json

