Server Admin Log/Archive 73

From Wikitech

2023-11-30

  • 23:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2097.codfw.wmnet with reason: host reimage
  • 23:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2094.codfw.wmnet with reason: host reimage
  • 23:56 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2099.codfw.wmnet with OS bookworm
  • 23:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2096.codfw.wmnet with OS bookworm
  • 23:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2097.codfw.wmnet with reason: host reimage
  • 23:52 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1105.eqiad.wmnet with reason: host reimage
  • 23:50 krinkle@deploy2002: Synchronized docroot/noc/: (no justification provided) (duration: 08m 28s)
  • 23:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1107.eqiad.wmnet with reason: host reimage
  • 23:46 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1105.eqiad.wmnet with reason: host reimage
  • 23:45 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1107.eqiad.wmnet with reason: host reimage
  • 23:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2098.codfw.wmnet with OS bookworm
  • 23:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T348183)', diff saved to https://phabricator.wikimedia.org/P54056 and previous config saved to /var/cache/conftool/dbconfig/20231130-234322-arnaudb.json
  • 23:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2097.codfw.wmnet with OS bookworm
  • 23:35 foks: removing 1 file for legal compliance
  • 23:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2096.codfw.wmnet with reason: host reimage
  • 23:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2096.codfw.wmnet with reason: host reimage
  • 23:31 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
  • 23:31 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1105.eqiad.wmnet with OS bookworm
  • 23:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P54055 and previous config saved to /var/cache/conftool/dbconfig/20231130-232815-arnaudb.json
  • 23:18 foks: removing 1 file for legal compliance
  • 23:16 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2094.codfw.wmnet with OS bookworm
  • 23:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P54054 and previous config saved to /var/cache/conftool/dbconfig/20231130-231309-arnaudb.json
  • 23:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2096.codfw.wmnet with OS bookworm
  • 23:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2094.codfw.wmnet with OS bookworm
  • 23:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2095.codfw.wmnet with OS bookworm
  • 23:05 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:04 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:03 foks: removing 5 files for legal compliance
  • 22:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T348183)', diff saved to https://phabricator.wikimedia.org/P54053 and previous config saved to /var/cache/conftool/dbconfig/20231130-225802-arnaudb.json
  • 22:46 wfan: payments-wiki upgraded from 7feabffe to b37ab50e
  • 22:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2095.codfw.wmnet with reason: host reimage
  • 22:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2095.codfw.wmnet with reason: host reimage
  • 22:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2190 (T348183)', diff saved to https://phabricator.wikimedia.org/P54051 and previous config saved to /var/cache/conftool/dbconfig/20231130-222836-arnaudb.json
  • 22:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 22:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 22:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T348183)', diff saved to https://phabricator.wikimedia.org/P54050 and previous config saved to /var/cache/conftool/dbconfig/20231130-222814-arnaudb.json
  • 22:24 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2095.codfw.wmnet with OS bookworm
  • 22:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2093.codfw.wmnet with OS bookworm
  • 22:23 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P54048 and previous config saved to /var/cache/conftool/dbconfig/20231130-221308-arnaudb.json
  • 22:08 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:03 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host elastic1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host elastic1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P54047 and previous config saved to /var/cache/conftool/dbconfig/20231130-215759-arnaudb.json
  • 21:55 dancy@deploy2002: Finished scap: Backport for Increase "large" font-size option for client-preferences (T351693) (duration: 10m 01s)
  • 21:54 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
  • 21:54 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1105.eqiad.wmnet with OS bookworm
  • 21:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2093.codfw.wmnet with reason: host reimage
  • 21:49 dancy@deploy2002: jdrewniak and dancy: Continuing with sync
  • 21:48 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2093.codfw.wmnet with reason: host reimage
  • 21:46 dancy@deploy2002: jdrewniak and dancy: Backport for Increase "large" font-size option for client-preferences (T351693) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2094.codfw.wmnet with OS bookworm
  • 21:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2092.codfw.wmnet with OS bookworm
  • 21:45 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:45 dancy@deploy2002: Started scap: Backport for Increase "large" font-size option for client-preferences (T351693)
  • 21:43 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T348183)', diff saved to https://phabricator.wikimedia.org/P54046 and previous config saved to /var/cache/conftool/dbconfig/20231130-214252-arnaudb.json
  • 21:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2093.codfw.wmnet with OS bookworm
  • 21:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2092.codfw.wmnet with reason: host reimage
  • 21:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2092.codfw.wmnet with reason: host reimage
  • 21:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1105.eqiad.wmnet with OS bookworm
  • 21:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1107.eqiad.wmnet with OS bookworm
  • 21:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T348183)', diff saved to https://phabricator.wikimedia.org/P54045 and previous config saved to /var/cache/conftool/dbconfig/20231130-211412-arnaudb.json
  • 21:14 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 21:13 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 21:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T348183)', diff saved to https://phabricator.wikimedia.org/P54044 and previous config saved to /var/cache/conftool/dbconfig/20231130-211349-arnaudb.json
  • 21:03 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2092.codfw.wmnet with OS bookworm
  • 20:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P54043 and previous config saved to /var/cache/conftool/dbconfig/20231130-205843-arnaudb.json
  • 20:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P54042 and previous config saved to /var/cache/conftool/dbconfig/20231130-204336-arnaudb.json
  • 20:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1104.eqiad.wmnet with OS bookworm
  • 20:38 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:37 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:37 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1106.eqiad.wmnet with OS bookworm
  • 20:37 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:37 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1103.eqiad.wmnet with OS bookworm
  • 20:37 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:35 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:30 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T348183)', diff saved to https://phabricator.wikimedia.org/P54041 and previous config saved to /var/cache/conftool/dbconfig/20231130-202830-arnaudb.json
  • 20:17 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1103.eqiad.wmnet with reason: host reimage
  • 20:15 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1104.eqiad.wmnet with reason: host reimage
  • 20:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1106.eqiad.wmnet with reason: host reimage
  • 20:14 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1106.eqiad.wmnet with reason: host reimage
  • 20:12 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1103.eqiad.wmnet with reason: host reimage
  • 20:12 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1104.eqiad.wmnet with reason: host reimage
  • 20:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T348183)', diff saved to https://phabricator.wikimedia.org/P54040 and previous config saved to /var/cache/conftool/dbconfig/20231130-200409-arnaudb.json
  • 20:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 20:03 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 20:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 20:03 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 20:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T348183)', diff saved to https://phabricator.wikimedia.org/P54039 and previous config saved to /var/cache/conftool/dbconfig/20231130-200342-arnaudb.json
  • 19:58 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
  • 19:58 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1105.eqiad.wmnet with OS bookworm
  • 19:58 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1106.eqiad.wmnet with OS bookworm
  • 19:58 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1103.eqiad.wmnet with OS bookworm
  • 19:57 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1104.eqiad.wmnet with OS bookworm
  • 19:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1104']
  • 19:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P54037 and previous config saved to /var/cache/conftool/dbconfig/20231130-194835-arnaudb.json
  • 19:41 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti1038']
  • 19:37 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti1037']
  • 19:37 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti1036']
  • 19:34 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti1035']
  • 19:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2028.codfw.wmnet
  • 19:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P54036 and previous config saved to /var/cache/conftool/dbconfig/20231130-193329-arnaudb.json
  • 19:33 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti1038']
  • 19:31 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti1037']
  • 19:30 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti1036']
  • 19:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1002.wikimedia.org with OS bookworm
  • 19:29 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti1035']
  • 19:29 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1107']
  • 19:28 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti1035']
  • 19:28 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti1035']
  • 19:27 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1103']
  • 19:25 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2028.codfw.wmnet
  • 19:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1104']
  • 19:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1104']
  • 19:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1104']
  • 19:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:22 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1107']
  • 19:22 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1106']
  • 19:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1107']
  • 19:21 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1106']
  • 19:21 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1105']
  • 19:20 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:20 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1105']
  • 19:20 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1103']
  • 19:20 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1103']
  • 19:19 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host elastic1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:19 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1105']
  • 19:19 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1106']
  • 19:18 jclark@cumin1001: START - Cookbook sre.hosts.provision for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T348183)', diff saved to https://phabricator.wikimedia.org/P54035 and previous config saved to /var/cache/conftool/dbconfig/20231130-191822-arnaudb.json
  • 19:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:15 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1107']
  • 19:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1107']
  • 19:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:13 jclark@cumin1001: START - Cookbook sre.hosts.provision for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:13 jclark@cumin1001: START - Cookbook sre.hosts.provision for host elastic1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:13 jclark@cumin1001: START - Cookbook sre.hosts.provision for host elastic1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1002.wikimedia.org with reason: host reimage
  • 19:12 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1107']
  • 19:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1103']
  • 19:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1104']
  • 19:12 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1103']
  • 19:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1107']
  • 19:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1106']
  • 19:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1105']
  • 19:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1104']
  • 19:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1002.wikimedia.org with reason: host reimage
  • 19:09 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1002.wikimedia.org with OS bookworm
  • 18:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 18:50 vriley@cumin1001: START - Cookbook sre.hosts.provision for host elastic1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:49 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1104
  • 18:49 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host elastic1104
  • 18:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T348183)', diff saved to https://phabricator.wikimedia.org/P54034 and previous config saved to /var/cache/conftool/dbconfig/20231130-184900-arnaudb.json
  • 18:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 18:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 18:40 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
  • 18:36 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
  • 18:24 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 18:22 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 18:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 18:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 18:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T348183)', diff saved to https://phabricator.wikimedia.org/P54033 and previous config saved to /var/cache/conftool/dbconfig/20231130-182155-arnaudb.json
  • 18:15 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 18:14 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 18:13 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 18:13 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 18:13 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 18:12 bd808@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 18:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 18:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P54032 and previous config saved to /var/cache/conftool/dbconfig/20231130-180648-arnaudb.json
  • 18:02 mutante: planet2003 - revoking old puppet cert, following the "fix forward" steps from T349619 - puppet running again
  • 17:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P54031 and previous config saved to /var/cache/conftool/dbconfig/20231130-175141-arnaudb.json
  • 17:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T348183)', diff saved to https://phabricator.wikimedia.org/P54030 and previous config saved to /var/cache/conftool/dbconfig/20231130-173635-arnaudb.json
  • 17:27 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 17:26 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:25 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:24 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:24 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:23 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:23 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:14 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 17:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T348183)', diff saved to https://phabricator.wikimedia.org/P54029 and previous config saved to /var/cache/conftool/dbconfig/20231130-170713-arnaudb.json
  • 17:07 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:07 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 17:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T348183)', diff saved to https://phabricator.wikimedia.org/P54028 and previous config saved to /var/cache/conftool/dbconfig/20231130-170650-arnaudb.json
  • 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ips to restbase servers in codfw - jhancock@cumin2002"
  • 17:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ips to restbase servers in codfw - jhancock@cumin2002"
  • 16:58 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P54027 and previous config saved to /var/cache/conftool/dbconfig/20231130-165144-arnaudb.json
  • 16:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P54026 and previous config saved to /var/cache/conftool/dbconfig/20231130-163637-arnaudb.json
  • 16:33 ladsgroup@deploy2002: Finished scap: Backport for Revert "PoolCounterConnectionManager: Add support for ipv6" (T352444) (duration: 09m 45s)
  • 16:27 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 16:26 ladsgroup@deploy2002: ladsgroup: Backport for Revert "PoolCounterConnectionManager: Add support for ipv6" (T352444) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:23 ladsgroup@deploy2002: Started scap: Backport for Revert "PoolCounterConnectionManager: Add support for ipv6" (T352444)
  • 16:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T348183)', diff saved to https://phabricator.wikimedia.org/P54025 and previous config saved to /var/cache/conftool/dbconfig/20231130-162131-arnaudb.json
  • 15:54 moritzm: installing stunnel4 bugfix updates from bookworm point release
  • 15:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T348183)', diff saved to https://phabricator.wikimedia.org/P54024 and previous config saved to /var/cache/conftool/dbconfig/20231130-155251-arnaudb.json
  • 15:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 15:52 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 15:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T348183)', diff saved to https://phabricator.wikimedia.org/P54023 and previous config saved to /var/cache/conftool/dbconfig/20231130-154227-arnaudb.json
  • 15:36 moritzm: installing minizip security updates
  • 15:33 sukhe: clean-up /etc/hosts on A:dns-rec to remove entries populated by host_core: T347054
  • 15:31 moritzm: installing dbus security updates on buster
  • 15:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P54022 and previous config saved to /var/cache/conftool/dbconfig/20231130-152721-arnaudb.json
  • 15:21 moritzm: installing libbsd bugfix updates from Bullseye point release
  • 15:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P54021 and previous config saved to /var/cache/conftool/dbconfig/20231130-151214-arnaudb.json
  • 15:08 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1105']
  • 15:07 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:07 arnaudb@cumin1001: dbctl commit (dc=all): 'change es3 master back to es2034', diff saved to https://phabricator.wikimedia.org/P54020 and previous config saved to /var/cache/conftool/dbconfig/20231130-150712-arnaudb.json
  • 15:04 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 100%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54019 and previous config saved to /var/cache/conftool/dbconfig/20231130-150434-arnaudb.json
  • 14:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T348183)', diff saved to https://phabricator.wikimedia.org/P54018 and previous config saved to /var/cache/conftool/dbconfig/20231130-145707-arnaudb.json
  • 14:54 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic1106']
  • 14:53 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1106']
  • 14:53 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1105']
  • 14:50 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1105']
  • 14:50 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic1105']
  • 14:49 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 90%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54017 and previous config saved to /var/cache/conftool/dbconfig/20231130-144929-arnaudb.json
  • 14:49 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1105']
  • 14:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T348183)', diff saved to https://phabricator.wikimedia.org/P54016 and previous config saved to /var/cache/conftool/dbconfig/20231130-144854-arnaudb.json
  • 14:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 14:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 14:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T348183)', diff saved to https://phabricator.wikimedia.org/P54015 and previous config saved to /var/cache/conftool/dbconfig/20231130-144831-arnaudb.json
  • 14:48 godog: roll-restart prometheus/ops in eqiad/codfw to apply new size-based retention - T351179
  • 14:46 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:45 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:45 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1106.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:36 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 vriley@cumin1001: START - Cookbook sre.hosts.provision for host elastic1106.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 vriley@cumin1001: START - Cookbook sre.hosts.provision for host elastic1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 vriley@cumin1001: START - Cookbook sre.hosts.provision for host elastic1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 80%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54013 and previous config saved to /var/cache/conftool/dbconfig/20231130-143424-arnaudb.json
  • 14:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P54012 and previous config saved to /var/cache/conftool/dbconfig/20231130-143325-arnaudb.json
  • 14:31 vriley@cumin1001: START - Cookbook sre.hosts.provision for host elastic1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:29 vriley@cumin1001: START - Cookbook sre.hosts.provision for host elastic1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2004.codfw.wmnet
  • 14:27 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1104
  • 14:27 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host elastic1104
  • 14:26 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1107
  • 14:26 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host elastic1107
  • 14:25 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1106
  • 14:25 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host elastic1106
  • 14:24 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1105
  • 14:24 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host elastic1105
  • 14:23 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1103
  • 14:23 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1104
  • 14:23 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host elastic1104
  • 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2095.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:22 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host elastic1103
  • 14:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host schema2004.codfw.wmnet
  • 14:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2093.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2093.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2095.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:19 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 70%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54011 and previous config saved to /var/cache/conftool/dbconfig/20231130-141919-arnaudb.json
  • 14:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P54010 and previous config saved to /var/cache/conftool/dbconfig/20231130-141815-arnaudb.json
  • 14:15 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host planet2003.codfw.wmnet
  • 14:15 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host planet2003.codfw.wmnet
  • 14:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2093.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2095.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:11 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2095.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:11 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2093.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM stewards2001.codfw.wmnet
  • 14:04 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 60%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54009 and previous config saved to /var/cache/conftool/dbconfig/20231130-140414-arnaudb.json
  • 14:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM stewards2001.codfw.wmnet
  • 14:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T348183)', diff saved to https://phabricator.wikimedia.org/P54008 and previous config saved to /var/cache/conftool/dbconfig/20231130-140308-arnaudb.json
  • 13:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T348183)', diff saved to https://phabricator.wikimedia.org/P54007 and previous config saved to /var/cache/conftool/dbconfig/20231130-135453-arnaudb.json
  • 13:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 13:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 13:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T348183)', diff saved to https://phabricator.wikimedia.org/P54006 and previous config saved to /var/cache/conftool/dbconfig/20231130-135410-arnaudb.json
  • 13:49 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 50%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54005 and previous config saved to /var/cache/conftool/dbconfig/20231130-134909-arnaudb.json
  • 13:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P54004 and previous config saved to /var/cache/conftool/dbconfig/20231130-133904-arnaudb.json
  • 13:34 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 40%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54003 and previous config saved to /var/cache/conftool/dbconfig/20231130-133404-arnaudb.json
  • 13:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P54002 and previous config saved to /var/cache/conftool/dbconfig/20231130-132357-arnaudb.json
  • 13:19 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 30%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54001 and previous config saved to /var/cache/conftool/dbconfig/20231130-131859-arnaudb.json
  • 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1126.eqiad.wmnet
  • 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1126.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 13:11 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1126.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 13:09 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 13:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T348183)', diff saved to https://phabricator.wikimedia.org/P54000 and previous config saved to /var/cache/conftool/dbconfig/20231130-130851-arnaudb.json
  • 13:06 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2108.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:05 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host elastic2108.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:05 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host schema2004.codfw.wmnet
  • 13:04 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 20%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53999 and previous config saved to /var/cache/conftool/dbconfig/20231130-130354-arnaudb.json
  • 13:03 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1126.eqiad.wmnet
  • 13:03 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 13:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T348183)', diff saved to https://phabricator.wikimedia.org/P53998 and previous config saved to /var/cache/conftool/dbconfig/20231130-130136-arnaudb.json
  • 13:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T348183)', diff saved to https://phabricator.wikimedia.org/P53997 and previous config saved to /var/cache/conftool/dbconfig/20231130-130113-arnaudb.json
  • 12:59 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 12:53 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host schema2004.codfw.wmnet
  • 12:48 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 10%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53996 and previous config saved to /var/cache/conftool/dbconfig/20231130-124849-arnaudb.json
  • 12:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P53995 and previous config saved to /var/cache/conftool/dbconfig/20231130-124607-arnaudb.json
  • 12:41 arnaudb@cumin1001: dbctl commit (dc=all): 'es2034 is depooled', diff saved to https://phabricator.wikimedia.org/P53994 and previous config saved to /var/cache/conftool/dbconfig/20231130-124110-arnaudb.json
  • 12:40 arnaudb@cumin1001: dbctl commit (dc=all): 'change es3 master to es2029 as es2034 will reboot', diff saved to https://phabricator.wikimedia.org/P53993 and previous config saved to /var/cache/conftool/dbconfig/20231130-124050-arnaudb.json
  • 12:39 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: reboot
  • 12:39 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: reboot
  • 12:37 arnaudb@cumin1001: dbctl commit (dc=all): 'change es2 master to es2033 after reboot', diff saved to https://phabricator.wikimedia.org/P53992 and previous config saved to /var/cache/conftool/dbconfig/20231130-123752-arnaudb.json
  • 12:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P53991 and previous config saved to /var/cache/conftool/dbconfig/20231130-123100-arnaudb.json
  • 12:26 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2059.codfw.wmnet
  • 12:26 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2059.codfw.wmnet
  • 12:24 claime: Uncordoning kubernetes20(5[4789]|60).codfw.wmnet - T352369
  • 12:22 claime: Pooling kubernetes20(5[4789]|60).codfw.wmnet - T352369
  • 12:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T348183)', diff saved to https://phabricator.wikimedia.org/P53990 and previous config saved to /var/cache/conftool/dbconfig/20231130-121554-arnaudb.json
  • 12:09 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 100%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53989 and previous config saved to /var/cache/conftool/dbconfig/20231130-120911-arnaudb.json
  • 12:08 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T348183)', diff saved to https://phabricator.wikimedia.org/P53988 and previous config saved to /var/cache/conftool/dbconfig/20231130-120841-arnaudb.json
  • 12:08 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 12:08 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 12:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T348183)', diff saved to https://phabricator.wikimedia.org/P53987 and previous config saved to /var/cache/conftool/dbconfig/20231130-120819-arnaudb.json
  • 12:06 claime: Running homer 'cr*codfw*' commit T352369
  • 12:03 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:03 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:02 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:02 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:02 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2059.codfw.wmnet with OS bullseye
  • 11:54 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 90%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53986 and previous config saved to /var/cache/conftool/dbconfig/20231130-115406-arnaudb.json
  • 11:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P53985 and previous config saved to /var/cache/conftool/dbconfig/20231130-115312-arnaudb.json
  • 11:45 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2060.codfw.wmnet with OS bullseye
  • 11:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2058.codfw.wmnet with OS bullseye
  • 11:39 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 80%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53984 and previous config saved to /var/cache/conftool/dbconfig/20231130-113901-arnaudb.json
  • 11:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P53983 and previous config saved to /var/cache/conftool/dbconfig/20231130-113804-arnaudb.json
  • 11:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2057.codfw.wmnet with OS bullseye
  • 11:28 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2059.codfw.wmnet with reason: host reimage
  • 11:25 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2059.codfw.wmnet with reason: host reimage
  • 11:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2060.codfw.wmnet with reason: host reimage
  • 11:23 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 70%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53982 and previous config saved to /var/cache/conftool/dbconfig/20231130-112356-arnaudb.json
  • 11:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T348183)', diff saved to https://phabricator.wikimedia.org/P53981 and previous config saved to /var/cache/conftool/dbconfig/20231130-112258-arnaudb.json
  • 11:22 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2058.codfw.wmnet with reason: host reimage
  • 11:20 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2060.codfw.wmnet with reason: host reimage
  • 11:19 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2058.codfw.wmnet with reason: host reimage
  • 11:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T348183)', diff saved to https://phabricator.wikimedia.org/P53980 and previous config saved to /var/cache/conftool/dbconfig/20231130-111546-arnaudb.json
  • 11:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 11:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 11:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T348183)', diff saved to https://phabricator.wikimedia.org/P53979 and previous config saved to /var/cache/conftool/dbconfig/20231130-111524-arnaudb.json
  • 11:14 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2057.codfw.wmnet with reason: host reimage
  • 11:11 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2057.codfw.wmnet with reason: host reimage
  • 11:08 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 60%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53978 and previous config saved to /var/cache/conftool/dbconfig/20231130-110851-arnaudb.json
  • 11:01 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 11:00 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2060.codfw.wmnet with OS bullseye
  • 11:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P53977 and previous config saved to /var/cache/conftool/dbconfig/20231130-110017-arnaudb.json
  • 11:00 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2059.codfw.wmnet with OS bullseye
  • 10:59 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2058.codfw.wmnet with OS bullseye
  • 10:53 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 50%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53976 and previous config saved to /var/cache/conftool/dbconfig/20231130-105346-arnaudb.json
  • 10:52 moritzm: installing python-git security updates
  • 10:50 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2057.codfw.wmnet with OS bullseye
  • 10:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P53975 and previous config saved to /var/cache/conftool/dbconfig/20231130-104510-arnaudb.json
  • 10:38 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 40%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53974 and previous config saved to /var/cache/conftool/dbconfig/20231130-103841-arnaudb.json
  • 10:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T348183)', diff saved to https://phabricator.wikimedia.org/P53973 and previous config saved to /var/cache/conftool/dbconfig/20231130-103004-arnaudb.json
  • 10:23 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 30%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53972 and previous config saved to /var/cache/conftool/dbconfig/20231130-102336-arnaudb.json
  • 10:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T348183)', diff saved to https://phabricator.wikimedia.org/P53971 and previous config saved to /var/cache/conftool/dbconfig/20231130-102255-arnaudb.json
  • 10:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:08 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 20%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53970 and previous config saved to /var/cache/conftool/dbconfig/20231130-100830-arnaudb.json
  • 10:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 10:03 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:59 vgutierrez: rolling restart of pybal on lvs2011 and lvs2014, effectively enabling IPIP encapsulation on ncredir@codfw - T351069
  • 09:53 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 10%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P53969 and previous config saved to /var/cache/conftool/dbconfig/20231130-095325-arnaudb.json
  • 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir3003.esams.wmnet
  • 09:44 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ncredir3003.esams.wmnet
  • 09:38 arnaudb@cumin1001: dbctl commit (dc=all): 'es2033 is depooled', diff saved to https://phabricator.wikimedia.org/P53968 and previous config saved to /var/cache/conftool/dbconfig/20231130-093814-arnaudb.json
  • 09:37 arnaudb@cumin1001: dbctl commit (dc=all): 'change es2 master to es2026 as es2033 is rebooting', diff saved to https://phabricator.wikimedia.org/P53967 and previous config saved to /var/cache/conftool/dbconfig/20231130-093740-arnaudb.json
  • 09:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: reboot
  • 09:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: reboot
  • 09:35 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: reboot
  • 09:35 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: reboot
  • 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow3003.esams.wmnet
  • 09:33 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: reboot
  • 09:33 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: reboot
  • 09:29 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow3003.esams.wmnet
  • 09:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1025.eqiad.wmnet with OS bookworm
  • 09:19 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1010.eqiad.wmnet with OS bullseye
  • 09:15 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.7 refs T350083
  • 09:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 100%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53966 and previous config saved to /var/cache/conftool/dbconfig/20231130-090242-root.json
  • 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow6001.drmrs.wmnet
  • 08:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage
  • 08:55 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
  • 08:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow6001.drmrs.wmnet
  • 08:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install6002.wikimedia.org
  • 08:52 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 75%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53965 and previous config saved to /var/cache/conftool/dbconfig/20231130-084737-root.json
  • 08:47 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM install6002.wikimedia.org
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1126 from dbctl T352362', diff saved to https://phabricator.wikimedia.org/P53964 and previous config saved to /var/cache/conftool/dbconfig/20231130-084655-marostegui.json
  • 08:45 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1025.eqiad.wmnet with OS bookworm
  • 08:44 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 T352362', diff saved to https://phabricator.wikimedia.org/P53963 and previous config saved to /var/cache/conftool/dbconfig/20231130-084015-root.json
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 50%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53962 and previous config saved to /var/cache/conftool/dbconfig/20231130-083232-root.json
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53961 and previous config saved to /var/cache/conftool/dbconfig/20231130-083231-root.json
  • 08:28 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1010.eqiad.wmnet with OS bullseye
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 25%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53960 and previous config saved to /var/cache/conftool/dbconfig/20231130-081727-root.json
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53959 and previous config saved to /var/cache/conftool/dbconfig/20231130-081726-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 10%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53958 and previous config saved to /var/cache/conftool/dbconfig/20231130-080222-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53957 and previous config saved to /var/cache/conftool/dbconfig/20231130-080220-root.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 5%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53956 and previous config saved to /var/cache/conftool/dbconfig/20231130-074717-root.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53955 and previous config saved to /var/cache/conftool/dbconfig/20231130-074715-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 1%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53954 and previous config saved to /var/cache/conftool/dbconfig/20231130-073212-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 1%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53953 and previous config saved to /var/cache/conftool/dbconfig/20231130-073210-root.json
  • 07:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1210.eqiad.wmnet with OS bookworm
  • 07:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1126.eqiad.wmnet with OS bookworm
  • 06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1210.eqiad.wmnet with reason: host reimage
  • 06:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1126.eqiad.wmnet with reason: host reimage
  • 06:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1210.eqiad.wmnet with reason: host reimage
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1126.eqiad.wmnet with reason: host reimage
  • 06:45 kart_: Updated Apertium to 2023-11-30-061450-production (T270060)
  • 06:44 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 06:44 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 06:43 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 06:42 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 06:40 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 06:36 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1126.eqiad.wmnet with OS bookworm
  • 06:36 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1210.eqiad.wmnet with OS bookworm
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1210 T351283', diff saved to https://phabricator.wikimedia.org/P53952 and previous config saved to /var/cache/conftool/dbconfig/20231130-063317-root.json
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 T351283', diff saved to https://phabricator.wikimedia.org/P53951 and previous config saved to /var/cache/conftool/dbconfig/20231130-063258-root.json
  • 06:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1159.eqiad.wmnet with OS bookworm
  • 06:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1159.eqiad.wmnet with reason: host reimage
  • 06:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1159.eqiad.wmnet with reason: host reimage
  • 05:52 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1159.eqiad.wmnet with OS bookworm
  • 05:47 marostegui: Failover m3 from db1159 to db1119 - T352149
  • 05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2134,2160].codfw.wmnet,db[1119,1159,1217].eqiad.wmnet with reason: m3 master switchover T352149
  • 05:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2134,2160].codfw.wmnet,db[1119,1159,1217].eqiad.wmnet with reason: m3 master switchover T352149
  • 02:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2060.codfw.wmnet with OS bullseye
  • 02:49 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:47 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2059.codfw.wmnet with OS bullseye
  • 02:43 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2060.codfw.wmnet with reason: host reimage
  • 02:26 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2060.codfw.wmnet with reason: host reimage
  • 02:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2059.codfw.wmnet with reason: host reimage
  • 02:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2058.codfw.wmnet with OS bullseye
  • 02:19 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:18 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2059.codfw.wmnet with reason: host reimage
  • 02:14 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2060.codfw.wmnet with OS bullseye
  • 02:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2057.codfw.wmnet with OS bullseye
  • 02:07 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:04 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:56 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2059.codfw.wmnet with OS bullseye
  • 01:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2058.codfw.wmnet with reason: host reimage
  • 01:52 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2058.codfw.wmnet with reason: host reimage
  • 01:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2057.codfw.wmnet with reason: host reimage
  • 01:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2057.codfw.wmnet with reason: host reimage
  • 01:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2058.codfw.wmnet with OS bullseye
  • 01:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2057.codfw.wmnet with OS bullseye
  • 00:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2034.codfw.wmnet with OS bullseye
  • 00:24 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2033.codfw.wmnet with OS bullseye
  • 00:23 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:22 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:09 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2034.codfw.wmnet with reason: host reimage
  • 00:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2034.codfw.wmnet with reason: host reimage

2023-11-29

  • 23:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2033.codfw.wmnet with reason: host reimage
  • 23:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2091.codfw.wmnet with OS bookworm
  • 23:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2033.codfw.wmnet with reason: host reimage
  • 23:45 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2034.codfw.wmnet with OS bullseye
  • 23:41 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ganeti2034.codfw.wmnet with OS bullseye
  • 23:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2034.codfw.wmnet with OS bullseye
  • 23:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2091.codfw.wmnet with reason: host reimage
  • 23:24 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2091.codfw.wmnet with reason: host reimage
  • 23:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2033.codfw.wmnet with OS bullseye
  • 23:15 ladsgroup@deploy2002: Finished scap: Backport for Add virtual domain for botpasswords (T351559) (duration: 09m 28s)
  • 23:08 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 23:07 ladsgroup@deploy2002: ladsgroup: Backport for Add virtual domain for botpasswords (T351559) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:06 ladsgroup@deploy2002: Started scap: Backport for Add virtual domain for botpasswords (T351559)
  • 23:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2090.codfw.wmnet with OS bookworm
  • 23:05 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:01 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:56 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2091.codfw.wmnet with OS bookworm
  • 22:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2090.codfw.wmnet with reason: host reimage
  • 22:40 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2090.codfw.wmnet with reason: host reimage
  • 22:39 cstone: payments-wiki upgraded from 958cacac to 7feabffe
  • 22:31 eileen: civicrm upgraded from f7cdc727 to 83816165
  • 22:28 tgr@deploy2002: Finished scap: Backport for mobile: Remove $wgMobileUrlTemplate (duration: 20m 53s)
  • 22:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2090.codfw.wmnet with OS bookworm
  • 22:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2089.codfw.wmnet with OS bookworm
  • 22:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:19 tgr@deploy2002: tgr: Continuing with sync
  • 22:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:10 inflatador: bking@cumin2002 running puppet against cp hosts to apply 978134
  • 22:08 tgr@deploy2002: tgr: Backport for mobile: Remove $wgMobileUrlTemplate synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:07 tgr@deploy2002: Started scap: Backport for mobile: Remove $wgMobileUrlTemplate
  • 22:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2089.codfw.wmnet with reason: host reimage
  • 22:01 tgr@deploy2002: Finished scap: Backport for Update coverage of Reader Demographics 2 surveys (T344393), Fix incorrect client-pref-pinned classes when client pref feature is disabled (T351141 T352257) (duration: 10m 35s)
  • 21:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2089.codfw.wmnet with reason: host reimage
  • 21:55 tgr@deploy2002: dani and tgr and jdlrobson: Continuing with sync
  • 21:52 tgr@deploy2002: dani and tgr and jdlrobson: Backport for Update coverage of Reader Demographics 2 surveys (T344393), Fix incorrect client-pref-pinned classes when client pref feature is disabled (T351141 T352257) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:50 tgr@deploy2002: Started scap: Backport for Update coverage of Reader Demographics 2 surveys (T344393), Fix incorrect client-pref-pinned classes when client pref feature is disabled (T351141 T352257)
  • 21:49 tgr@deploy2002: Backport cancelled.
  • 21:47 eileen: civicrm upgraded from 456b4805 to f7cdc727
  • 21:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2089.codfw.wmnet with OS bookworm
  • 21:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2088.codfw.wmnet with OS bookworm
  • 21:37 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:34 tgr@deploy2002: Finished scap: Backport for Deploy Annual Plan Core Metrics survey (T351353) (duration: 13m 47s)
  • 21:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:27 tgr@deploy2002: tgr and dani: Continuing with sync
  • 21:21 tgr@deploy2002: tgr and dani: Backport for Deploy Annual Plan Core Metrics survey (T351353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:20 tgr@deploy2002: Started scap: Backport for Deploy Annual Plan Core Metrics survey (T351353)
  • 21:17 tgr@deploy2002: Finished scap: Backport for Deploy Vector 2022 skin to next set of sister projects (T352074) (duration: 10m 18s)
  • 21:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2088.codfw.wmnet with reason: host reimage
  • 21:12 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2088.codfw.wmnet with reason: host reimage
  • 21:11 tgr@deploy2002: tgr and jdrewniak: Continuing with sync
  • 21:08 tgr@deploy2002: tgr and jdrewniak: Backport for Deploy Vector 2022 skin to next set of sister projects (T352074) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:07 tgr@deploy2002: Started scap: Backport for Deploy Vector 2022 skin to next set of sister projects (T352074)
  • 21:03 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:02 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:02 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:02 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:01 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:00 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:00 sukhe: dummy authdns-update
  • 20:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2088.codfw.wmnet with OS bookworm
  • 20:53 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 20:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2087.codfw.wmnet with OS bookworm
  • 20:53 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:51 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 20:50 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 20:49 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 20:48 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:47 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:45 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-hd2003.codfw.wmnet with OS bullseye
  • 20:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:29 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:25 sukhe: [correction] sudo cumin -b1 -s60 "A:dns-rec and not P{dns6001*}" "enable-puppet 'do not enable' && run-puppet-agent": T347054
  • 20:25 sukhe: sudo cumin -s1 -b60 "A:dns-rec and not P{dns6001*}" "enable-puppet 'do not enable' && run-puppet-agent": T347054
  • 20:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2087.codfw.wmnet with reason: host reimage
  • 20:22 sukhe: dns6001: running dummy authdns-update
  • 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2087.codfw.wmnet with reason: host reimage
  • 20:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-hd2003.codfw.wmnet with reason: host reimage
  • 20:04 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-hd2003.codfw.wmnet with reason: host reimage
  • 20:02 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2087.codfw.wmnet with OS bookworm
  • 19:39 sukhe: running authdns-update from dns6001
  • 19:26 sukhe: disable puppet on A:dns-rec to roll out CR 977101: T347054
  • 19:21 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:21 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:18 ejegg: payments-wiki upgraded from 44a41216 to 958cacac
  • 19:17 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:07 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:00 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:59 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:59 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:58 sukhe: re-enable Puppet on A:dns-rec
  • 18:58 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:57 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:56 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logging-hd2003.codfw.wmnet with OS bullseye
  • 18:31 sukhe: disable puppet on A:dns-rec to roll out CR 976254: T347054
  • 17:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host schema2004.codfw.wmnet with OS bookworm
  • 17:47 bking@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:47 bking@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 17:43 bking@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:43 bking@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 17:41 sukhe: [finished] running dummy authdns-update, all 14 hosts affected
  • 17:40 sukhe: running dummy authdns-update
  • 17:35 sukhe: A:dns-rec: force run-puppet-agent
  • 17:35 sukhe: re-enable Puppet on A:dns-rec
  • 17:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on schema2004.codfw.wmnet with reason: host reimage
  • 17:26 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on schema2004.codfw.wmnet with reason: host reimage
  • 17:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logging-hd2003.codfw.wmnet with OS bullseye
  • 17:17 cdanis: depooling cp2029 for some manual testing
  • 17:14 sukhe: disable puppet on A:dns-rec to roll out CR 975843: T347054
  • 17:10 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host schema2004.codfw.wmnet with OS bookworm
  • 16:55 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:54 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:54 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:53 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:53 sukhe: sudo confctl --object-type dnsbox select 'dc=.*' set/pooled=yes T347054
  • 16:52 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:51 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:48 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:46 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:46 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:45 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:45 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:44 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:44 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:43 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:39 sukhe: confctl --object-type dnsbox select 'name=<host>' set/ip=<ip>
  • 16:37 bking@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:36 bking@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:32 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:32 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:28 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host lvs4010.ulsfo.wmnet
  • 16:28 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:21 ejegg: fundraising civicrm upgraded from efa3ea29 to 456b4805
  • 16:18 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host lvs4010.ulsfo.wmnet
  • 16:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host lvs4009.ulsfo.wmnet
  • 16:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host schema2003.codfw.wmnet with OS bookworm
  • 16:13 elukey: restart pyrra-filesystem on titan*
  • 16:13 elukey: reload all thanos-rule daemons on titan* to pick up new Pyrra Lift Wing rules
  • 16:13 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:12 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1171.eqiad.wmnet with OS bullseye
  • 16:08 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:08 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:08 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host lvs4009.ulsfo.wmnet
  • 16:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1172.eqiad.wmnet with OS bullseye
  • 16:07 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:07 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host lvs4008.ulsfo.wmnet
  • 16:07 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1162.eqiad.wmnet with OS bullseye
  • 16:06 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:06 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1161.eqiad.wmnet with OS bullseye
  • 16:05 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:05 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:04 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:04 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on schema2003.codfw.wmnet with reason: host reimage
  • 15:59 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:58 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on schema2003.codfw.wmnet with reason: host reimage
  • 15:56 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host lvs4008.ulsfo.wmnet
  • 15:54 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:53 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:52 dancy@deploy2002: Installing scap version "4.64.0" for 570 hosts
  • 15:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host dns2004.wikimedia.org
  • 15:50 dancy@deploy2002: Installing scap version "4.64.0" for 570 hosts
  • 15:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-hd2002.codfw.wmnet with OS bullseye
  • 15:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1172.eqiad.wmnet with reason: host reimage
  • 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2096']
  • 15:45 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2096']
  • 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1171.eqiad.wmnet with reason: host reimage
  • 15:44 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:42 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:42 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1162.eqiad.wmnet with reason: host reimage
  • 15:40 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1172.eqiad.wmnet with reason: host reimage
  • 15:39 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:39 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:39 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1161.eqiad.wmnet with reason: host reimage
  • 15:38 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1171.eqiad.wmnet with reason: host reimage
  • 15:38 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1162.eqiad.wmnet with reason: host reimage
  • 15:37 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:36 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:36 bking@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:36 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1161.eqiad.wmnet with reason: host reimage
  • 15:35 bking@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:35 bking@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:35 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:34 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:34 bking@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:34 bking@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:33 bblack: cp3066 - all back to normal ops
  • 15:31 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1172']
  • 15:30 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host schema2003.codfw.wmnet with OS bookworm
  • 15:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1171']
  • 15:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1162']
  • 15:27 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host dns2004.wikimedia.org
  • 15:26 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:26 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1161']
  • 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host dns1004.wikimedia.org
  • 15:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1171']
  • 15:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1171']
  • 15:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1171']
  • 15:22 bblack: cp3066 - depool temporarily, log pipe debugging, etc
  • 15:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1172']
  • 15:21 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-hd2001.codfw.wmnet with OS bullseye
  • 15:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:21 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:21 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1162']
  • 15:20 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ printf '%s\n' https://en.wikipedia.org/static/images/mobile/copyright/{wikibooks,wikivoyage}-{tagline,wordmark}-he.svg | mwscript purgeList enwiki # T351913, T351981
  • 15:20 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1161']
  • 15:20 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for hewikivoyage: update wordmark (T351981), hewikibooks: update wordmark and tagline (T351913) (duration: 09m 10s)
  • 15:17 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host dns1004.wikimedia.org
  • 15:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host schema1004.eqiad.wmnet with OS bookworm
  • 15:13 lucaswerkmeister-wmde@deploy2002: anzx and lucaswerkmeister-wmde: Continuing with sync
  • 15:12 lucaswerkmeister-wmde@deploy2002: anzx and lucaswerkmeister-wmde: Backport for hewikivoyage: update wordmark (T351981), hewikibooks: update wordmark and tagline (T351913) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:10 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for hewikivoyage: update wordmark (T351981), hewikibooks: update wordmark and tagline (T351913)
  • 15:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logging-hd2003.codfw.wmnet with OS bullseye
  • 15:09 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow5002.eqsin.wmnet
  • 15:05 bblack: cp4052 - back to normal operations
  • 15:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-hd2002.codfw.wmnet with reason: host reimage
  • 15:04 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1172.eqiad.wmnet with OS bullseye
  • 15:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1162.eqiad.wmnet with OS bullseye
  • 15:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1161.eqiad.wmnet with OS bullseye
  • 15:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1171.eqiad.wmnet with OS bullseye
  • 15:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on schema1004.eqiad.wmnet with reason: host reimage
  • 15:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-hd2002.codfw.wmnet with reason: host reimage
  • 15:00 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow5002.eqsin.wmnet
  • 14:56 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on schema1004.eqiad.wmnet with reason: host reimage
  • 14:55 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:55 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Configure wiki-highlights experiment stream (T348613) (duration: 42m 58s)
  • 14:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-hd2001.codfw.wmnet with reason: host reimage
  • 14:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-hd2001.codfw.wmnet with reason: host reimage
  • 14:43 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host schema1004.eqiad.wmnet with OS bookworm
  • 14:39 bblack: cp4052 - depool and disable puppet agent, more pipe debug
  • 14:38 lucaswerkmeister-wmde@deploy2002: sbisson and lucaswerkmeister-wmde: Continuing with sync
  • 14:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host schema1003.eqiad.wmnet with OS bookworm
  • 14:36 lucaswerkmeister-wmde@deploy2002: sbisson and lucaswerkmeister-wmde: Backport for Configure wiki-highlights experiment stream (T348613) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logging-hd2002.codfw.wmnet with OS bullseye
  • 14:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on schema1003.eqiad.wmnet with reason: host reimage
  • 14:21 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on schema1003.eqiad.wmnet with reason: host reimage
  • 14:17 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logging-hd2001.codfw.wmnet with OS bullseye
  • 14:15 elukey: reload thanos-rule on titan[12]001 to pick up new pyrra rec rules
  • 14:12 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Configure wiki-highlights experiment stream (T348613)
  • 14:10 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host schema1003.eqiad.wmnet with OS bookworm
  • 13:42 moritzm: installing tiff security updates
  • 13:33 jbond@cumin1001: END (PASS) - Cookbook sre.swift.audit-labels (exit_code=0) for host ms-be[2044-2073].codfw.wmnet,ms-be[1044-1075].eqiad.wmnet
  • 13:33 jbond@cumin1001: START - Cookbook sre.swift.audit-labels for host ms-be[2044-2073].codfw.wmnet,ms-be[1044-1075].eqiad.wmnet
  • 13:30 jbond@cumin1001: END (FAIL) - Cookbook sre.swift.audit-labels (exit_code=99) for host ms-be[2044-2073].codfw.wmnet,ms-be[1044-1075].eqiad.wmnet
  • 13:30 jbond@cumin1001: START - Cookbook sre.swift.audit-labels for host ms-be[2044-2073].codfw.wmnet,ms-be[1044-1075].eqiad.wmnet
  • 13:09 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:09 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:09 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow4002.ulsfo.wmnet
  • 13:05 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on netbox1002.eqiad.wmnet with reason: Restoring DB from backup on netboxdb1002
  • 13:05 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on netbox1002.eqiad.wmnet with reason: Restoring DB from backup on netboxdb1002
  • 13:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:01 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:00 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow4002.ulsfo.wmnet
  • 12:58 topranks: restoring DB snapshot from 11:37 UTC to netboxdb1002
  • 12:52 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:52 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:46 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:44 hashar@deploy2002: Finished deploy [gerrit/gerrit@6b23c27]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412 (duration: 00m 07s)
  • 12:43 hashar@deploy2002: Started deploy [gerrit/gerrit@6b23c27]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412
  • 12:36 hashar@deploy2002: Finished deploy [gerrit/gerrit@6b23c27]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412 (duration: 00m 06s)
  • 12:35 hashar@deploy2002: Started deploy [gerrit/gerrit@6b23c27]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412
  • 12:35 hashar@deploy2002: Finished deploy [gervert/deploy@ca6bba0]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412 (duration: 00m 12s)
  • 12:35 hashar@deploy2002: Started deploy [gervert/deploy@ca6bba0]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412
  • 12:25 vgutierrez: rolling restart of pybal on lvs4008 and lvs4010, effectively enabling IPIP encapsulation for ncredir@ulsfo - T351069
  • 12:22 hashar@deploy2002: Finished deploy [gerrit/gerrit@a087269]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412 (duration: 00m 15s)
  • 12:22 hashar@deploy2002: Started deploy [gerrit/gerrit@a087269]: Verify scap deployment after changing the scap user from gerrit2 to gerrit-deploy - T317412
  • 12:06 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp[1075-1090].eqiad.wmnet
  • 12:06 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:05 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[1075-1090].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 12:05 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 12:04 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[1075-1090].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 12:02 hashar: Disabled Puppet agent on gerrit1003 and gerrit2002 to roll https://gerrit.wikimedia.org/r/844998 which requires some manual steps | T317412
  • 11:26 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 11:26 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 11:23 fabfur@cumin1001: START - Cookbook sre.dns.netbox
  • 11:21 vgutierrez: upload tcp-mss-clamper 0.3+deb12u1 to apt.wm.o (bookworm) - T352249
  • 11:15 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:14 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:13 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:13 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:12 btullis: re-enabled all DAGs on all airflow instances after airflow upgrade to 2.7.3
  • 10:57 vgutierrez: upload ipip-multiqueue-optimizer 0.3+deb11u1 to apt.wm.o (bullseye) - T352249
  • 10:56 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:56 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 10:53 klausman@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:51 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:51 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:50 klausman@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:49 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:37 fabfur@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp[1075-1090].eqiad.wmnet
  • 10:37 btullis: pausing all active dags on all airflow instances
  • 10:36 fabfur: decommissioning cp1075-1090 (T352253)
  • 10:10 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:10 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 09:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1025.eqiad.wmnet with OS bookworm
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53938 and previous config saved to /var/cache/conftool/dbconfig/20231129-092808-root.json
  • 09:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage
  • 09:20 hashar@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.7 refs T350083 (duration: 07m 23s)
  • 09:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage
  • 09:13 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.7 refs T350083
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53937 and previous config saved to /var/cache/conftool/dbconfig/20231129-091303-root.json
  • 09:04 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1025.eqiad.wmnet with OS bookworm
  • 09:02 hashar@deploy2002: Finished scap: Backport for zghwiki: add logos (T350241) (duration: 09m 39s)
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53936 and previous config saved to /var/cache/conftool/dbconfig/20231129-085758-root.json
  • 08:55 hashar@deploy2002: hashar and anzx: Continuing with sync
  • 08:54 marostegui: Failover m1-master from dbproxy1022 to dbproxy1024 T351864
  • 08:53 hashar@deploy2002: hashar and anzx: Backport for zghwiki: add logos (T350241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:52 hashar@deploy2002: Started scap: Backport for zghwiki: add logos (T350241)
  • 08:48 hashar@deploy2002: Finished scap: Backport for Enable VisualEditor in the Appendix namespace on enwiktionary (T350926) (duration: 10m 10s)
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53935 and previous config saved to /var/cache/conftool/dbconfig/20231129-084253-root.json
  • 08:41 hashar@deploy2002: hashar and anzx: Continuing with sync
  • 08:39 hashar@deploy2002: hashar and anzx: Backport for Enable VisualEditor in the Appendix namespace on enwiktionary (T350926) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:38 hashar@deploy2002: Started scap: Backport for Enable VisualEditor in the Appendix namespace on enwiktionary (T350926)
  • 08:33 marostegui: Drop oathauth_users from centralauth T348693
  • 08:28 marostegui: Drop oathauth_users from fishbowl.dblist T348693
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53934 and previous config saved to /var/cache/conftool/dbconfig/20231129-082748-root.json
  • 08:22 marostegui: Drop oathauth_users from private.dblist T348693
  • 08:19 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc2" (duration: 08m 01s)
  • 08:13 marostegui@deploy2002: marostegui: Continuing with sync
  • 08:12 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc2" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 5%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53933 and previous config saved to /var/cache/conftool/dbconfig/20231129-081243-root.json
  • 08:11 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc2"
  • 08:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1012.eqiad.wmnet with OS bookworm
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 1%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53932 and previous config saved to /var/cache/conftool/dbconfig/20231129-075738-root.json
  • 07:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1027.eqiad.wmnet with OS bookworm
  • 07:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1012.eqiad.wmnet with reason: host reimage
  • 07:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1027.eqiad.wmnet with reason: host reimage
  • 07:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1012.eqiad.wmnet with reason: host reimage
  • 07:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on es1027.eqiad.wmnet with reason: host reimage
  • 07:26 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1012.eqiad.wmnet with OS bookworm
  • 07:25 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc2 (T351620) (duration: 09m 25s)
  • 07:24 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es1027.eqiad.wmnet with OS bookworm
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1027 T351916', diff saved to https://phabricator.wikimedia.org/P53931 and previous config saved to /var/cache/conftool/dbconfig/20231129-072306-root.json
  • 07:18 marostegui@deploy2002: marostegui: Continuing with sync
  • 07:17 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc2 (T351620) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:15 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc2 (T351620)
  • 07:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2012.codfw.wmnet,pc[1012,1014].eqiad.wmnet with reason: Switch
  • 07:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2012.codfw.wmnet,pc[1012,1014].eqiad.wmnet with reason: Switch
  • 04:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1172.eqiad.wmnet with OS bullseye
  • 04:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1171.eqiad.wmnet with OS bullseye
  • 04:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1162.eqiad.wmnet with OS bullseye
  • 04:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1161.eqiad.wmnet with OS bullseye
  • 03:13 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1172.eqiad.wmnet with OS bullseye
  • 03:13 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1171.eqiad.wmnet with OS bullseye
  • 03:12 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1162.eqiad.wmnet with OS bullseye
  • 03:12 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1161.eqiad.wmnet with OS bullseye
  • 03:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1164.eqiad.wmnet with OS bullseye
  • 03:11 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 03:10 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 03:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1168.eqiad.wmnet with OS bullseye
  • 03:08 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 03:06 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 03:05 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1172.eqiad.wmnet with OS bullseye
  • 03:05 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1171.eqiad.wmnet with OS bullseye
  • 02:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1161.eqiad.wmnet with OS bullseye
  • 02:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1162.eqiad.wmnet with OS bullseye
  • 02:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1164.eqiad.wmnet with reason: host reimage
  • 02:45 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1168.eqiad.wmnet with reason: host reimage
  • 02:43 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1164.eqiad.wmnet with reason: host reimage
  • 02:42 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1168.eqiad.wmnet with reason: host reimage
  • 02:33 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1170.eqiad.wmnet with OS bullseye
  • 02:33 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:32 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:31 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1174.eqiad.wmnet with OS bullseye
  • 02:31 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:30 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1167.eqiad.wmnet with OS bullseye
  • 02:30 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:30 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1173.eqiad.wmnet with OS bullseye
  • 02:28 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:28 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:28 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1168.eqiad.wmnet with OS bullseye
  • 02:28 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1164.eqiad.wmnet with OS bullseye
  • 02:27 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1175.eqiad.wmnet with OS bullseye
  • 02:27 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:26 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1165.eqiad.wmnet with OS bullseye
  • 02:25 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:24 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1169.eqiad.wmnet with OS bullseye
  • 02:24 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:23 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1166.eqiad.wmnet with OS bullseye
  • 02:21 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:20 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1168.eqiad.wmnet with OS bullseye
  • 02:17 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1163.eqiad.wmnet with OS bullseye
  • 02:17 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:15 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 02:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1164.eqiad.wmnet with OS bullseye
  • 02:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1170.eqiad.wmnet with reason: host reimage
  • 02:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1168.eqiad.wmnet with reason: host reimage
  • 02:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1174.eqiad.wmnet with reason: host reimage
  • 02:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1175.eqiad.wmnet with reason: host reimage
  • 02:05 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1167.eqiad.wmnet with reason: host reimage
  • 02:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1173.eqiad.wmnet with reason: host reimage
  • 02:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1164.eqiad.wmnet with reason: host reimage
  • 02:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1165.eqiad.wmnet with reason: host reimage
  • 02:01 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1169.eqiad.wmnet with reason: host reimage
  • 02:00 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1170.eqiad.wmnet with reason: host reimage
  • 02:00 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1168.eqiad.wmnet with reason: host reimage
  • 01:59 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1175.eqiad.wmnet with reason: host reimage
  • 01:59 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1174.eqiad.wmnet with reason: host reimage
  • 01:58 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1173.eqiad.wmnet with reason: host reimage
  • 01:58 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1166.eqiad.wmnet with reason: host reimage
  • 01:57 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1169.eqiad.wmnet with reason: host reimage
  • 01:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1163.eqiad.wmnet with reason: host reimage
  • 01:55 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1167.eqiad.wmnet with reason: host reimage
  • 01:55 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1166.eqiad.wmnet with reason: host reimage
  • 01:54 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1164.eqiad.wmnet with reason: host reimage
  • 01:54 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1165.eqiad.wmnet with reason: host reimage
  • 01:52 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1163.eqiad.wmnet with reason: host reimage
  • 01:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1175.eqiad.wmnet with OS bullseye
  • 01:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1174.eqiad.wmnet with OS bullseye
  • 01:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1173.eqiad.wmnet with OS bullseye
  • 01:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1172.eqiad.wmnet with OS bullseye
  • 01:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1171.eqiad.wmnet with OS bullseye
  • 01:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1170.eqiad.wmnet with OS bullseye
  • 01:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1169.eqiad.wmnet with OS bullseye
  • 01:42 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1167.eqiad.wmnet with OS bullseye
  • 01:41 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1166.eqiad.wmnet with OS bullseye
  • 01:40 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1164.eqiad.wmnet with OS bullseye
  • 01:40 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1165.eqiad.wmnet with OS bullseye
  • 01:38 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1161.eqiad.wmnet with OS bullseye
  • 01:38 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1163.eqiad.wmnet with OS bullseye
  • 01:38 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1162.eqiad.wmnet with OS bullseye
  • 01:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1161.eqiad.wmnet with OS bullseye
  • 00:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2028.codfw.wmnet with OS bullseye
  • 00:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2028.codfw.wmnet with reason: host reimage
  • 00:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2035.codfw.wmnet with OS bullseye
  • 00:35 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:33 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2028.codfw.wmnet with reason: host reimage
  • 00:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1160.eqiad.wmnet with OS bullseye
  • 00:27 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:25 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2035.codfw.wmnet with reason: host reimage
  • 00:14 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2028.codfw.wmnet with OS bullseye
  • 00:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2034.codfw.wmnet with OS bullseye
  • 00:12 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:11 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2035.codfw.wmnet with reason: host reimage
  • 00:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1159.eqiad.wmnet with OS bullseye
  • 00:04 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:03 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1160.eqiad.wmnet with reason: host reimage
  • 00:02 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:00 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1160.eqiad.wmnet with reason: host reimage

2023-11-28

  • 23:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2034.codfw.wmnet with reason: host reimage
  • 23:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2033.codfw.wmnet with OS bullseye
  • 23:53 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2035.codfw.wmnet with OS bullseye
  • 23:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:51 bblack: cp4052 - all back to normal
  • 23:50 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2034.codfw.wmnet with reason: host reimage
  • 23:48 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1161.eqiad.wmnet with OS bullseye
  • 23:42 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1159.eqiad.wmnet with reason: host reimage
  • 23:42 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1160.eqiad.wmnet with OS bullseye
  • 23:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2032.codfw.wmnet with OS bullseye
  • 23:42 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:39 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:39 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1159.eqiad.wmnet with reason: host reimage
  • 23:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
  • 23:32 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2034.codfw.wmnet with OS bullseye
  • 23:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
  • 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2030.codfw.wmnet with OS bullseye
  • 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:29 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:25 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1159.eqiad.wmnet with OS bullseye
  • 23:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2032.codfw.wmnet with reason: host reimage
  • 23:19 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2032.codfw.wmnet with reason: host reimage
  • 23:15 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1158.eqiad.wmnet with OS bullseye
  • 23:15 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 23:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2033.codfw.wmnet with OS bullseye
  • 23:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2030.codfw.wmnet with reason: host reimage
  • 23:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2030.codfw.wmnet with reason: host reimage
  • 23:07 bblack: cp4052 - repool
  • 23:05 bblack: cp4052 - depool temporarily
  • 23:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2032.codfw.wmnet with OS bullseye
  • 22:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2030.codfw.wmnet with OS bullseye
  • 22:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2031.codfw.wmnet with OS bullseye
  • 22:51 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:49 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2030.codfw.wmnet with OS bullseye
  • 22:33 bblack: cp4052 - disabling puppet to experiment on how we gather prometheus stats from ATS...
  • 22:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2031.codfw.wmnet with reason: host reimage
  • 22:27 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2031.codfw.wmnet with reason: host reimage
  • 22:23 urbanecm@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 22:22 urbanecm@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 22:22 urbanecm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 22:22 urbanecm@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 22:20 urbanecm@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 22:19 urbanecm@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 22:12 urbanecm@deploy2002: Finished scap: Backport for Fixes: Duplicate events for radio buttons (T352075), Fixes: Duplicate events for radio buttons (T352075), Work around Parsoid's messy handling of some extensions (T351461) (duration: 13m 02s)
  • 22:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2031.codfw.wmnet with OS bullseye
  • 22:04 urbanecm@deploy2002: urbanecm and ssastry and jdlrobson: Continuing with sync
  • 22:01 urbanecm@deploy2002: urbanecm and ssastry and jdlrobson: Backport for Fixes: Duplicate events for radio buttons (T352075), Fixes: Duplicate events for radio buttons (T352075), Work around Parsoid's messy handling of some extensions (T351461) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:59 urbanecm@deploy2002: Started scap: Backport for Fixes: Duplicate events for radio buttons (T352075), Fixes: Duplicate events for radio buttons (T352075), Work around Parsoid's messy handling of some extensions (T351461)
  • 21:58 urbanecm@deploy2002: Finished scap: Backport for Increase coverage of Reader Demographics 2 surveys (T344393), DefaultOutputTransform::deduplicateStyles: don't match inside an attribute (duration: 31m 09s)
  • 21:52 urbanecm@deploy2002: cscott and urbanecm and dani: Continuing with sync
  • 21:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2029.codfw.wmnet with OS bullseye
  • 21:49 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:47 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1161.eqiad.wmnet with OS bullseye
  • 21:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2029.codfw.wmnet with reason: host reimage
  • 21:29 urbanecm@deploy2002: cscott and urbanecm and dani: Backport for Increase coverage of Reader Demographics 2 surveys (T344393), DefaultOutputTransform::deduplicateStyles: don't match inside an attribute synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:27 urbanecm@deploy2002: Started scap: Backport for Increase coverage of Reader Demographics 2 surveys (T344393), DefaultOutputTransform::deduplicateStyles: don't match inside an attribute
  • 21:26 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2029.codfw.wmnet with reason: host reimage
  • 21:18 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 21:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2029.codfw.wmnet with OS bullseye
  • 21:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2029.codfw.wmnet with OS bullseye
  • 21:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1158.eqiad.wmnet with reason: host reimage
  • 21:00 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1159.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:00 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1158.eqiad.wmnet with reason: host reimage
  • 20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1158.eqiad.wmnet with OS bullseye
  • 20:40 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1159.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:39 ladsgroup@deploy2002: Finished scap: Backport for Disable VipsScaler in group0 (T290759) (duration: 10m 08s)
  • 20:36 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host an-worker1158.eqiad.wmnet with OS bullseye
  • 20:32 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 20:30 ladsgroup@deploy2002: ladsgroup: Backport for Disable VipsScaler in group0 (T290759) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:29 ladsgroup@deploy2002: Started scap: Backport for Disable VipsScaler in group0 (T290759)
  • 20:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1161.eqiad.wmnet with OS bullseye
  • 20:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on planet2003.codfw.wmnet with reason: maintenance
  • 20:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on planet2003.codfw.wmnet with reason: maintenance
  • 20:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on planet1003.eqiad.wmnet with reason: maintenance
  • 20:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on planet1003.eqiad.wmnet with reason: maintenance
  • 20:10 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1159.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:07 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1158.eqiad.wmnet with OS bullseye
  • 20:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1173.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1172.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1169.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1171.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1170.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:57 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1165.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1165.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1165.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1173.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1172.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1167.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1168.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1171.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1170.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1169.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1166.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1161.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1162.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:39 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1168.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:38 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1167.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1166.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1165.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1162.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1161.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1162.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1161.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:16 milimetric@deploy2002: Finished deploy [analytics/refinery@72ec207] (thin): hotfix for webrequest refine (duration: 00m 07s)
  • 18:16 milimetric@deploy2002: Started deploy [analytics/refinery@72ec207] (thin): hotfix for webrequest refine
  • 18:15 milimetric@deploy2002: Finished deploy [analytics/refinery@72ec207]: hotfix for webrequest refine (duration: 08m 47s)
  • 18:06 milimetric@deploy2002: Started deploy [analytics/refinery@72ec207]: hotfix for webrequest refine
  • 17:49 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 17:14 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2012.codfw.wmnet with OS bullseye
  • 17:04 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 17:03 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:52 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2012.codfw.wmnet with reason: host reimage
  • 16:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2030.codfw.wmnet with OS bullseye
  • 16:49 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2012.codfw.wmnet with reason: host reimage
  • 16:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2029.codfw.wmnet with OS bullseye
  • 16:44 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2029.codfw.wmnet with OS bullseye
  • 16:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2029.codfw.wmnet with OS bullseye
  • 16:35 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2012.codfw.wmnet with OS bullseye
  • 16:34 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs2012.codfw.wmnet with OS bullseye
  • 16:23 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:23 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 16:17 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2012.codfw.wmnet with OS bullseye
  • 16:17 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs2012.codfw.wmnet with OS bullseye
  • 16:07 moritzm: installing distro-info-data updates
  • 16:04 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2012.codfw.wmnet with OS bullseye
  • 16:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs2011.codfw.wmnet
  • 16:01 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs2011.codfw.wmnet
  • 15:54 moritzm: installing xen security updates
  • 15:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2011.codfw.wmnet with OS bullseye
  • 15:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2029.codfw.wmnet with OS bullseye
  • 15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2096']
  • 15:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2004.codfw.wmnet
  • 15:37 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2004.codfw.wmnet
  • 15:32 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2011.codfw.wmnet with reason: host reimage
  • 15:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2096']
  • 15:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2096']
  • 15:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2096']
  • 15:29 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2011.codfw.wmnet with reason: host reimage
  • 15:25 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:25 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:25 moritzm: imported ganeti 3.0.2-1~deb11u1+wmf1 to apt.wikimedia.org/bullseye-wikimedia T350686
  • 15:24 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1162.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:24 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1161.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:15 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2011.codfw.wmnet with OS bullseye
  • 15:15 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs2011.codfw.wmnet with OS bullseye
  • 15:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2029.codfw.wmnet with OS bullseye
  • 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 100%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53930 and previous config saved to /var/cache/conftool/dbconfig/20231128-150618-root.json
  • 15:05 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:02 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2011.codfw.wmnet with OS bullseye
  • 14:58 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:52 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Remove partial migration of VisualEditorFeatureUse instrument (T351337) (duration: 10m 17s)
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 75%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53929 and previous config saved to /var/cache/conftool/dbconfig/20231128-145113-root.json
  • 14:46 lucaswerkmeister-wmde@deploy2002: sfaci and lucaswerkmeister-wmde: Continuing with sync
  • 14:43 lucaswerkmeister-wmde@deploy2002: sfaci and lucaswerkmeister-wmde: Backport for Remove partial migration of VisualEditorFeatureUse instrument (T351337) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:41 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Remove partial migration of VisualEditorFeatureUse instrument (T351337)
  • 14:39 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Remove mediawiki.web_ui.interactions event stream (T351195) (duration: 17m 36s)
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 50%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53928 and previous config saved to /var/cache/conftool/dbconfig/20231128-143608-root.json
  • 14:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and phuedx: Continuing with sync
  • 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:29 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:23 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and phuedx: Backport for Remove mediawiki.web_ui.interactions event stream (T351195) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:22 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Remove mediawiki.web_ui.interactions event stream (T351195)
  • 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 25%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53927 and previous config saved to /var/cache/conftool/dbconfig/20231128-142102-root.json
  • 14:19 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Remove EchoMail and EchoInteraction event streams (T344167) (duration: 10m 05s)
  • 14:13 lucaswerkmeister-wmde@deploy2002: phuedx and lucaswerkmeister-wmde: Continuing with sync
  • 14:10 volans: deploying python3-wmflib_1.2.4 fleet-wide (tested changes on all OSes)
  • 14:10 lucaswerkmeister-wmde@deploy2002: phuedx and lucaswerkmeister-wmde: Backport for Remove EchoMail and EchoInteraction event streams (T344167) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:09 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Remove EchoMail and EchoInteraction event streams (T344167)
  • 14:06 lucaswerkmeister-wmde@deploy2002: Backport cancelled.
  • 14:06 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 10%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53926 and previous config saved to /var/cache/conftool/dbconfig/20231128-140557-root.json
  • 14:05 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 14:05 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 14:04 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 14:03 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:02 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:02 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:02 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 5%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53925 and previous config saved to /var/cache/conftool/dbconfig/20231128-135052-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 1%: Upgrade to 10.6.16 and bookworm', diff saved to https://phabricator.wikimedia.org/P53924 and previous config saved to /var/cache/conftool/dbconfig/20231128-133547-root.json
  • 13:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2028.codfw.wmnet with OS bookworm
  • 13:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2028.codfw.wmnet with reason: host reimage
  • 13:18 volans: uploaded python3-wmflib_1.2.4 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia
  • 13:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on es2028.codfw.wmnet with reason: host reimage
  • 12:58 XioNoX: re-enable sampling on cr1-esams:fpc1
  • 12:56 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host es2028.codfw.wmnet with OS bookworm
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2028 T351916', diff saved to https://phabricator.wikimedia.org/P53923 and previous config saved to /var/cache/conftool/dbconfig/20231128-125235-root.json
  • 12:35 kart_: Updated Apertium to 2023-11-23-055425-production (ie Bookworm!) (T346997)
  • 12:32 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 12:32 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 12:26 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 12:26 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 12:13 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 12:12 kartik@deploy2002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 12:02 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:02 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 11:58 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 11:57 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 11:56 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 11:55 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:50 vgutierrez: pool ncredir4001
  • 11:42 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 11:42 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:41 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:41 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:33 volans@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr2-esams:xe-0/1/2
  • 11:33 volans@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr2-esams:xe-0/1/2
  • 11:22 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:22 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:21 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:21 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 10:52 vgutierrez: depool ncredir4001
  • 10:45 vgutierrez: repool ncredir4001
  • 10:38 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:37 moritzm: installing lua5.3 security updates
  • 10:35 vgutierrez: depool ncredir4001
  • 10:21 vgutierrez: rolling restart of pybal on lvs4010 and lvs4008, effectively disabling IPIP encapsulation on ncredir@ulsfo - T351069
  • 10:09 vgutierrez: rolling restart of pybal on lvs4010 and lvs4008, effectively enabling IPIP encapsulation on ncredir@ulsfo - T351069
  • 10:01 sg912@deploy2002: Finished deploy [airflow-dags/analytics@0283c11]: (no justification provided) (duration: 00m 47s)
  • 10:00 sg912@deploy2002: Started deploy [airflow-dags/analytics@0283c11]: (no justification provided)
  • 09:58 moritzm: installing intel-microcode security updates
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 09:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 09:40 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.7 refs T350083
  • 09:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1024.eqiad.wmnet with OS bookworm
  • 08:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1024.eqiad.wmnet with reason: host reimage
  • 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1024.eqiad.wmnet with reason: host reimage
  • 08:47 hashar@deploy2002: Finished scap: Backport for Revert "Parsoid DataAccess: Stop processing extensions as top-level docs" (duration: 07m 54s)
  • 08:41 hashar@deploy2002: hashar and ssastry: Continuing with sync
  • 08:41 hashar@deploy2002: hashar and ssastry: Backport for Revert "Parsoid DataAccess: Stop processing extensions as top-level docs" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:39 hashar@deploy2002: Started scap: Backport for Revert "Parsoid DataAccess: Stop processing extensions as top-level docs"
  • 08:37 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1024.eqiad.wmnet with OS bookworm
  • 07:39 XioNoX: add RPKI ROA for 193.46.90.0/24 - T309297
  • 04:56 mwpresync@deploy2002: Pruned MediaWiki: 1.42.0-wmf.4 (duration: 02m 14s)
  • 04:53 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.7 refs T350083 (duration: 51m 11s)
  • 04:21 eileen: civicrm upgraded from f3de1778 to c2eaa50e
  • 04:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.7 refs T350083
  • 01:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:42 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:42 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:40 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2100']
  • 01:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2091']
  • 01:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2100']
  • 01:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2100.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:24 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2091']
  • 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2091.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2100.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2094']
  • 01:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2101.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2101.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2091.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2100.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2091.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2034']
  • 01:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2100.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2091.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2101']
  • 01:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2097']
  • 01:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2100']
  • 01:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2091']
  • 01:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2100']
  • 01:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2094']
  • 01:11 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2091']
  • 01:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:09 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2034']
  • 01:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti2034']
  • 01:08 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2034']
  • 01:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:07 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2101']
  • 01:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2101']
  • 01:07 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2101']
  • 01:06 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2097']
  • 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2101.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2097']
  • 01:05 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2097']
  • 01:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2097.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2101.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2097.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:52 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2029.codfw.wmnet with OS bullseye
  • 00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2028.codfw.wmnet with OS bullseye
  • 00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:36 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2029.codfw.wmnet with OS bullseye
  • 00:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2028.codfw.wmnet with reason: host reimage
  • 00:15 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2028.codfw.wmnet with reason: host reimage

2023-11-27

  • 23:47 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2028.codfw.wmnet with OS bullseye
  • 22:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on planet1002.eqiad.wmnet with reason: maintenance
  • 22:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on planet1002.eqiad.wmnet with reason: maintenance
  • 22:17 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1157.eqiad.wmnet with OS bullseye
  • 22:17 jhathaway@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhathaway@cumin1001"
  • 22:13 jhathaway@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhathaway@cumin1001"
  • 21:59 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1157.eqiad.wmnet with reason: host reimage
  • 21:56 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1157.eqiad.wmnet with reason: host reimage
  • 21:42 jhathaway@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1157.eqiad.wmnet with OS bullseye
  • 21:28 cjming: end of UTC late backport window
  • 21:26 cjming@deploy2002: Finished scap: Backport for ORES: Set default value of OresLiftWingAddHostHeader to true (T351703) (duration: 07m 57s)
  • 21:19 cjming@deploy2002: isaranto and cjming: Continuing with sync
  • 21:19 cjming@deploy2002: isaranto and cjming: Backport for ORES: Set default value of OresLiftWingAddHostHeader to true (T351703) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:18 cjming@deploy2002: Started scap: Backport for ORES: Set default value of OresLiftWingAddHostHeader to true (T351703)
  • 21:15 cjming@deploy2002: Finished scap: Backport for CentralAuth: Fix wikisource.org cookie handling (T351685) (duration: 11m 26s)
  • 21:09 cjming@deploy2002: cjming and tgr: Continuing with sync
  • 21:07 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1157.eqiad.wmnet with OS bullseye
  • 21:05 cjming@deploy2002: cjming and tgr: Backport for CentralAuth: Fix wikisource.org cookie handling (T351685) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:04 cjming@deploy2002: Started scap: Backport for CentralAuth: Fix wikisource.org cookie handling (T351685)
  • 21:02 btullis@deploy2002: Finished deploy [airflow-dags/analytics_test@0283c11]: (no justification provided) (duration: 00m 11s)
  • 21:02 btullis@deploy2002: Started deploy [airflow-dags/analytics_test@0283c11]: (no justification provided)
  • 20:52 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1157.eqiad.wmnet with OS bullseye
  • 20:16 vgutierrez: rolling restart of pybal on lvs4010 and lvs4008 - T351069
  • 19:53 vgutierrez: restarting pybal on lvs4008 (effectively enabling IPIP encapsulation on ncredir@ulsfo) - T351069
  • 19:50 vgutierrez: restarting pybal on lvs4010 - T351069
  • 19:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on planet1003.eqiad.wmnet with reason: maintenance
  • 19:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on planet1003.eqiad.wmnet with reason: maintenance
  • 19:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on planet2003.codfw.wmnet with reason: maintenance
  • 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on planet2003.codfw.wmnet with reason: maintenance
  • 18:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1003.eqiad.wmnet with OS bookworm
  • 18:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host planet1003.eqiad.wmnet with OS bookworm
  • 18:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
  • 18:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
  • 18:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on planet1003.eqiad.wmnet with reason: host reimage
  • 18:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on planet1003.eqiad.wmnet with reason: host reimage
  • 18:11 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host planet1003.eqiad.wmnet with OS bookworm
  • 18:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1003.eqiad.wmnet with OS bookworm
  • 18:07 vgutierrez: restarting pybal on lvs4010 - T351069
  • 17:52 vgutierrez: upload ipip-multiqueue-optimizer 0.2 to apt.wm.o (bullseye) - T351069
  • 17:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1002.eqiad.wmnet with OS bookworm
  • 17:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
  • 17:22 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
  • 17:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti2033']
  • 17:07 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2033']
  • 17:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti2033']
  • 17:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['kubernetes2058']
  • 17:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['kubernetes2057']
  • 17:07 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2033']
  • 17:06 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2058']
  • 17:06 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2057']
  • 17:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1002.eqiad.wmnet with OS bookworm
  • 16:56 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 31s)
  • 16:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1001.eqiad.wmnet with OS bookworm
  • 16:50 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dbbackups::monitoring
  • 16:50 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 06s)
  • 16:49 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1157.eqiad.wmnet with OS bullseye
  • 16:43 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-worker1157.eqiad.wmnet
  • 16:39 pt1979@cumin1001: START - Cookbook sre.hosts.dhcp for host an-worker1157.eqiad.wmnet
  • 16:31 vgutierrez: upload tcp-mss-clamper 0.2+deb12u1 to apt.wm.o (bookworm)
  • 16:30 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: dbbackups::monitoring
  • 16:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2058.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2057.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:21 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logging-hd2003']
  • 16:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logging-hd2002']
  • 16:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logging-hd2001']
  • 16:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2109']
  • 16:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2058.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2057.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-hd2003']
  • 16:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-hd2002']
  • 16:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-hd2001']
  • 16:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2109']
  • 16:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['kubernetes2060']
  • 16:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['kubernetes2059']
  • 16:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['logging-hd2003']
  • 16:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['logging-hd2002']
  • 16:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['logging-hd2001']
  • 16:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2109']
  • 16:13 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2060']
  • 16:13 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2059']
  • 16:13 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-hd2003']
  • 16:13 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-hd2002']
  • 16:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-hd2001']
  • 16:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2109']
  • 16:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2109.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:11 sukhe: enable puppet and start bird on dns4003
  • 16:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2059.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4003.wikimedia.org
  • 16:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2060.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:07 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns4003.wikimedia.org
  • 16:07 sukhe: disable puppet and stop bird on dns4003: rebooting
  • 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: zookeeper::flink
  • 15:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2060.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2059.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2109.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2108']
  • 15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2107']
  • 15:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2106']
  • 15:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2105']
  • 15:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2104']
  • 15:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2103']
  • 15:52 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 15:50 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 15:49 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: zookeeper::flink
  • 15:46 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2108']
  • 15:46 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2107']
  • 15:46 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2106']
  • 15:46 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2105']
  • 15:45 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2104']
  • 15:45 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2103']
  • 15:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2108']
  • 15:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2107']
  • 15:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2106']
  • 15:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2105']
  • 15:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2104']
  • 15:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2103']
  • 15:45 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2108']
  • 15:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2107']
  • 15:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2106']
  • 15:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2105']
  • 15:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2104']
  • 15:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2103']
  • 15:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2104.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2103.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2106.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2107.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2105.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2108.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host an-worker1157.eqiad.wmnet with OS bullseye
  • 15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2108.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2107.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2106.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2105.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2104.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2103.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2101.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2097.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2098']
  • 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2099']
  • 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2102']
  • 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2100']
  • 15:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2102']
  • 15:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2100']
  • 15:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2099']
  • 15:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2098']
  • 15:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2102']
  • 15:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2100']
  • 15:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2099']
  • 15:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2098']
  • 15:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2102']
  • 15:19 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2100']
  • 15:19 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2099']
  • 15:19 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2098']
  • 15:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2100.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2099.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2102.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2098.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:14 fabfur: set `pooled=yes` on cp11.* hosts in eqiad T349244
  • 15:11 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host an-worker1160.eqiad.wmnet with OS bullseye
  • 15:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 15:09 hashar: restarting CI Jenkins
  • 15:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2102.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2101.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2100.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:07 fabfur: `nfctl select name='cp10.*',service=ats-be set/pooled=inactive` (cdn and ats-be not used anymore on these hosts) T349244
  • 15:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2099.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2098.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2097.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:06 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 15:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2095']
  • 15:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1157.eqiad.wmnet with OS bullseye
  • 15:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2093']
  • 14:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2092']
  • 14:57 urbanecm: mwmaint2002: `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=6hour --verbose --use-job-queue`
  • 14:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2095']
  • 14:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2095']
  • 14:56 urbanecm: mwmaint2002: /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=6hour --verbose --use-job-queue
  • 14:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2095']
  • 14:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2094']
  • 14:55 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2094']
  • 14:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2094']
  • 14:55 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2094']
  • 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host dns4003.wikimedia.org
  • 14:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2090']
  • 14:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2091']
  • 14:54 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2091']
  • 14:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2091']
  • 14:54 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2091']
  • 14:53 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2091']
  • 14:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2089']
  • 14:53 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2091']
  • 14:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1160.eqiad.wmnet with OS bullseye
  • 14:53 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1175
  • 14:53 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2093']
  • 14:53 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1175
  • 14:53 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1161
  • 14:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2093']
  • 14:52 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1161
  • 14:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2094.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1159
  • 14:52 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1159
  • 14:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1158
  • 14:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2093']
  • 14:52 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1158
  • 14:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1157
  • 14:52 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1157
  • 14:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1160
  • 14:52 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1160
  • 14:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2088']
  • 14:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2091.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:51 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2092']
  • 14:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2092']
  • 14:51 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2092']
  • 14:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bookworm
  • 14:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2095.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:49 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2090']
  • 14:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2090']
  • 14:49 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2090']
  • 14:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2093.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2092.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:48 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2089']
  • 14:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2089']
  • 14:47 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2089']
  • 14:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2087']
  • 14:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2090.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:46 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host dns4003.wikimedia.org
  • 14:46 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2088']
  • 14:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2088']
  • 14:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2089.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:45 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2088']
  • 14:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2088.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:43 urbanecm@deploy2002: Finished scap: Backport for Enable native MathML rendering on dewiki (T350787) (duration: 09m 19s)
  • 14:43 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 14:41 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2087']
  • 14:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2087']
  • 14:40 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2087']
  • 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2096.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:38 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2095.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:38 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:38 urbanecm@deploy2002: urbanecm and physikerwelt: Continuing with sync
  • 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2094.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2087.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2093.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2092.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:35 urbanecm@deploy2002: urbanecm and physikerwelt: Backport for Enable native MathML rendering on dewiki (T350787) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2091.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:34 urbanecm@deploy2002: Started scap: Backport for Enable native MathML rendering on dewiki (T350787)
  • 14:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2090.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:34 urbanecm@deploy2002: Finished scap: Backport for bjnwikiquote: add timezone, wgSitename (T350235), dgawiki: add logos, timezone and sitename (T350229) (duration: 10m 57s)
  • 14:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2089.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2088.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:30 moritzm: installing protobuf security updates
  • 14:27 urbanecm@deploy2002: urbanecm and anzx: Continuing with sync
  • 14:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2087.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:24 urbanecm@deploy2002: urbanecm and anzx: Backport for bjnwikiquote: add timezone, wgSitename (T350235), dgawiki: add logos, timezone and sitename (T350229) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:23 urbanecm@deploy2002: Started scap: Backport for bjnwikiquote: add timezone, wgSitename (T350235), dgawiki: add logos, timezone and sitename (T350229)
  • 14:21 urbanecm@deploy2002: Finished scap: Backport for GrowthExperiments: enable frontend for 15th round of wikis (T308141), zghwiki: add timezone, wgSitename (T350241), bbcwiki: add timezone, wgSitename (T350373) (duration: 11m 23s)
  • 14:15 urbanecm@deploy2002: sgimeno and anzx and urbanecm: Continuing with sync
  • 14:11 urbanecm@deploy2002: sgimeno and anzx and urbanecm: Backport for GrowthExperiments: enable frontend for 15th round of wikis (T308141), zghwiki: add timezone, wgSitename (T350241), bbcwiki: add timezone, wgSitename (T350373) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:10 urbanecm@deploy2002: Started scap: Backport for GrowthExperiments: enable frontend for 15th round of wikis (T308141), zghwiki: add timezone, wgSitename (T350241), bbcwiki: add timezone, wgSitename (T350373)
  • 14:10 urbanecm@deploy2002: Finished scap: Backport for UserImpact: Bump VERSION to 10 (T329700) (duration: 07m 56s)
  • 14:04 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1007.eqiad.wmnet with OS bullseye
  • 14:03 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 14:03 urbanecm@deploy2002: urbanecm: Backport for UserImpact: Bump VERSION to 10 (T329700) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:03 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 14:02 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 14:02 urbanecm@deploy2002: Started scap: Backport for UserImpact: Bump VERSION to 10 (T329700)
  • 13:59 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:46 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1007.eqiad.wmnet with reason: host reimage
  • 13:45 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 13:45 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 13:45 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 13:43 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1007.eqiad.wmnet with reason: host reimage
  • 13:38 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:38 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:38 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:37 godog: roll-restart prometheus/ops in eqiad/codfw to apply space-based retention - T351179
  • 13:32 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:31 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:30 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:26 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1007.eqiad.wmnet with OS bullseye
  • 13:20 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:19 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:19 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 13:19 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 13:19 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 13:09 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 13:09 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 13:09 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 13:04 urbanecm@deploy2002: Finished scap: Backport for Compress geui_data json blobs (T351898), User impact: timezone cleanup (T329700), UserImpact: Make smaller SQL queries (T351898) (duration: 07m 37s)
  • 12:56 urbanecm@deploy2002: Started scap: Backport for Compress geui_data json blobs (T351898), User impact: timezone cleanup (T329700), UserImpact: Make smaller SQL queries (T351898)
  • 12:34 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 12:34 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 12:18 kart_: Updated cxserver to 2023-11-24-152117-production (T351932)
  • 12:15 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 12:15 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:14 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 12:13 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 12:08 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:08 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 12:08 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:08 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:08 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 12:07 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:06 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:05 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 12:05 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 12:04 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:04 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:03 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 11:58 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 11:57 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 11:55 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 11:55 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 11:54 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 11:45 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 11:45 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 11:45 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 11:36 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:35 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:35 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:31 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:30 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:30 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:29 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:29 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:02 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:02 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:01 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:00 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:00 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:59 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:58 elukey: powercycle ml-serve2007 (OEM/DIMM error registered in getsel)
  • 10:50 hnowlan@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:50 hnowlan@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 10:47 hnowlan@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:47 hnowlan@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 10:41 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58485
  • 09:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58485
  • 09:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45706
  • 09:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45706
  • 08:43 taavi@deploy2002: Finished scap: Backport for Add virtual domain mapping for OATHAuth (T348484) (duration: 07m 53s)
  • 08:41 godog: restart prometheus/k8s-staging in eqiad - T343529
  • 08:37 taavi@deploy2002: taavi: Continuing with sync
  • 08:36 taavi@deploy2002: taavi: Backport for Add virtual domain mapping for OATHAuth (T348484) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:35 taavi@deploy2002: Started scap: Backport for Add virtual domain mapping for OATHAuth (T348484)
  • 08:29 taavi@deploy2002: Finished scap: Backport for GrowthExperiments: enable AddLink frontend for 16,17th rounds of wikis (T308142 T308143) (duration: 19m 54s)
  • 08:23 taavi@deploy2002: taavi and sgimeno: Continuing with sync
  • 08:18 taavi@deploy2002: taavi and sgimeno: Backport for GrowthExperiments: enable AddLink frontend for 16,17th rounds of wikis (T308142 T308143) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:14 moritzm: installing dpkg bugfix updates on bullseye
  • 08:09 taavi@deploy2002: Started scap: Backport for GrowthExperiments: enable AddLink frontend for 16,17th rounds of wikis (T308142 T308143)
  • 07:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2134.codfw.wmnet with OS bookworm
  • 07:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2134.codfw.wmnet with reason: host reimage
  • 07:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2134.codfw.wmnet with reason: host reimage
  • 06:52 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2134.codfw.wmnet with OS bookworm
  • 06:40 marostegui: Failover m2 from db1119 to db1195 - T351863
  • 06:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Switch
  • 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: Switch
  • 06:16 kart_: Update cxserver to 2023-11-20-052250-production (T341458, T349118)
  • 06:12 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:12 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:06 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:05 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:43 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:43 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply

2023-11-24

  • 15:01 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 15:00 jayme@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:41 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 13:41 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 13:41 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 13:40 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 13:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53911 and previous config saved to /var/cache/conftool/dbconfig/20231124-133300-arnaudb.json
  • 13:24 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 13:24 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 13:24 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 13:23 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 13:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:19 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 13:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:18 jayme@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 13:18 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 13:17 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 90%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53910 and previous config saved to /var/cache/conftool/dbconfig/20231124-131755-arnaudb.json
  • 13:17 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:02 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 80%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53909 and previous config saved to /var/cache/conftool/dbconfig/20231124-130250-arnaudb.json
  • 12:47 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 70%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53908 and previous config saved to /var/cache/conftool/dbconfig/20231124-124745-arnaudb.json
  • 12:32 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 60%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53907 and previous config saved to /var/cache/conftool/dbconfig/20231124-123240-arnaudb.json
  • 12:17 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53906 and previous config saved to /var/cache/conftool/dbconfig/20231124-121735-arnaudb.json
  • 12:02 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 40%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53905 and previous config saved to /var/cache/conftool/dbconfig/20231124-120230-arnaudb.json
  • 11:47 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 30%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53904 and previous config saved to /var/cache/conftool/dbconfig/20231124-114725-arnaudb.json
  • 11:32 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53903 and previous config saved to /var/cache/conftool/dbconfig/20231124-113220-arnaudb.json
  • 11:17 arnaudb@cumin1001: dbctl commit (dc=all): 'db2191 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53902 and previous config saved to /var/cache/conftool/dbconfig/20231124-111715-arnaudb.json
  • 11:11 arnaudb@cumin1001: dbctl commit (dc=all): 'set es2032 back as es1 master for T344589', diff saved to https://phabricator.wikimedia.org/P53901 and previous config saved to /var/cache/conftool/dbconfig/20231124-111109-arnaudb.json
  • 11:09 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53900 and previous config saved to /var/cache/conftool/dbconfig/20231124-110948-arnaudb.json
  • 11:02 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:02 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:00 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:55 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:54 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53899 and previous config saved to /var/cache/conftool/dbconfig/20231124-105443-arnaudb.json
  • 10:54 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:53 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:53 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:47 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53898 and previous config saved to /var/cache/conftool/dbconfig/20231124-104733-arnaudb.json
  • 10:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:46 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53897 and previous config saved to /var/cache/conftool/dbconfig/20231124-104635-arnaudb.json
  • 10:40 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53896 and previous config saved to /var/cache/conftool/dbconfig/20231124-104023-arnaudb.json
  • 10:39 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 80%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53895 and previous config saved to /var/cache/conftool/dbconfig/20231124-103938-arnaudb.json
  • 10:39 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:39 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53894 and previous config saved to /var/cache/conftool/dbconfig/20231124-103753-arnaudb.json
  • 10:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53893 and previous config saved to /var/cache/conftool/dbconfig/20231124-103700-arnaudb.json
  • 10:32 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53892 and previous config saved to /var/cache/conftool/dbconfig/20231124-103228-arnaudb.json
  • 10:31 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53891 and previous config saved to /var/cache/conftool/dbconfig/20231124-103130-arnaudb.json
  • 10:25 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53890 and previous config saved to /var/cache/conftool/dbconfig/20231124-102518-arnaudb.json
  • 10:24 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 70%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53889 and previous config saved to /var/cache/conftool/dbconfig/20231124-102433-arnaudb.json
  • 10:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53888 and previous config saved to /var/cache/conftool/dbconfig/20231124-102248-arnaudb.json
  • 10:22 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2421.codfw.wmnet with OS bullseye
  • 10:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53887 and previous config saved to /var/cache/conftool/dbconfig/20231124-102155-arnaudb.json
  • 10:20 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2431.codfw.wmnet with OS bullseye
  • 10:17 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 80%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53886 and previous config saved to /var/cache/conftool/dbconfig/20231124-101722-arnaudb.json
  • 10:16 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1474.eqiad.wmnet with OS bullseye
  • 10:16 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 80%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53885 and previous config saved to /var/cache/conftool/dbconfig/20231124-101625-arnaudb.json
  • 10:16 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1475.eqiad.wmnet with OS bullseye
  • 10:15 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2425.codfw.wmnet with OS bullseye
  • 10:11 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1472.eqiad.wmnet with OS bullseye
  • 10:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1473.eqiad.wmnet with OS bullseye
  • 10:10 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 80%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53884 and previous config saved to /var/cache/conftool/dbconfig/20231124-101013-arnaudb.json
  • 10:09 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53883 and previous config saved to /var/cache/conftool/dbconfig/20231124-100928-arnaudb.json
  • 10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 80%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53882 and previous config saved to /var/cache/conftool/dbconfig/20231124-100743-arnaudb.json
  • 10:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 80%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53881 and previous config saved to /var/cache/conftool/dbconfig/20231124-100650-arnaudb.json
  • 10:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2421.codfw.wmnet with reason: host reimage
  • 10:02 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 70%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53880 and previous config saved to /var/cache/conftool/dbconfig/20231124-100218-arnaudb.json
  • 10:01 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2431.codfw.wmnet with reason: host reimage
  • 10:01 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 70%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53879 and previous config saved to /var/cache/conftool/dbconfig/20231124-100120-arnaudb.json
  • 09:58 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1474.eqiad.wmnet with reason: host reimage
  • 09:56 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1475.eqiad.wmnet with reason: host reimage
  • 09:55 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2425.codfw.wmnet with reason: host reimage
  • 09:55 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 70%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53878 and previous config saved to /var/cache/conftool/dbconfig/20231124-095508-arnaudb.json
  • 09:54 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2425.codfw.wmnet with reason: host reimage
  • 09:54 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2431.codfw.wmnet with reason: host reimage
  • 09:54 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 50%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53877 and previous config saved to /var/cache/conftool/dbconfig/20231124-095423-arnaudb.json
  • 09:54 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2421.codfw.wmnet with reason: host reimage
  • 09:53 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1472.eqiad.wmnet with reason: host reimage
  • 09:53 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1473.eqiad.wmnet with reason: host reimage
  • 09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 70%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53876 and previous config saved to /var/cache/conftool/dbconfig/20231124-095238-arnaudb.json
  • 09:51 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1475.eqiad.wmnet with reason: host reimage
  • 09:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 70%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53875 and previous config saved to /var/cache/conftool/dbconfig/20231124-095145-arnaudb.json
  • 09:51 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1474.eqiad.wmnet with reason: host reimage
  • 09:50 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1473.eqiad.wmnet with reason: host reimage
  • 09:50 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:50 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1472.eqiad.wmnet with reason: host reimage
  • 09:47 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53874 and previous config saved to /var/cache/conftool/dbconfig/20231124-094713-arnaudb.json
  • 09:46 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53873 and previous config saved to /var/cache/conftool/dbconfig/20231124-094614-arnaudb.json
  • 09:40 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53872 and previous config saved to /var/cache/conftool/dbconfig/20231124-094001-arnaudb.json
  • 09:39 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 45%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53871 and previous config saved to /var/cache/conftool/dbconfig/20231124-093918-arnaudb.json
  • 09:39 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host mw1475.eqiad.wmnet with OS bullseye
  • 09:39 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host mw1474.eqiad.wmnet with OS bullseye
  • 09:38 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host mw1473.eqiad.wmnet with OS bullseye
  • 09:37 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host mw1472.eqiad.wmnet with OS bullseye
  • 09:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53870 and previous config saved to /var/cache/conftool/dbconfig/20231124-093733-arnaudb.json
  • 09:37 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host mw2431.codfw.wmnet with OS bullseye
  • 09:37 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host mw2425.codfw.wmnet with OS bullseye
  • 09:36 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host mw2421.codfw.wmnet with OS bullseye
  • 09:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53869 and previous config saved to /var/cache/conftool/dbconfig/20231124-093640-arnaudb.json
  • 09:32 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 50%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53868 and previous config saved to /var/cache/conftool/dbconfig/20231124-093207-arnaudb.json
  • 09:31 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 50%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53867 and previous config saved to /var/cache/conftool/dbconfig/20231124-093109-arnaudb.json
  • 09:24 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 50%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53866 and previous config saved to /var/cache/conftool/dbconfig/20231124-092456-arnaudb.json
  • 09:24 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 40%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53865 and previous config saved to /var/cache/conftool/dbconfig/20231124-092413-arnaudb.json
  • 09:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 50%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53864 and previous config saved to /var/cache/conftool/dbconfig/20231124-092228-arnaudb.json
  • 09:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 50%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53862 and previous config saved to /var/cache/conftool/dbconfig/20231124-092135-arnaudb.json
  • 09:17 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 40%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53861 and previous config saved to /var/cache/conftool/dbconfig/20231124-091702-arnaudb.json
  • 09:16 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 100%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53860 and previous config saved to /var/cache/conftool/dbconfig/20231124-091639-arnaudb.json
  • 09:16 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53859 and previous config saved to /var/cache/conftool/dbconfig/20231124-091621-arnaudb.json
  • 09:16 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 40%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53858 and previous config saved to /var/cache/conftool/dbconfig/20231124-091604-arnaudb.json
  • 09:16 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 100%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53857 and previous config saved to /var/cache/conftool/dbconfig/20231124-091601-arnaudb.json
  • 09:09 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 40%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53856 and previous config saved to /var/cache/conftool/dbconfig/20231124-090951-arnaudb.json
  • 09:09 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 35%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53855 and previous config saved to /var/cache/conftool/dbconfig/20231124-090908-arnaudb.json
  • 09:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 40%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53854 and previous config saved to /var/cache/conftool/dbconfig/20231124-090723-arnaudb.json
  • 09:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 40%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53853 and previous config saved to /var/cache/conftool/dbconfig/20231124-090630-arnaudb.json
  • 09:01 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53852 and previous config saved to /var/cache/conftool/dbconfig/20231124-090157-arnaudb.json
  • 09:01 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 80%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53851 and previous config saved to /var/cache/conftool/dbconfig/20231124-090134-arnaudb.json
  • 09:01 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 80%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53850 and previous config saved to /var/cache/conftool/dbconfig/20231124-090116-arnaudb.json
  • 09:01 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53849 and previous config saved to /var/cache/conftool/dbconfig/20231124-090059-arnaudb.json
  • 09:00 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 80%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53848 and previous config saved to /var/cache/conftool/dbconfig/20231124-090056-arnaudb.json
  • 08:54 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53847 and previous config saved to /var/cache/conftool/dbconfig/20231124-085446-arnaudb.json
  • 08:54 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53846 and previous config saved to /var/cache/conftool/dbconfig/20231124-085403-arnaudb.json
  • 08:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53845 and previous config saved to /var/cache/conftool/dbconfig/20231124-085218-arnaudb.json
  • 08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53844 and previous config saved to /var/cache/conftool/dbconfig/20231124-085125-arnaudb.json
  • 08:46 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 20%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53843 and previous config saved to /var/cache/conftool/dbconfig/20231124-084652-arnaudb.json
  • 08:46 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 60%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53842 and previous config saved to /var/cache/conftool/dbconfig/20231124-084629-arnaudb.json
  • 08:46 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 60%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53841 and previous config saved to /var/cache/conftool/dbconfig/20231124-084611-arnaudb.json
  • 08:45 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 20%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53840 and previous config saved to /var/cache/conftool/dbconfig/20231124-084554-arnaudb.json
  • 08:45 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 60%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53839 and previous config saved to /var/cache/conftool/dbconfig/20231124-084551-arnaudb.json
  • 08:39 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 20%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53838 and previous config saved to /var/cache/conftool/dbconfig/20231124-083941-arnaudb.json
  • 08:38 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 25%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53837 and previous config saved to /var/cache/conftool/dbconfig/20231124-083858-arnaudb.json
  • 08:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 20%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53836 and previous config saved to /var/cache/conftool/dbconfig/20231124-083713-arnaudb.json
  • 08:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 20%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53835 and previous config saved to /var/cache/conftool/dbconfig/20231124-083620-arnaudb.json
  • 08:31 arnaudb@cumin1001: dbctl commit (dc=all): 'db2195 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53834 and previous config saved to /var/cache/conftool/dbconfig/20231124-083147-arnaudb.json
  • 08:31 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 40%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53833 and previous config saved to /var/cache/conftool/dbconfig/20231124-083124-arnaudb.json
  • 08:31 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 40%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53832 and previous config saved to /var/cache/conftool/dbconfig/20231124-083106-arnaudb.json
  • 08:30 arnaudb@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53831 and previous config saved to /var/cache/conftool/dbconfig/20231124-083049-arnaudb.json
  • 08:30 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 40%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53830 and previous config saved to /var/cache/conftool/dbconfig/20231124-083046-arnaudb.json
  • 08:24 arnaudb@cumin1001: dbctl commit (dc=all): 'db2190 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53829 and previous config saved to /var/cache/conftool/dbconfig/20231124-082436-arnaudb.json
  • 08:23 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 20%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53828 and previous config saved to /var/cache/conftool/dbconfig/20231124-082353-arnaudb.json
  • 08:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db2189 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53827 and previous config saved to /var/cache/conftool/dbconfig/20231124-082208-arnaudb.json
  • 08:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db2188 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53826 and previous config saved to /var/cache/conftool/dbconfig/20231124-082115-arnaudb.json
  • 08:16 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 20%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53825 and previous config saved to /var/cache/conftool/dbconfig/20231124-081619-arnaudb.json
  • 08:16 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 20%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53824 and previous config saved to /var/cache/conftool/dbconfig/20231124-081601-arnaudb.json
  • 08:15 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 20%: Post reconfig repooling', diff saved to https://phabricator.wikimedia.org/P53823 and previous config saved to /var/cache/conftool/dbconfig/20231124-081541-arnaudb.json
  • 08:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2180 and db2193 to fix their config, will repool them asap', diff saved to https://phabricator.wikimedia.org/P53822 and previous config saved to /var/cache/conftool/dbconfig/20231124-081422-arnaudb.json
  • 08:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2181 and db2193 to fix their config, will repool them asap', diff saved to https://phabricator.wikimedia.org/P53821 and previous config saved to /var/cache/conftool/dbconfig/20231124-081304-arnaudb.json
  • 08:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2001.codfw.wmnet with OS bookworm
  • 08:08 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 15%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53820 and previous config saved to /var/cache/conftool/dbconfig/20231124-080848-arnaudb.json
  • 07:53 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53819 and previous config saved to /var/cache/conftool/dbconfig/20231124-075343-arnaudb.json
  • 07:51 arnaudb@cumin1001: dbctl commit (dc=all): 'repool API on db2181', diff saved to https://phabricator.wikimedia.org/P53818 and previous config saved to /var/cache/conftool/dbconfig/20231124-075137-arnaudb.json
  • 07:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2001.codfw.wmnet with reason: host reimage
  • 07:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2001.codfw.wmnet with reason: host reimage
  • 07:38 arnaudb@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 5%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53817 and previous config saved to /var/cache/conftool/dbconfig/20231124-073838-arnaudb.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53816 and previous config saved to /var/cache/conftool/dbconfig/20231124-073743-root.json
  • 07:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2002.codfw.wmnet with OS bookworm
  • 07:35 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: reboot
  • 07:35 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: reboot
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53815 and previous config saved to /var/cache/conftool/dbconfig/20231124-073510-root.json
  • 07:35 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: reboot
  • 07:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: reboot
  • 07:34 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: reboot
  • 07:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: reboot
  • 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2001.codfw.wmnet with OS bookworm
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53814 and previous config saved to /var/cache/conftool/dbconfig/20231124-072238-root.json
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53813 and previous config saved to /var/cache/conftool/dbconfig/20231124-072005-root.json
  • 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2002.codfw.wmnet with reason: host reimage
  • 07:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2002.codfw.wmnet with reason: host reimage
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53812 and previous config saved to /var/cache/conftool/dbconfig/20231124-070733-root.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53811 and previous config saved to /var/cache/conftool/dbconfig/20231124-070500-root.json
  • 06:55 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2002.codfw.wmnet with OS bookworm
  • 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2003.codfw.wmnet with OS bookworm
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53810 and previous config saved to /var/cache/conftool/dbconfig/20231124-065228-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53809 and previous config saved to /var/cache/conftool/dbconfig/20231124-064955-root.json
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53808 and previous config saved to /var/cache/conftool/dbconfig/20231124-063723-root.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: Upgrade to 10.6.16', diff saved to https://phabricator.wikimedia.org/P53807 and previous config saved to /var/cache/conftool/dbconfig/20231124-063450-root.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P53806 and previous config saved to /var/cache/conftool/dbconfig/20231124-063424-root.json
  • 06:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2003.codfw.wmnet with reason: host reimage
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2122', diff saved to https://phabricator.wikimedia.org/P53805 and previous config saved to /var/cache/conftool/dbconfig/20231124-063152-root.json
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2003.codfw.wmnet with reason: host reimage
  • 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2003.codfw.wmnet with OS bookworm

2023-11-23

  • 17:44 vgutierrez: repool ncredir4001
  • 17:25 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2420.codfw.wmnet with OS bullseye
  • 17:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2131.codfw.wmnet onto db2191.codfw.wmnet
  • 17:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2420.codfw.wmnet with reason: host reimage
  • 17:03 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host an-worker1160.eqiad.wmnet with OS bullseye
  • 17:01 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2420.codfw.wmnet with reason: host reimage
  • 16:44 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:43 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:43 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host mw2420.codfw.wmnet with OS bullseye
  • 16:42 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:41 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:41 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:40 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53804 and previous config saved to /var/cache/conftool/dbconfig/20231123-163507-arnaudb.json
  • 16:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53802 and previous config saved to /var/cache/conftool/dbconfig/20231123-162002-arnaudb.json
  • 16:13 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1160.eqiad.wmnet with OS bullseye
  • 16:13 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 16:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 75%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53801 and previous config saved to /var/cache/conftool/dbconfig/20231123-160457-arnaudb.json
  • 15:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53799 and previous config saved to /var/cache/conftool/dbconfig/20231123-154952-arnaudb.json
  • 15:44 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53798 and previous config saved to /var/cache/conftool/dbconfig/20231123-154425-arnaudb.json
  • 15:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 45%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53797 and previous config saved to /var/cache/conftool/dbconfig/20231123-153447-arnaudb.json
  • 15:29 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 90%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53796 and previous config saved to /var/cache/conftool/dbconfig/20231123-152920-arnaudb.json
  • 15:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53795 and previous config saved to /var/cache/conftool/dbconfig/20231123-151942-arnaudb.json
  • 15:14 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 80%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53794 and previous config saved to /var/cache/conftool/dbconfig/20231123-151415-arnaudb.json
  • 15:11 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1008.eqiad.wmnet with OS bullseye
  • 15:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53793 and previous config saved to /var/cache/conftool/dbconfig/20231123-151122-arnaudb.json
  • 15:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53792 and previous config saved to /var/cache/conftool/dbconfig/20231123-150825-arnaudb.json
  • 15:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 15%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53790 and previous config saved to /var/cache/conftool/dbconfig/20231123-150437-arnaudb.json
  • 14:59 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 70%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53789 and previous config saved to /var/cache/conftool/dbconfig/20231123-145910-arnaudb.json
  • 14:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53788 and previous config saved to /var/cache/conftool/dbconfig/20231123-145617-arnaudb.json
  • 14:55 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1008.eqiad.wmnet with reason: host reimage
  • 14:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 90%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53787 and previous config saved to /var/cache/conftool/dbconfig/20231123-145320-arnaudb.json
  • 14:51 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1008.eqiad.wmnet with reason: host reimage
  • 14:50 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 14:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53786 and previous config saved to /var/cache/conftool/dbconfig/20231123-144932-arnaudb.json
  • 14:44 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 60%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53785 and previous config saved to /var/cache/conftool/dbconfig/20231123-144405-arnaudb.json
  • 14:41 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 80%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53784 and previous config saved to /var/cache/conftool/dbconfig/20231123-144112-arnaudb.json
  • 14:38 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1008.eqiad.wmnet with OS bullseye
  • 14:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 80%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53783 and previous config saved to /var/cache/conftool/dbconfig/20231123-143815-arnaudb.json
  • 14:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2004.codfw.wmnet with OS bookworm
  • 14:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 5%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53782 and previous config saved to /var/cache/conftool/dbconfig/20231123-143427-arnaudb.json
  • 14:32 arnaudb@cumin1001: dbctl commit (dc=all): 'temporary depool of db1242 to fix API', diff saved to https://phabricator.wikimedia.org/P53781 and previous config saved to /var/cache/conftool/dbconfig/20231123-143238-arnaudb.json
  • 14:31 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:31 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 14:30 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:30 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 14:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db2175 repooling api', diff saved to https://phabricator.wikimedia.org/P53780 and previous config saved to /var/cache/conftool/dbconfig/20231123-142950-arnaudb.json
  • 14:29 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53779 and previous config saved to /var/cache/conftool/dbconfig/20231123-142900-arnaudb.json
  • 14:27 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host druid1008.eqiad.wmnet with OS bullseye
  • 14:26 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1008.eqiad.wmnet with OS bullseye
  • 14:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53778 and previous config saved to /var/cache/conftool/dbconfig/20231123-142639-arnaudb.json
  • 14:26 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:26 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 14:26 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 70%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53777 and previous config saved to /var/cache/conftool/dbconfig/20231123-142607-arnaudb.json
  • 14:25 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 14:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 70%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53776 and previous config saved to /var/cache/conftool/dbconfig/20231123-142310-arnaudb.json
  • 14:22 fabfur: swap cp1115 <-> cp1090 (T349244)
  • 14:21 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1115.eqiad.wmnet
  • 14:21 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1115.eqiad.wmnet
  • 14:21 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:21 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:21 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:20 jayme@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:17 fabfur: swap cp1114 <-> cp1089 (T349244)
  • 14:16 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1114.eqiad.wmnet
  • 14:16 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1114.eqiad.wmnet
  • 14:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2004.codfw.wmnet with reason: host reimage
  • 14:13 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 40%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53775 and previous config saved to /var/cache/conftool/dbconfig/20231123-141355-arnaudb.json
  • 14:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2004.codfw.wmnet with reason: host reimage
  • 14:11 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 14:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 90%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53774 and previous config saved to /var/cache/conftool/dbconfig/20231123-141134-arnaudb.json
  • 14:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53773 and previous config saved to /var/cache/conftool/dbconfig/20231123-141102-arnaudb.json
  • 14:09 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host irc1002.wikimedia.org
  • 14:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 60%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53772 and previous config saved to /var/cache/conftool/dbconfig/20231123-140805-arnaudb.json
  • 13:58 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 30%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53771 and previous config saved to /var/cache/conftool/dbconfig/20231123-135850-arnaudb.json
  • 13:58 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host irc1002.wikimedia.org
  • 13:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 80%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53770 and previous config saved to /var/cache/conftool/dbconfig/20231123-135629-arnaudb.json
  • 13:55 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53769 and previous config saved to /var/cache/conftool/dbconfig/20231123-135557-arnaudb.json
  • 13:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2004.codfw.wmnet with OS bookworm
  • 13:53 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dbproxy2004.codfw.wmnet with OS bookworm
  • 13:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53768 and previous config saved to /var/cache/conftool/dbconfig/20231123-135300-arnaudb.json
  • 13:49 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host irc2002.wikimedia.org
  • 13:45 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db2131.codfw.wmnet onto db2191.codfw.wmnet
  • 13:43 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53767 and previous config saved to /var/cache/conftool/dbconfig/20231123-134345-arnaudb.json
  • 13:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db2131 in db2191 for T343674', diff saved to https://phabricator.wikimedia.org/P53766 and previous config saved to /var/cache/conftool/dbconfig/20231123-134316-arnaudb.json
  • 13:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2191.codfw.wmnet with reason: provisionning db2191.codfw.wmnet - T343674
  • 13:41 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2191.codfw.wmnet with reason: provisionning db2191.codfw.wmnet - T343674
  • 13:41 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 70%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53765 and previous config saved to /var/cache/conftool/dbconfig/20231123-134124-arnaudb.json
  • 13:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: provisionning db2191.codfw.wmnet - T343674
  • 13:41 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: provisionning db2191.codfw.wmnet - T343674
  • 13:40 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 40%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53764 and previous config saved to /var/cache/conftool/dbconfig/20231123-134052-arnaudb.json
  • 13:39 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host druid1008.eqiad.wmnet with OS bullseye
  • 13:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 40%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53763 and previous config saved to /var/cache/conftool/dbconfig/20231123-133755-arnaudb.json
  • 13:30 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host irc2002.wikimedia.org
  • 13:28 arnaudb@cumin1001: dbctl commit (dc=all): 'db1243 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53762 and previous config saved to /var/cache/conftool/dbconfig/20231123-132840-arnaudb.json
  • 13:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 60%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53761 and previous config saved to /var/cache/conftool/dbconfig/20231123-132619-arnaudb.json
  • 13:25 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53760 and previous config saved to /var/cache/conftool/dbconfig/20231123-132547-arnaudb.json
  • 13:22 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2004.codfw.wmnet with OS bookworm
  • 13:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 30%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53759 and previous config saved to /var/cache/conftool/dbconfig/20231123-132250-arnaudb.json
  • 13:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53758 and previous config saved to /var/cache/conftool/dbconfig/20231123-132215-arnaudb.json
  • 13:21 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dbproxy2004.codfw.wmnet with OS bookworm
  • 13:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53757 and previous config saved to /var/cache/conftool/dbconfig/20231123-131114-arnaudb.json
  • 13:10 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 20%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53756 and previous config saved to /var/cache/conftool/dbconfig/20231123-131042-arnaudb.json
  • 13:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53755 and previous config saved to /var/cache/conftool/dbconfig/20231123-130745-arnaudb.json
  • 13:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 90%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53754 and previous config saved to /var/cache/conftool/dbconfig/20231123-130710-arnaudb.json
  • 12:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 40%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53753 and previous config saved to /var/cache/conftool/dbconfig/20231123-125609-arnaudb.json
  • 12:55 arnaudb@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53752 and previous config saved to /var/cache/conftool/dbconfig/20231123-125537-arnaudb.json
  • 12:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53751 and previous config saved to /var/cache/conftool/dbconfig/20231123-125240-arnaudb.json
  • 12:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 80%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53750 and previous config saved to /var/cache/conftool/dbconfig/20231123-125205-arnaudb.json
  • 12:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2181.codfw.wmnet onto db2195.codfw.wmnet
  • 12:41 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 30%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53749 and previous config saved to /var/cache/conftool/dbconfig/20231123-124104-arnaudb.json
  • 12:40 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy2004.codfw.wmnet with OS bookworm
  • 12:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 70%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53748 and previous config saved to /var/cache/conftool/dbconfig/20231123-123700-arnaudb.json
  • 12:30 vgutierrez: depooling ncredir4001 till puppet is fixed
  • 12:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53747 and previous config saved to /var/cache/conftool/dbconfig/20231123-122559-arnaudb.json
  • 12:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 60%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53746 and previous config saved to /var/cache/conftool/dbconfig/20231123-122155-arnaudb.json
  • 12:19 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1008.eqiad.wmnet with OS bullseye
  • 12:11 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host druid1008.eqiad.wmnet with OS bullseye
  • 12:10 arnaudb@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53745 and previous config saved to /var/cache/conftool/dbconfig/20231123-121054-arnaudb.json
  • 12:08 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2146.codfw.wmnet onto db2188.codfw.wmnet
  • 12:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53744 and previous config saved to /var/cache/conftool/dbconfig/20231123-120650-arnaudb.json
  • 11:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 40%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53743 and previous config saved to /var/cache/conftool/dbconfig/20231123-115145-arnaudb.json
  • 11:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 30%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53742 and previous config saved to /var/cache/conftool/dbconfig/20231123-113640-arnaudb.json
  • 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host lists1004.eqiad.wmnet
  • 11:25 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1008.eqiad.wmnet with OS bullseye
  • 11:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2175.codfw.wmnet onto db2189.codfw.wmnet
  • 11:23 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host lists1004.eqiad.wmnet
  • 11:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53741 and previous config saved to /var/cache/conftool/dbconfig/20231123-112135-arnaudb.json
  • 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
  • 11:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
  • 11:12 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 11:12 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 11:12 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 11:12 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 11:12 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 11:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 11:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 11:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 11:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 11:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 11:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 11:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 11:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P53740 and previous config saved to /var/cache/conftool/dbconfig/20231123-110630-arnaudb.json
  • 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: orchestrator
  • 10:59 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2149.codfw.wmnet onto db2190.codfw.wmnet
  • 10:52 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: orchestrator
  • 10:50 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db2181.codfw.wmnet onto db2195.codfw.wmnet
  • 10:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db2181 in db2195 for T343674', diff saved to https://phabricator.wikimedia.org/P53739 and previous config saved to /var/cache/conftool/dbconfig/20231123-104724-arnaudb.json
  • 10:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: provisionning db2195.codfw.wmnet - T343674
  • 10:45 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: provisionning db2195.codfw.wmnet - T343674
  • 10:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: provisionning db2195.codfw.wmnet - T343674
  • 10:45 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: provisionning db2195.codfw.wmnet - T343674
  • 10:34 stevemunene@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host druid1008.eqiad.wmnet with OS bullseye
  • 10:31 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db2146.codfw.wmnet onto db2188.codfw.wmnet
  • 10:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: swift::proxy
  • 10:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db2146 in db2188 for T343674', diff saved to https://phabricator.wikimedia.org/P53738 and previous config saved to /var/cache/conftool/dbconfig/20231123-102840-arnaudb.json
  • 10:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: provisionning db2188.codfw.wmnet - T343674
  • 10:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: provisionning db2188.codfw.wmnet - T343674
  • 10:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: provisionning db2188.codfw.wmnet - T343674
  • 10:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: provisionning db2188.codfw.wmnet - T343674
  • 10:22 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1008.eqiad.wmnet with OS bullseye
  • 10:16 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: swift::proxy
  • 10:09 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db2175.codfw.wmnet onto db2189.codfw.wmnet
  • 10:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db2175 in db2189 for T343674', diff saved to https://phabricator.wikimedia.org/P53737 and previous config saved to /var/cache/conftool/dbconfig/20231123-100638-arnaudb.json
  • 10:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: provisionning db2189.codfw.wmnet - T343674
  • 10:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: provisionning db2189.codfw.wmnet - T343674
  • 10:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: provisionning db2189.codfw.wmnet - T343674
  • 10:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: provisionning db2189.codfw.wmnet - T343674
  • 09:59 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host druid1008.eqiad.wmnet with OS bullseye
  • 09:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 09:26 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 09:20 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db2149.codfw.wmnet onto db2190.codfw.wmnet
  • 09:19 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2149.codfw.wmnet onto db2190.codfw.wmnet
  • 09:18 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db2149.codfw.wmnet onto db2190.codfw.wmnet
  • 09:18 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2149.codfw.wmnet onto db2190.codfw.wmnet
  • 09:17 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db2149.codfw.wmnet onto db2190.codfw.wmnet
  • 09:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db2149 in db2190 for T343674', diff saved to https://phabricator.wikimedia.org/P53736 and previous config saved to /var/cache/conftool/dbconfig/20231123-091514-arnaudb.json
  • 09:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: provisionning db2190.codfw.wmnet - T343674
  • 09:13 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: provisionning db2190.codfw.wmnet - T343674
  • 09:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: provisionning db2190.codfw.wmnet - T343674
  • 09:13 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: provisionning db2190.codfw.wmnet - T343674
  • 09:12 godog: add 50G to prometheus/services in codfw
  • 09:10 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1008.eqiad.wmnet with OS bullseye
  • 09:10 godog: add 80G to prometheus/k8s in eqiad
  • 08:49 Emperor: powercycle titan1001
  • 08:45 moritzm: powercycling titan1002
  • 08:37 hashar: Restarting CI Jenkins for plugins removals
  • 07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1195.eqiad.wmnet with OS bookworm
  • 07:19 _joe_: restarted sirenbot
  • 07:08 hashar: Restarted CI Jenkins to upgrade Rebuilder plugin
  • 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
  • 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
  • 06:53 hashar: Restarting Gerrit
  • 06:52 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1195.eqiad.wmnet with OS bookworm
  • 06:50 hashar: Restarting CI Jenkins for plugins removals
  • 06:44 marostegui: Failover m2 from db1195 to db1119 - T351638
  • 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Switch
  • 06:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: Switch
  • 06:23 hashar: Restarting CI Jenkins for plugin update # T282893
  • 02:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2028.codfw.wmnet with OS bullseye
  • 01:26 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host planet2003.codfw.wmnet with OS bookworm
  • 01:26 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host planet1003.eqiad.wmnet with OS bookworm
  • 01:09 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2028.codfw.wmnet with OS bullseye
  • 01:03 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2033']
  • 01:02 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2033']
  • 01:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:51 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on planet2003.codfw.wmnet with reason: host reimage
  • 00:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on planet2003.codfw.wmnet with reason: host reimage
  • 00:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on planet1003.eqiad.wmnet with reason: host reimage
  • 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on planet1003.eqiad.wmnet with reason: host reimage
  • 00:29 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host planet2003.codfw.wmnet with OS bookworm
  • 00:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host planet2003.codfw.wmnet
  • 00:28 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host planet2003.codfw.wmnet with OS bookworm
  • 00:20 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host planet1003.eqiad.wmnet with OS bookworm
  • 00:19 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host planet1003.eqiad.wmnet with OS bookworm
  • 00:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase2033.mgmt.codfw.wmnet with reboot policy FORCED

2023-11-22

  • 23:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2035']
  • 23:58 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2035']
  • 23:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2034']
  • 23:58 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2034']
  • 23:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2033']
  • 23:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2033']
  • 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2032']
  • 23:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2032']
  • 23:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2031']
  • 23:55 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2031']
  • 23:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2030']
  • 23:54 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:54 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2030']
  • 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2029']
  • 23:54 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2029']
  • 23:53 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2029']
  • 23:53 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2029']
  • 23:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2028']
  • 23:51 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2028']
  • 23:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2028']
  • 23:50 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2028']
  • 23:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2028']
  • 23:50 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2028']
  • 23:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2028']
  • 23:49 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2028']
  • 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2035.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2035.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2032.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2010.codfw.wmnet with OS bullseye
  • 23:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2032.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2030.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2029.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2028.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2030.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2029.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on planet2003.codfw.wmnet with reason: host reimage
  • 23:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2010.codfw.wmnet with reason: host reimage
  • 23:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2028.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on planet2003.codfw.wmnet with reason: host reimage
  • 22:57 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2010.codfw.wmnet with reason: host reimage
  • 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ganeti servers in codfw - jhancock@cumin2002"
  • 22:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ganeti servers in codfw - jhancock@cumin2002"
  • 22:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 22:43 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2010.codfw.wmnet with OS bullseye
  • 22:43 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs2010.codfw.wmnet with OS bullseye
  • 22:41 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host planet2003.codfw.wmnet with OS bookworm
  • 22:41 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM planet2003.codfw.wmnet - dzahn@cumin1001"
  • 22:40 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM planet2003.codfw.wmnet - dzahn@cumin1001"
  • 22:40 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) planet2003.codfw.wmnet on all recursors
  • 22:40 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache planet2003.codfw.wmnet on all recursors
  • 22:40 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:40 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM planet2003.codfw.wmnet - dzahn@cumin1001"
  • 22:38 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM planet2003.codfw.wmnet - dzahn@cumin1001"
  • 22:37 jhathaway: my latest commit may have broken puppet-merge; I'm investigating
  • 22:35 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 22:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host planet2003.codfw.wmnet
  • 22:34 mutante: puppetserver1001 - manually signed puppet cert request for planet1003
  • 22:25 ebernhardson: start cirrus updater backfilling into relforge
  • 22:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on planet1003.eqiad.wmnet with reason: host reimage
  • 22:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on planet1003.eqiad.wmnet with reason: host reimage
  • 22:20 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2010.codfw.wmnet with OS bullseye
  • 22:20 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs2010.codfw.wmnet with OS bullseye
  • 22:18 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:18 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:18 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:11 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host planet1003.eqiad.wmnet with OS bookworm
  • 22:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host planet1003.eqiad.wmnet
  • 22:08 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host planet1003.eqiad.wmnet with OS bookworm
  • 22:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1175.eqiad.wmnet with OS bullseye
  • 22:06 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2010.codfw.wmnet with OS bullseye
  • 22:05 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2009.codfw.wmnet with OS bullseye
  • 21:46 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2009.codfw.wmnet with reason: host reimage
  • 21:44 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2009.codfw.wmnet with reason: host reimage
  • 21:37 catrope@deploy2002: Finished scap: Backport for Update Annual Plan Core Metrics survey (T351353) (duration: 09m 04s)
  • 21:30 catrope@deploy2002: catrope and dani: Continuing with sync
  • 21:30 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2009.codfw.wmnet with OS bullseye
  • 21:29 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs2008.codfw.wmnet
  • 21:29 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs2008.codfw.wmnet
  • 21:29 catrope@deploy2002: catrope and dani: Backport for Update Annual Plan Core Metrics survey (T351353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:28 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2008.codfw.wmnet with OS bullseye
  • 21:28 catrope@deploy2002: Started scap: Backport for Update Annual Plan Core Metrics survey (T351353)
  • 21:24 catrope@deploy2002: Finished scap: Backport for Undeploy Reader Demographics 2 survey on enwiki (T344393) (duration: 21m 43s)
  • 21:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on planet1003.eqiad.wmnet with reason: host reimage
  • 21:18 catrope@deploy2002: catrope and dani: Continuing with sync
  • 21:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on planet1003.eqiad.wmnet with reason: host reimage
  • 21:11 catrope@deploy2002: catrope and dani: Backport for Undeploy Reader Demographics 2 survey on enwiki (T344393) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2008.codfw.wmnet with reason: host reimage
  • 21:07 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host planet1003.eqiad.wmnet with OS bookworm
  • 21:04 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2008.codfw.wmnet with reason: host reimage
  • 21:03 catrope@deploy2002: Started scap: Backport for Undeploy Reader Demographics 2 survey on enwiki (T344393)
  • 20:52 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM planet1003.eqiad.wmnet - dzahn@cumin1001"
  • 20:51 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM planet1003.eqiad.wmnet - dzahn@cumin1001"
  • 20:51 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) planet1003.eqiad.wmnet on all recursors
  • 20:50 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache planet1003.eqiad.wmnet on all recursors
  • 20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM planet1003.eqiad.wmnet - dzahn@cumin1001"
  • 20:50 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM planet1003.eqiad.wmnet - dzahn@cumin1001"
  • 20:47 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 20:47 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host planet1003.eqiad.wmnet
  • 20:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1175.eqiad.wmnet with OS bullseye
  • 20:35 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2008.codfw.wmnet with OS bullseye
  • 20:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs2007.codfw.wmnet
  • 20:35 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs2007.codfw.wmnet
  • 20:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2007.codfw.wmnet with OS bullseye
  • 20:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T348183)', diff saved to https://phabricator.wikimedia.org/P53735 and previous config saved to /var/cache/conftool/dbconfig/20231122-202947-arnaudb.json
  • 20:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P53734 and previous config saved to /var/cache/conftool/dbconfig/20231122-201441-arnaudb.json
  • 20:14 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2007.codfw.wmnet with reason: host reimage
  • 20:11 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2007.codfw.wmnet with reason: host reimage
  • 19:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P53733 and previous config saved to /var/cache/conftool/dbconfig/20231122-195934-arnaudb.json
  • 19:58 ejegg: standalone (payments listener) SmashPig upgraded from b867e553 to f24afba3
  • 19:56 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2007.codfw.wmnet with OS bullseye
  • 19:55 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs2006.codfw.wmnet
  • 19:55 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs2006.codfw.wmnet
  • 19:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2006.codfw.wmnet with OS bullseye
  • 19:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T348183)', diff saved to https://phabricator.wikimedia.org/P53732 and previous config saved to /var/cache/conftool/dbconfig/20231122-194428-arnaudb.json
  • 19:37 ejegg: fundraising civicrm upgraded from 3c5db93b to f3de1778
  • 19:36 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1160.eqiad.wmnet with OS bullseye
  • 19:25 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2006.codfw.wmnet with reason: host reimage
  • 19:22 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2006.codfw.wmnet with reason: host reimage
  • 19:06 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2006.codfw.wmnet with OS bullseye
  • 19:05 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs2005.codfw.wmnet
  • 19:05 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs2005.codfw.wmnet
  • 18:58 ejegg: standalone SmashPig upgraded from c5b12dc3 to b867e553
  • 18:57 cstone: payments-wiki upgraded from 714552c5 to f02f8653
  • 18:50 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:16 robh@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1160.eqiad.wmnet with OS bullseye
  • 18:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2005.codfw.wmnet with OS bullseye
  • 17:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2005.codfw.wmnet with reason: host reimage
  • 17:42 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2005.codfw.wmnet with reason: host reimage
  • 17:36 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1175']
  • 17:29 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1175']
  • 17:28 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 17:28 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 17:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1175.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:27 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2005.codfw.wmnet with OS bullseye
  • 17:27 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 17:27 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 17:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs2004.codfw.wmnet
  • 17:26 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs2004.codfw.wmnet
  • 17:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 17:25 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 17:24 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 17:23 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2004.codfw.wmnet with OS bullseye
  • 17:23 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 17:21 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 17:21 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 17:07 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 17:06 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 17:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2004.codfw.wmnet with reason: host reimage
  • 17:01 fabfur: swapped cp1113 <-> cp1088 (T349244)
  • 16:59 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 16:57 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1113.eqiad.wmnet
  • 16:57 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1113.eqiad.wmnet
  • 16:57 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2004.codfw.wmnet with reason: host reimage
  • 16:56 Emperor: repool ms-fe1014 with new envoy TLS setup T317616
  • 16:55 volans: installed spicerack v8.2.0 to the cumin hosts
  • 16:55 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1014.eqiad.wmnet with OS bullseye
  • 16:48 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 16:47 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1113.eqiad.wmnet with OS bullseye
  • 16:47 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 16:46 Emperor: repool moss-fe2001 with new envoy TLS setup T317616
  • 16:45 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 16:44 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2001.codfw.wmnet with OS bullseye
  • 16:43 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2004.codfw.wmnet with OS bullseye
  • 16:42 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 16:42 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 16:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs2003.codfw.wmnet
  • 16:42 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs2003.codfw.wmnet
  • 16:41 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 16:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2003.codfw.wmnet with OS bullseye
  • 16:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1014.eqiad.wmnet with reason: host reimage
  • 16:35 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1014.eqiad.wmnet with reason: host reimage
  • 16:34 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 16:31 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 16:31 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 16:31 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: mediabackup::worker
  • 16:30 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bullseye
  • 16:28 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1113.eqiad.wmnet with reason: host reimage
  • 16:25 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1113.eqiad.wmnet with reason: host reimage
  • 16:24 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: mediabackup::worker
  • 16:24 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: mediabackup::storage
  • 16:20 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2001.codfw.wmnet with OS bullseye
  • 16:18 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2003.codfw.wmnet with reason: host reimage
  • 16:18 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1014.eqiad.wmnet with OS bullseye
  • 16:16 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:16 hnowlan@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 16:16 hnowlan@deploy2002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 16:16 hnowlan@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 16:16 hnowlan@deploy2002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 16:16 hnowlan@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 16:16 hnowlan@deploy2002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 16:16 Emperor: depool ms-fe1014 to reimage with new envoy TLS setup T317616
  • 16:15 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2003.codfw.wmnet with reason: host reimage
  • 16:15 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:15 hnowlan@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 16:15 hnowlan@deploy2002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 16:14 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:14 Emperor: repool moss-fe1001 with new envoy TLS setup T317616
  • 16:14 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: mediabackup::storage
  • 16:13 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:13 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:13 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:12 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 16:12 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dbbackups::metadata
  • 16:09 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 16:09 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:08 sukhe: enable Puppet on A:lvs to merge CR 976312 and run agent
  • 16:08 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:07 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:05 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: dbbackups::metadata
  • 16:05 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dbbackups::content
  • 16:05 sukhe: disable Puppet on A:lvs to merge CR 976312
  • 16:05 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:02 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:02 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:02 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2003.codfw.wmnet with OS bullseye
  • 16:01 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs2003.codfw.wmnet with OS bullseye
  • 15:59 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: dbbackups::content
  • 15:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe1001.eqiad.wmnet with OS bullseye
  • 15:57 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: backup::production
  • 15:55 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bullseye
  • 15:55 moritzm: installing dpkg bugfix updates on bullseye
  • 15:55 fabfur@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1113
  • 15:54 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2001.codfw.wmnet with OS bullseye
  • 15:53 fabfur@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1113
  • 15:53 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:52 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding cp1113 back with correct VLAN - fabfur@cumin1001"
  • 15:52 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding cp1113 back with correct VLAN - fabfur@cumin1001"
  • 15:48 fabfur@cumin1001: START - Cookbook sre.dns.netbox
  • 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host testreduce1002.eqiad.wmnet
  • 15:46 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 15:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 15:46 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 15:45 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2003.codfw.wmnet with OS bullseye
  • 15:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs2002.codfw.wmnet
  • 15:45 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: backup::production
  • 15:45 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs2002.codfw.wmnet
  • 15:44 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:43 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
  • 15:43 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: backup::es
  • 15:43 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:42 volans: uploaded spicerack_8.2.0 to apt.wikimedia.org bullseye-wikimedia
  • 15:42 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 15:40 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
  • 15:38 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 15:38 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1175.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:36 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: backup::es
  • 15:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2002.codfw.wmnet with OS bullseye
  • 15:35 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 15:35 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: backup::databases
  • 15:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host testreduce1002.eqiad.wmnet
  • 15:31 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:31 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1175 - jclark@cumin1001"
  • 15:30 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1175 - jclark@cumin1001"
  • 15:30 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: backup::databases
  • 15:28 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 15:28 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-fe1001.eqiad.wmnet with OS bullseye
  • 15:28 mvernon@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-fe1001.eqiad.wmnet with OS bullseye
  • 15:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bullseye
  • 15:28 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2001.codfw.wmnet with OS bullseye
  • 15:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 15:20 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
  • 15:19 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 15:17 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
  • 15:15 moritzm: installing python3.7 security updates
  • 15:14 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2002.codfw.wmnet with reason: host reimage
  • 15:11 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2002.codfw.wmnet with reason: host reimage
  • 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host wdqs2008.codfw.wmnet
  • 15:07 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp1113.eqiad.wmnet
  • 15:06 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:06 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp1113.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 15:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bullseye
  • 15:04 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-fe1001.eqiad.wmnet with OS bullseye
  • 15:04 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp1113.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 15:03 Emperor: depool moss-fe1001 to reimage with new envoy TLS setup T317616
  • 15:02 Emperor: depool moss-fe2001 to reimage with new envoy TLS setup T317616
  • 15:01 fabfur@cumin1001: START - Cookbook sre.dns.netbox
  • 15:00 jayme: uncordoned and repooled kubernetes1013
  • 15:00 Emperor: repool ms-fe1009 with new envoy TLS setup T317616
  • 14:59 Emperor: repool ms-fe2009 with new envoy TLS setup T317616
  • 14:59 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host wdqs2008.codfw.wmnet
  • 14:57 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2002.codfw.wmnet with OS bullseye
  • 14:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2009.codfw.wmnet with OS bullseye
  • 14:56 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs2001.codfw.wmnet with OS bullseye
  • 14:54 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1009.eqiad.wmnet with OS bullseye
  • 14:50 fabfur@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp1113.eqiad.wmnet
  • 14:41 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:38 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs2001.codfw.wmnet with reason: host reimage
  • 14:35 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1009.eqiad.wmnet with reason: host reimage
  • 14:35 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-fe2009.codfw.wmnet with reason: host reimage
  • 14:35 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2009.codfw.wmnet with reason: host reimage
  • 14:34 fabfur: start re-provisioning and re-imaging cp1113 to fix wrong subnet (T342159)
  • 14:34 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs2001.codfw.wmnet with reason: host reimage
  • 14:32 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1009.eqiad.wmnet with reason: host reimage
  • 14:31 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:30 urandom: restarting Cassandra, sessionstore2001 (post-Puppet 7 migration)
  • 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host sessionstore2001.codfw.wmnet
  • 14:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T348183)', diff saved to https://phabricator.wikimedia.org/P53726 and previous config saved to /var/cache/conftool/dbconfig/20231122-142312-arnaudb.json
  • 14:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 14:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 14:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T348183)', diff saved to https://phabricator.wikimedia.org/P53725 and previous config saved to /var/cache/conftool/dbconfig/20231122-142301-arnaudb.json
  • 14:21 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2009.codfw.wmnet with OS bullseye
  • 14:20 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1009.eqiad.wmnet with OS bullseye
  • 14:19 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs2001.codfw.wmnet with OS bullseye
  • 14:19 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs2001.codfw.wmnet
  • 14:19 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs2001.codfw.wmnet
  • 14:19 Emperor: depool ms-fe2009 to reimage with new envoy TLS setup T317616
  • 14:19 Emperor: depool ms-fe1009 to reimage with new envoy TLS setup T317616
  • 14:14 Emperor: repool ms-fe2010 with new envoy TLS setup T317616
  • 14:14 Emperor: repool ms-fe1010 with new envoy TLS setup T317616
  • 14:12 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host sessionstore2001.codfw.wmnet
  • 14:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P53724 and previous config saved to /var/cache/conftool/dbconfig/20231122-140754-arnaudb.json
  • 14:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2010.codfw.wmnet with OS bullseye
  • 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2001.codfw.wmnet
  • 13:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1010.eqiad.wmnet with OS bullseye
  • 13:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P53723 and previous config saved to /var/cache/conftool/dbconfig/20231122-135248-arnaudb.json
  • 13:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host logstash2001.codfw.wmnet
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host logstash2023.codfw.wmnet
  • 13:47 jayme@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host mw1475.eqiad.wmnet
  • 13:47 jayme@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host mw1474.eqiad.wmnet
  • 13:47 jayme@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host mw1472.eqiad.wmnet
  • 13:47 jayme@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host mw1473.eqiad.wmnet
  • 13:47 jayme@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host mw2425.codfw.wmnet
  • 13:47 jayme@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host mw2431.codfw.wmnet
  • 13:47 jayme@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host mw2421.codfw.wmnet
  • 13:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2010.codfw.wmnet with reason: host reimage
  • 13:44 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1010.eqiad.wmnet with reason: host reimage
  • 13:43 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2010.codfw.wmnet with reason: host reimage
  • 13:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host logstash2023.codfw.wmnet
  • 13:41 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1010.eqiad.wmnet with reason: host reimage
  • 13:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T348183)', diff saved to https://phabricator.wikimedia.org/P53722 and previous config saved to /var/cache/conftool/dbconfig/20231122-133741-arnaudb.json
  • 13:29 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2010.codfw.wmnet with OS bullseye
  • 13:29 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1010.eqiad.wmnet with OS bullseye
  • 13:27 Emperor: depool ms-fe2010 to reimage with new envoy TLS setup T317616
  • 13:27 Emperor: depool ms-fe1010 to reimage with new envoy TLS setup T317616
  • 13:23 Emperor: repool ms-fe2011 with new envoy TLS setup T317616
  • 13:22 Emperor: repool ms-fe1011 with new envoy TLS setup T317616
  • 13:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2011.codfw.wmnet with OS bullseye
  • 13:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1011.eqiad.wmnet with OS bullseye
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host logstash2023.codfw.wmnet
  • 13:05 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2011.codfw.wmnet with reason: host reimage
  • 13:03 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1011.eqiad.wmnet with reason: host reimage
  • 13:02 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2011.codfw.wmnet with reason: host reimage
  • 13:02 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 13:02 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 13:01 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host logstash2023.codfw.wmnet
  • 13:01 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 13:01 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 13:01 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host logstash2001.codfw.wmnet
  • 13:00 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 13:00 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:59 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1011.eqiad.wmnet with reason: host reimage
  • 12:59 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:59 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1001.eqiad.wmnet
  • 12:59 claime: Raising mw-web and mw-api-ext replicas for traffic bump - T348122
  • 12:58 jayme@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host mw2420.codfw.wmnet
  • 12:55 jayme@cumin1001: START - Cookbook sre.puppet.migrate-host for host mw1474.eqiad.wmnet
  • 12:55 jayme@cumin1001: START - Cookbook sre.puppet.migrate-host for host mw1475.eqiad.wmnet
  • 12:54 jayme@cumin1001: START - Cookbook sre.puppet.migrate-host for host mw1473.eqiad.wmnet
  • 12:54 jayme@cumin1001: START - Cookbook sre.puppet.migrate-host for host mw1472.eqiad.wmnet
  • 12:53 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1001.eqiad.wmnet with OS bullseye
  • 12:52 jayme@cumin1001: START - Cookbook sre.puppet.migrate-host for host mw2431.codfw.wmnet
  • 12:52 jayme@cumin1001: START - Cookbook sre.puppet.migrate-host for host mw2425.codfw.wmnet
  • 12:51 jayme@cumin1001: START - Cookbook sre.puppet.migrate-host for host mw2421.codfw.wmnet
  • 12:50 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host logstash2001.codfw.wmnet
  • 12:49 jayme@cumin1001: START - Cookbook sre.puppet.migrate-host for host mw2420.codfw.wmnet
  • 12:48 taavi@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudlb1001.eqiad.wmnet
  • 12:47 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2011.codfw.wmnet with OS bullseye
  • 12:47 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1011.eqiad.wmnet with OS bullseye
  • 12:45 Emperor: depool ms-fe1011 to reimage with new envoy TLS setup T317616
  • 12:45 Emperor: depool ms-fe2011 to reimage with new envoy TLS setup T317616
  • 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host mc-gp1001.eqiad.wmnet
  • 12:35 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1001.eqiad.wmnet with reason: host reimage
  • 12:32 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1001.eqiad.wmnet with reason: host reimage
  • 12:31 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host mc-gp1001.eqiad.wmnet
  • 12:27 taavi@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudlb1002.eqiad.wmnet
  • 12:19 brouberol@cumin1001: START - Cookbook sre.hosts.reimage for host an-druid1001.eqiad.wmnet with OS bullseye
  • 12:15 taavi@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudlb1002.eqiad.wmnet
  • 12:15 Emperor: repool ms-fe1012 with new envoy TLS setup T317616
  • 12:15 Emperor: repool ms-fe2012 with new envoy TLS setup T317616
  • 12:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2012.codfw.wmnet with OS bullseye
  • 12:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1012.eqiad.wmnet with OS bullseye
  • 11:56 hashar: Restarting Gerrit
  • 11:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2012.codfw.wmnet with reason: host reimage
  • 11:50 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2012.codfw.wmnet with reason: host reimage
  • 11:50 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1012.eqiad.wmnet with reason: host reimage
  • 11:47 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1012.eqiad.wmnet with reason: host reimage
  • 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::postgresql
  • 11:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2012.codfw.wmnet with OS bullseye
  • 11:35 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1012.eqiad.wmnet with OS bullseye
  • 11:34 Emperor: depool ms-fe2012 to reimage with new envoy TLS setup T317616
  • 11:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::postgresql
  • 11:33 Emperor: depool ms-fe1012 to reimage with new envoy TLS setup T317616
  • 11:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53719 and previous config saved to /var/cache/conftool/dbconfig/20231122-112649-arnaudb.json
  • 11:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53718 and previous config saved to /var/cache/conftool/dbconfig/20231122-112641-arnaudb.json
  • 11:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53717 and previous config saved to /var/cache/conftool/dbconfig/20231122-111144-arnaudb.json
  • 11:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53716 and previous config saved to /var/cache/conftool/dbconfig/20231122-111136-arnaudb.json
  • 10:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 80%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53715 and previous config saved to /var/cache/conftool/dbconfig/20231122-105639-arnaudb.json
  • 10:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 80%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53714 and previous config saved to /var/cache/conftool/dbconfig/20231122-105631-arnaudb.json
  • 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host an-db1002.eqiad.wmnet
  • 10:46 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host an-db1002.eqiad.wmnet
  • 10:43 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Roll restart after change in the CA bundle - elukey@cumin1001
  • 10:41 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 70%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53713 and previous config saved to /var/cache/conftool/dbconfig/20231122-104134-arnaudb.json
  • 10:41 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 70%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53712 and previous config saved to /var/cache/conftool/dbconfig/20231122-104126-arnaudb.json
  • 10:33 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 10:33 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 10:33 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: swift::storage
  • 10:31 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 10:27 Emperor: repool ms-fe2013 with new envoy TLS setup T317616
  • 10:26 Emperor: repool ms-fe1013 with new envoy TLS setup T317616
  • 10:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53711 and previous config saved to /var/cache/conftool/dbconfig/20231122-102629-arnaudb.json
  • 10:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53710 and previous config saved to /var/cache/conftool/dbconfig/20231122-102621-arnaudb.json
  • 10:25 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 10:25 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Roll restart after change in the CA bundle - elukey@cumin1001
  • 10:25 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 10:25 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Roll restart after change in the CA bundle - elukey@cumin1001
  • 10:24 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@0cca675] (releasing): (no justification provided) (duration: 00m 40s)
  • 10:23 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@0cca675] (releasing): (no justification provided)
  • 10:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs (T351069)
  • 10:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 50%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53709 and previous config saved to /var/cache/conftool/dbconfig/20231122-101124-arnaudb.json
  • 10:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 50%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53708 and previous config saved to /var/cache/conftool/dbconfig/20231122-101116-arnaudb.json
  • 10:07 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Roll restart after change in the CA bundle - elukey@cumin1001
  • 10:02 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: swift::storage
  • 09:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 40%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53707 and previous config saved to /var/cache/conftool/dbconfig/20231122-095619-arnaudb.json
  • 09:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 40%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53706 and previous config saved to /var/cache/conftool/dbconfig/20231122-095611-arnaudb.json
  • 09:53 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs (T351069)
  • 09:53 vgutierrez: rolling restart of pybal to catch up on a NOOP config update - T351069
  • 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host wcqs2001.codfw.wmnet
  • 09:49 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 09:47 elukey: Update of the profile::base::certificate's CA bundle fleet wide (https://gerrit.wikimedia.org/r/c/operations/puppet/+/976659)
  • 09:41 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53705 and previous config saved to /var/cache/conftool/dbconfig/20231122-094114-arnaudb.json
  • 09:41 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53704 and previous config saved to /var/cache/conftool/dbconfig/20231122-094106-arnaudb.json
  • 09:40 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kafka::main
  • 09:37 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1013.eqiad.wmnet with reason: host reimage
  • 09:35 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host wcqs2001.codfw.wmnet
  • 09:34 elukey@cumin1001: START - Cookbook sre.puppet.migrate-role for role: kafka::main
  • 09:34 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1013.eqiad.wmnet with reason: host reimage
  • 09:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2013.codfw.wmnet with OS bullseye
  • 09:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 20%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53703 and previous config saved to /var/cache/conftool/dbconfig/20231122-092609-arnaudb.json
  • 09:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 20%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53702 and previous config saved to /var/cache/conftool/dbconfig/20231122-092601-arnaudb.json
  • 09:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2013.codfw.wmnet with reason: host reimage
  • 09:14 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2013.codfw.wmnet with reason: host reimage
  • 09:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db2193 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53701 and previous config saved to /var/cache/conftool/dbconfig/20231122-091104-arnaudb.json
  • 09:10 arnaudb@cumin1001: dbctl commit (dc=all): 'db2192 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53700 and previous config saved to /var/cache/conftool/dbconfig/20231122-091056-arnaudb.json
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: gerrit
  • 09:05 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2041.codfw.wmnet
  • 09:04 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2041.codfw.wmnet
  • 09:01 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: gerrit
  • 09:00 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2013.codfw.wmnet with OS bullseye
  • 09:00 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 08:59 Emperor: depool ms-fe1013 to reimage with new envoy TLS setup T317616
  • 08:59 Emperor: depool ms-fe2013 to reimage with new envoy TLS setup T317616
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53699 and previous config saved to /var/cache/conftool/dbconfig/20231122-084943-root.json
  • 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: titan
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53698 and previous config saved to /var/cache/conftool/dbconfig/20231122-083438-root.json
  • 08:32 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: titan
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53697 and previous config saved to /var/cache/conftool/dbconfig/20231122-081933-root.json
  • 08:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T348183)', diff saved to https://phabricator.wikimedia.org/P53696 and previous config saved to /var/cache/conftool/dbconfig/20231122-081912-arnaudb.json
  • 08:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 08:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: logging::mediawiki::udp2log
  • 08:14 kartik@deploy2002: Finished scap: Backport for Enable Content/Section translation on some Wikipedias with potential to be supported with MinT (T345267) (duration: 08m 46s)
  • 08:09 kartik@deploy2002: kartik: Continuing with sync
  • 08:07 kartik@deploy2002: kartik: Backport for Enable Content/Section translation on some Wikipedias with potential to be supported with MinT (T345267) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:06 kartik@deploy2002: Started scap: Backport for Enable Content/Section translation on some Wikipedias with potential to be supported with MinT (T345267)
  • 08:04 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: logging::mediawiki::udp2log
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53695 and previous config saved to /var/cache/conftool/dbconfig/20231122-080428-root.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53694 and previous config saved to /var/cache/conftool/dbconfig/20231122-074923-root.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 100%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53693 and previous config saved to /var/cache/conftool/dbconfig/20231122-072247-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 to test 10.4.32 T351283', diff saved to https://phabricator.wikimedia.org/P53692 and previous config saved to /var/cache/conftool/dbconfig/20231122-071911-root.json
  • 07:07 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc2 master" (duration: 08m 10s)
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 75%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53691 and previous config saved to /var/cache/conftool/dbconfig/20231122-070742-root.json
  • 07:02 marostegui@deploy2002: marostegui: Continuing with sync
  • 07:01 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc2014 to pc2 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:59 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc2 master"
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2012.codfw.wmnet with OS bookworm
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 50%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53690 and previous config saved to /var/cache/conftool/dbconfig/20231122-065238-root.json
  • 06:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2012.codfw.wmnet with reason: host reimage
  • 06:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2012.codfw.wmnet with reason: host reimage
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 25%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53689 and previous config saved to /var/cache/conftool/dbconfig/20231122-063733-root.json
  • 06:23 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2012.codfw.wmnet with OS bookworm
  • 06:22 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc2014 to pc2 master (T351620) (duration: 07m 28s)
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 10%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53688 and previous config saved to /var/cache/conftool/dbconfig/20231122-062228-root.json
  • 06:17 marostegui@deploy2002: marostegui: Continuing with sync
  • 06:16 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc2014 to pc2 master (T351620) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2012,2014].codfw.wmnet,pc[1012,1014].eqiad.wmnet with reason: Switch
  • 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2012,2014].codfw.wmnet,pc[1012,1014].eqiad.wmnet with reason: Switch
  • 06:15 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc2014 to pc2 master (T351620)
  • 04:25 eileen: civicrm upgraded from 43d191c8 to 3c5db93b
  • 02:01 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1158.eqiad.wmnet with OS bullseye
  • 01:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1158']
  • 01:21 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1158']
  • 00:55 tstarling@deploy2002: Synchronized wmf-config/CommonSettings.php: enable LoginNotify seen subnets table g965663 T346989 (duration: 06m 23s)
  • 00:41 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1158.eqiad.wmnet with OS bullseye

2023-11-21

  • 23:05 vriley@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:05 vriley@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: message - vriley@cumin1001"
  • 23:04 vriley@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: message - vriley@cumin1001"
  • 23:02 vriley@cumin1001: START - Cookbook sre.dns.netbox
  • 23:02 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1036
  • 23:01 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1038
  • 23:01 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1037
  • 23:00 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1038
  • 23:00 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1037
  • 23:00 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1036
  • 22:59 vriley@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:59 vriley@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: message - vriley@cumin1001"
  • 22:58 vriley@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: message - vriley@cumin1001"
  • 22:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1021.eqiad.wmnet
  • 22:54 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs1021.eqiad.wmnet
  • 22:53 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1021.eqiad.wmnet with OS bullseye
  • 22:48 vriley@cumin1001: START - Cookbook sre.dns.netbox
  • 22:47 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1035
  • 22:46 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1035
  • 22:32 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1021.eqiad.wmnet with reason: host reimage
  • 22:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T348183)', diff saved to https://phabricator.wikimedia.org/P53687 and previous config saved to /var/cache/conftool/dbconfig/20231121-223053-arnaudb.json
  • 22:29 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1021.eqiad.wmnet with reason: host reimage
  • 22:23 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:20 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:18 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:18 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1021.eqiad.wmnet with OS bullseye
  • 22:17 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host aqs1021.eqiad.wmnet with OS bullseye
  • 22:15 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P53686 and previous config saved to /var/cache/conftool/dbconfig/20231121-221547-arnaudb.json
  • 22:15 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:12 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:11 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:08 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:06 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1021.eqiad.wmnet with OS bullseye
  • 22:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1020.eqiad.wmnet with OS bullseye
  • 22:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P53685 and previous config saved to /var/cache/conftool/dbconfig/20231121-220040-arnaudb.json
  • 21:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T348183)', diff saved to https://phabricator.wikimedia.org/P53684 and previous config saved to /var/cache/conftool/dbconfig/20231121-214534-arnaudb.json
  • 21:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1020.eqiad.wmnet with reason: host reimage
  • 21:37 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1020.eqiad.wmnet with reason: host reimage
  • 21:35 catrope@deploy2002: Sync cancelled.
  • 21:31 catrope@deploy2002: ssastry and catrope: Backport for ParserOutputPostCacheTransform: Don't reprocess content (T351461) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:29 catrope@deploy2002: Started scap: Backport for ParserOutputPostCacheTransform: Don't reprocess content (T351461)
  • 21:29 catrope@deploy2002: Finished scap: Backport for [parser] Broaden TOC placeholder regular expression (duration: 12m 40s)
  • 21:23 catrope@deploy2002: catrope and ssastry: Continuing with sync
  • 21:18 catrope@deploy2002: catrope and ssastry: Backport for [parser] Broaden TOC placeholder regular expression synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:16 catrope@deploy2002: Started scap: Backport for [parser] Broaden TOC placeholder regular expression
  • 21:15 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1020.eqiad.wmnet with OS bullseye
  • 21:14 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1019.eqiad.wmnet with OS bullseye
  • 20:55 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1019.eqiad.wmnet with reason: host reimage
  • 20:53 mutante: gerrit1003 - deleted /root/backup_of_srv_gerrit_plugins - disk usage down to 56% (T351658)
  • 20:52 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1019.eqiad.wmnet with reason: host reimage
  • 20:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T348183)', diff saved to https://phabricator.wikimedia.org/P53683 and previous config saved to /var/cache/conftool/dbconfig/20231121-204501-arnaudb.json
  • 20:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 20:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 20:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T348183)', diff saved to https://phabricator.wikimedia.org/P53682 and previous config saved to /var/cache/conftool/dbconfig/20231121-204440-arnaudb.json
  • 20:41 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS bullseye
  • 20:41 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1019.eqiad.wmnet with OS bullseye
  • 20:32 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1019.eqiad.wmnet with OS bullseye
  • 20:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1018.eqiad.wmnet
  • 20:31 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs1018.eqiad.wmnet
  • 20:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P53681 and previous config saved to /var/cache/conftool/dbconfig/20231121-202933-arnaudb.json
  • 20:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1018.eqiad.wmnet with OS bullseye
  • 20:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P53680 and previous config saved to /var/cache/conftool/dbconfig/20231121-201427-arnaudb.json
  • 20:05 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
  • 20:02 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1018.eqiad.wmnet with reason: host reimage
  • 19:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T348183)', diff saved to https://phabricator.wikimedia.org/P53679 and previous config saved to /var/cache/conftool/dbconfig/20231121-195920-arnaudb.json
  • 19:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1158.eqiad.wmnet with OS bullseye
  • 19:51 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1018.eqiad.wmnet with OS bullseye
  • 19:49 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1017.eqiad.wmnet with OS bullseye
  • 19:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1157.eqiad.wmnet with OS bullseye
  • 19:28 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
  • 19:26 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1017.eqiad.wmnet with reason: host reimage
  • 19:11 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1017.eqiad.wmnet with OS bullseye
  • 19:08 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1016.eqiad.wmnet with OS bullseye
  • 18:42 ladsgroup@deploy2002: Finished scap: Backport for Undeploy DoubleWiki, Part III (T351675) (duration: 25m 41s)
  • 18:38 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1158.eqiad.wmnet with OS bullseye
  • 18:30 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 18:29 ladsgroup@deploy2002: ladsgroup: Backport for Undeploy DoubleWiki, Part III (T351675) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:18 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1157.eqiad.wmnet with OS bullseye
  • 18:16 ladsgroup@deploy2002: Started scap: Backport for Undeploy DoubleWiki, Part III (T351675)
  • 18:15 jynus: restart of bacula-sd on backup1009 T351725
  • 18:13 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1158']
  • 18:13 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1158']
  • 18:11 ladsgroup@deploy2002: Finished scap: Backport for Undeploy DoubleWiki, Part II (T351675) (duration: 08m 24s)
  • 18:06 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 18:04 ladsgroup@deploy2002: ladsgroup: Backport for Undeploy DoubleWiki, Part II (T351675) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:03 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:03 ladsgroup@deploy2002: Started scap: Backport for Undeploy DoubleWiki, Part II (T351675)
  • 18:02 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:02 ladsgroup@deploy2002: Finished scap: Backport for Undeploy DoubleWiki, Part I (T351675) (duration: 08m 27s)
  • 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating restbase servers in codfw - jhancock@cumin2002"
  • 17:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating restbase servers in codfw - jhancock@cumin2002"
  • 17:56 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 17:56 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 17:54 ladsgroup@deploy2002: ladsgroup: Backport for Undeploy DoubleWiki, Part I (T351675) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1174']
  • 17:53 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1174']
  • 17:53 ladsgroup@deploy2002: Started scap: Backport for Undeploy DoubleWiki, Part I (T351675)
  • 17:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1173']
  • 17:46 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1170']
  • 17:44 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1169']
  • 17:44 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1168']
  • 17:44 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1167']
  • 17:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1174']
  • 17:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1172']
  • 17:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1173']
  • 17:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1172']
  • 17:42 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1171']
  • 17:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1171']
  • 17:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1172']
  • 17:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1172']
  • 17:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1166']
  • 17:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1171']
  • 17:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1171']
  • 17:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1171']
  • 17:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1171']
  • 17:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1170']
  • 17:39 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1164']
  • 17:39 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1165']
  • 17:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1169']
  • 17:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1168']
  • 17:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1167']
  • 17:38 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1160']
  • 17:38 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1159']
  • 17:37 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1163']
  • 17:33 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1166']
  • 17:33 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1165']
  • 17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1164']
  • 17:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1162']
  • 17:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1161']
  • 17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1163']
  • 17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1162']
  • 17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1161']
  • 17:31 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1160']
  • 17:31 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1159']
  • 17:31 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1158']
  • 17:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1171.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1174.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1170.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1173.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1172.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1169.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1173.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1174.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1171.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1170.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:28 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1169.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:28 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1172.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1166.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1168.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1167.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1165.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1168.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1167.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1166.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1165.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:25 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1161.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1162.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1160.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1158.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1159.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1157.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:24 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1162.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:24 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1161.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:24 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1160.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:24 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1159.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:23 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1158.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:23 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1157.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1160']
  • 17:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1160']
  • 17:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1160']
  • 17:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1159']
  • 17:21 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1160']
  • 17:21 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1159']
  • 17:21 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1158']
  • 17:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1173']
  • 17:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1170']
  • 17:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1169']
  • 17:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1168']
  • 17:19 ejegg: fundraising civicrm upgraded from 3a8558e7 to 43d191c8
  • 17:16 ejegg: payments-wiki upgraded from 56790715 to 714552c5
  • 17:16 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1172']
  • 17:16 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1171']
  • 17:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1172']
  • 17:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1171']
  • 17:15 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1172']
  • 17:14 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1171']
  • 17:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1174']
  • 17:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1173']
  • 17:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1172']
  • 17:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1171']
  • 17:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1170']
  • 17:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1169']
  • 17:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1168']
  • 17:12 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1158']
  • 17:01 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:01 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:00 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:47 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 16:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[1017-1019].eqiad.wmnet} and A:lvs (T351069)
  • 16:38 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 16:36 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[1017-1019].eqiad.wmnet} and A:lvs (T351069)
  • 16:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020.eqiad.wmnet} and A:lvs (T351069)
  • 16:34 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020.eqiad.wmnet} and A:lvs (T351069)
  • 16:33 vgutierrez: updating pybal to 1.5.14 on eqiad - T351069
  • 16:11 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host kafka-main1001.eqiad.wmnet
  • 16:07 elukey@cumin1001: START - Cookbook sre.puppet.migrate-host for host kafka-main1001.eqiad.wmnet
  • 16:05 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1113
  • 16:05 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp1113
  • 16:03 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:03 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: allocate cloud-private svc ips to wiki replicas - taavi@cumin1001"
  • 16:02 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: allocate cloud-private svc ips to wiki replicas - taavi@cumin1001"
  • 15:59 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1168.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:51 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: insetup::wmcs
  • 15:47 fabfur: repooled cp1088
  • 15:44 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: insetup::wmcs
  • 15:43 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::cloudgw
  • 15:37 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::cloudgw
  • 15:34 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::cloudlb
  • 15:27 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1168.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
  • 15:25 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::cloudlb
  • 15:24 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::db::wikireplicas::analytics_multiinstance
  • 15:23 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1016.eqiad.wmnet with reason: host reimage
  • 15:21 fabfur: depooled cp1113
  • 15:14 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::db::wikireplicas::analytics_multiinstance
  • 15:13 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::db::wikireplicas::web_multiinstance
  • 15:12 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1016.eqiad.wmnet with OS bullseye
  • 15:07 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::db::wikireplicas::web_multiinstance
  • 15:06 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::distribution::server
  • 15:05 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1015.eqiad.wmnet
  • 15:04 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs1015.eqiad.wmnet
  • 15:04 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1015.eqiad.wmnet with OS bullseye
  • 15:00 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: dumps::distribution::server
  • 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: cluster::cloud_management
  • 14:53 hnowlan@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:53 hnowlan@deploy2002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:52 hnowlan@deploy2002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 14:52 hnowlan@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 14:52 hnowlan@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:52 hnowlan@deploy2002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:52 hnowlan@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 14:52 hnowlan@deploy2002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 14:49 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: cluster::cloud_management
  • 14:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1015.eqiad.wmnet with reason: host reimage
  • 14:45 Lucas_WMDE: T350224 maintenance script finished (8m46s real time)
  • 14:44 fabfur: swapped cp1113 <-> cp1088 (T349244)
  • 14:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1174.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1173.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:43 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1015.eqiad.wmnet with reason: host reimage
  • 14:43 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1113.eqiad.wmnet
  • 14:43 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1113.eqiad.wmnet
  • 14:42 fabfur: swapped cp1112 <-> cp1087 (T349244)
  • 14:41 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1171.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:39 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1112.eqiad.wmnet
  • 14:39 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1112.eqiad.wmnet
  • 14:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[3008-3009].esams.wmnet} and A:lvs (T351069)
  • 14:36 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1172.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T348183)', diff saved to https://phabricator.wikimedia.org/P53676 and previous config saved to /var/cache/conftool/dbconfig/20231121-143640-arnaudb.json
  • 14:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 14:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 14:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T348183)', diff saved to https://phabricator.wikimedia.org/P53675 and previous config saved to /var/cache/conftool/dbconfig/20231121-143619-arnaudb.json
  • 14:36 Lucas_WMDE: START [in tmux] lucaswerkmeister-wmde@mwmaint2002:~$ mwscript Wikibase.Lexeme.Maintenance.FixPagePropsSortkey wikidatawiki --batch-size=1000 # T350224
  • 14:35 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[3008-3009].esams.wmnet} and A:lvs (T351069)
  • 14:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs3010.esams.wmnet} and A:lvs (T351069)
  • 14:34 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs3010.esams.wmnet} and A:lvs (T351069)
  • 14:32 vgutierrez: updating pybal to 1.5.14 on esams - T351069
  • 14:32 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1015.eqiad.wmnet with OS bullseye
  • 14:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1015.eqiad.wmnet
  • 14:31 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs1015.eqiad.wmnet
  • 14:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1171.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P53674 and previous config saved to /var/cache/conftool/dbconfig/20231121-142112-arnaudb.json
  • 14:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1174.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1173.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1165.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1170.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1169.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1166.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1167.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:11 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for mc: Make it possible to use mcrouter server set by environment (T346690) (duration: 07m 09s)
  • 14:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[6001-6002].drmrs.wmnet} and A:lvs (T351069)
  • 14:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1172.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:08 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[6001-6002].drmrs.wmnet} and A:lvs (T351069)
  • 14:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1171.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs6003.drmrs.wmnet} and A:lvs (T351069)
  • 14:07 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs6003.drmrs.wmnet} and A:lvs (T351069)
  • 14:07 godog: revert rsyslog upgrade on centrallog2002 - T351710
  • 14:06 vgutierrez: updating pybal to 1.5.14 on drmrs - T351069
  • 14:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P53673 and previous config saved to /var/cache/conftool/dbconfig/20231121-140606-arnaudb.json
  • 14:06 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1171.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:05 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and d3r1ck01: Continuing with sync
  • 14:05 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and d3r1ck01: Backport for mc: Make it possible to use mcrouter server set by environment (T346690) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:04 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for mc: Make it possible to use mcrouter server set by environment (T346690)
  • 13:58 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:57 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:56 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 13:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T348183)', diff saved to https://phabricator.wikimedia.org/P53672 and previous config saved to /var/cache/conftool/dbconfig/20231121-135059-arnaudb.json
  • 13:49 godog: test upgrade rsyslog on centrallog2002 - T351710
  • 13:38 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::eqiad1::virt_ceph
  • 13:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1168.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:32 Emperor: repool ms-fe2014 with new envoy TLS setup T317616
  • 13:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1170.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1169.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1168.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1167.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1166.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1165.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:22 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::eqiad1::virt_ceph
  • 13:22 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::eqiad1::virt
  • 13:14 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::eqiad1::virt
  • 13:13 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2002.codfw.wmnet
  • 13:06 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host titan2002.codfw.wmnet
  • 13:05 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::eqiad1::services
  • 12:57 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:57 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker11 - jclark@cumin1001"
  • 12:56 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker11 - jclark@cumin1001"
  • 12:54 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 12:52 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::eqiad1::services
  • 12:49 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::eqiad1::rabbitmq
  • 12:42 awight@deploy2002: Finished scap: Backport for Revert "Revert "Enable Reference Previews on all wikis"" (duration: 08m 12s)
  • 12:40 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::eqiad1::rabbitmq
  • 12:38 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1115.eqiad.wmnet with OS bullseye
  • 12:36 awight@deploy2002: awight: Continuing with sync
  • 12:35 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::eqiad1::net
  • 12:35 awight@deploy2002: awight: Backport for Revert "Revert "Enable Reference Previews on all wikis"" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:34 awight@deploy2002: Started scap: Backport for Revert "Revert "Enable Reference Previews on all wikis""
  • 12:31 awight@deploy2002: Sync cancelled.
  • 12:28 awight@deploy2002: wmde-fisch and awight: Backport for Enable Reference Previews on all wikis (T282999) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:27 awight@deploy2002: Started scap: Backport for Enable Reference Previews on all wikis (T282999)
  • 12:25 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::eqiad1::net
  • 12:22 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::eqiad1::control
  • 12:21 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1115.eqiad.wmnet with reason: host reimage
  • 12:18 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1115.eqiad.wmnet with reason: host reimage
  • 12:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2014.codfw.wmnet with OS bullseye
  • 12:03 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 12:03 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS bullseye
  • 12:01 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::eqiad1::control
  • 11:59 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::eqiad1::cinder_backups
  • 11:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2014.codfw.wmnet with reason: host reimage
  • 11:53 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 11:53 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2014.codfw.wmnet with reason: host reimage
  • 11:51 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::eqiad1::cinder_backups
  • 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host titan2002.codfw.wmnet
  • 11:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubernetes2041.codfw.wmnet with reason: NIC 1 Port 1 network link is down
  • 11:37 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubernetes2041.codfw.wmnet with reason: NIC 1 Port 1 network link is down
  • 11:35 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host titan2002.codfw.wmnet
  • 11:22 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2014.codfw.wmnet with OS bullseye
  • 11:21 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:21 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:20 Emperor: depool ms-fe2014 to reimage with new envoy TLS setup T317616
  • 11:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host mwlog2002.codfw.wmnet
  • 11:05 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host mwlog2002.codfw.wmnet
  • 11:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: gitlab_runner
  • 10:50 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: gitlab_runner
  • 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host gerrit2002.wikimedia.org
  • 10:35 jbond: upload new wmf-certificates packages
  • 10:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
  • 10:22 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host gerrit2002.wikimedia.org
  • 10:21 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 10:18 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
  • 10:11 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 10:10 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host gitlab-runner1002.eqiad.wmnet
  • 10:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53670 and previous config saved to /var/cache/conftool/dbconfig/20231121-100607-arnaudb.json
  • 10:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53669 and previous config saved to /var/cache/conftool/dbconfig/20231121-100536-arnaudb.json
  • 10:03 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 10:02 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 10:02 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 10:01 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 10:00 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host gitlab-runner1002.eqiad.wmnet
  • 10:00 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 09:53 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 09:51 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53667 and previous config saved to /var/cache/conftool/dbconfig/20231121-095102-arnaudb.json
  • 09:50 arnaudb@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53666 and previous config saved to /var/cache/conftool/dbconfig/20231121-095031-arnaudb.json
  • 09:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53665 and previous config saved to /var/cache/conftool/dbconfig/20231121-093557-arnaudb.json
  • 09:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 75%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53664 and previous config saved to /var/cache/conftool/dbconfig/20231121-093526-arnaudb.json
  • 09:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[2011-2013].codfw.wmnet} and A:lvs (T351069)
  • 09:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53663 and previous config saved to /var/cache/conftool/dbconfig/20231121-092052-arnaudb.json
  • 09:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53662 and previous config saved to /var/cache/conftool/dbconfig/20231121-092021-arnaudb.json
  • 09:19 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[2011-2013].codfw.wmnet} and A:lvs (T351069)
  • 09:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs2014.codfw.wmnet} and A:lvs (T351069)
  • 09:18 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs2014.codfw.wmnet} and A:lvs (T351069)
  • 09:17 vgutierrez: updating pybal to 1.5.14 on codfw - T351069
  • 09:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[5004-5005].eqsin.wmnet} and A:lvs (T351069)
  • 09:16 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[5004-5005].eqsin.wmnet} and A:lvs (T351069)
  • 09:15 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs5006.eqsin.wmnet} and A:lvs (T351069)
  • 09:15 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs5006.eqsin.wmnet} and A:lvs (T351069)
  • 09:14 vgutierrez: updating pybal to 1.5.14 on eqsin - T351069
  • 09:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[4008-4009].ulsfo.wmnet} and A:lvs (T351069)
  • 09:10 vgutierrez: updating pybal to 1.5.14 on ulsfo - T351069
  • 09:09 vgutierrez@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[4008-4009].ulsfo.wmnet} and A:lvs (T351069)
  • 09:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 45%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53661 and previous config saved to /var/cache/conftool/dbconfig/20231121-090547-arnaudb.json
  • 09:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 45%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53660 and previous config saved to /var/cache/conftool/dbconfig/20231121-090516-arnaudb.json
  • 08:50 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53659 and previous config saved to /var/cache/conftool/dbconfig/20231121-085042-arnaudb.json
  • 08:50 arnaudb@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53658 and previous config saved to /var/cache/conftool/dbconfig/20231121-085011-arnaudb.json
  • 08:41 vgutierrez: updating pybal to 1.5.14 on lvs4010 - T351069
  • 08:38 awight: scap window cancelled due to k8s error
  • 08:37 awight@deploy2002: Finished scap: Backport for Revert "Enable Reference Previews on all wikis" (duration: 07m 08s)
  • 08:37 vgutierrez: upload pybal 1.5.14 to apt.wm.o (bullseye-wikimedia) - T348837
  • 08:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 15%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53657 and previous config saved to /var/cache/conftool/dbconfig/20231121-083537-arnaudb.json
  • 08:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 15%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53656 and previous config saved to /var/cache/conftool/dbconfig/20231121-083504-arnaudb.json
  • 08:32 awight@deploy2002: awight and trainbranchbot: Continuing with sync
  • 08:31 awight@deploy2002: awight and trainbranchbot: Backport for Revert "Enable Reference Previews on all wikis" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:31 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-role (exit_code=99) for role: ncredir
  • 08:30 awight@deploy2002: Started scap: Backport for Revert "Enable Reference Previews on all wikis"
  • 08:28 awight@deploy2002: Sync cancelled.
  • 08:28 awight@deploy2002: wmde-fisch and awight: Backport for Enable Reference Previews on all wikis (T282999) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:27 awight@deploy2002: Started scap: Backport for Enable Reference Previews on all wikis (T282999)
  • 08:24 awight@deploy2002: Finished scap: Backport for Enable Reference Previews on all wikis (T282999) (duration: 15m 02s)
  • 08:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53655 and previous config saved to /var/cache/conftool/dbconfig/20231121-082032-arnaudb.json
  • 08:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53654 and previous config saved to /var/cache/conftool/dbconfig/20231121-082000-arnaudb.json
  • 08:18 awight@deploy2002: awight and wmde-fisch: Continuing with sync
  • 08:16 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1011.eqiad.wmnet with OS bullseye
  • 08:11 awight@deploy2002: awight and wmde-fisch: Backport for Enable Reference Previews on all wikis (T282999) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:09 awight@deploy2002: Started scap: Backport for Enable Reference Previews on all wikis (T282999)
  • 08:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53653 and previous config saved to /var/cache/conftool/dbconfig/20231121-080527-arnaudb.json
  • 08:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 5%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53652 and previous config saved to /var/cache/conftool/dbconfig/20231121-080455-arnaudb.json
  • 07:52 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
  • 07:50 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
  • 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1210.eqiad.wmnet with OS bookworm
  • 07:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1210.eqiad.wmnet with reason: host reimage
  • 07:25 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1011.eqiad.wmnet with OS bullseye
  • 07:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1210.eqiad.wmnet with reason: host reimage
  • 07:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T348183)', diff saved to https://phabricator.wikimedia.org/P53651 and previous config saved to /var/cache/conftool/dbconfig/20231121-072424-arnaudb.json
  • 07:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 07:24 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 07:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 07:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 07:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T348183)', diff saved to https://phabricator.wikimedia.org/P53650 and previous config saved to /var/cache/conftool/dbconfig/20231121-072346-arnaudb.json
  • 07:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1210.eqiad.wmnet with OS bookworm
  • 07:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P53649 and previous config saved to /var/cache/conftool/dbconfig/20231121-070840-arnaudb.json
  • 07:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 33452
  • 07:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 33452
  • 06:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P53648 and previous config saved to /var/cache/conftool/dbconfig/20231121-065333-arnaudb.json
  • 06:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T348183)', diff saved to https://phabricator.wikimedia.org/P53647 and previous config saved to /var/cache/conftool/dbconfig/20231121-063827-arnaudb.json
  • 02:32 ejegg: fundraising civicrm upgraded from b3da5d3f to 3a8558e7
  • 01:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T348183)', diff saved to https://phabricator.wikimedia.org/P53646 and previous config saved to /var/cache/conftool/dbconfig/20231121-013514-arnaudb.json
  • 01:35 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 01:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance

2023-11-20

  • 22:46 catrope@deploy2002: Finished scap: Backport for [parsoid] Fix Parsoid relative links (T350952) (duration: 19m 32s)
  • 22:41 catrope@deploy2002: catrope and cscott: Continuing with sync
  • 22:28 catrope@deploy2002: catrope and cscott: Backport for [parsoid] Fix Parsoid relative links (T350952) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:27 catrope@deploy2002: Started scap: Backport for [parsoid] Fix Parsoid relative links (T350952)
  • 22:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:19 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update entries for cloud hosts. - cmooney@cumin1001"
  • 22:19 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update entries for cloud hosts. - cmooney@cumin1001"
  • 22:17 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 22:15 catrope@deploy2002: Finished scap: Backport for Revert "mw.notify: Limit width of overlay to max-width-page-container" (T349622) (duration: 17m 40s)
  • 22:09 catrope@deploy2002: jdlrobson and catrope: Continuing with sync
  • 21:59 catrope@deploy2002: jdlrobson and catrope: Backport for Revert "mw.notify: Limit width of overlay to max-width-page-container" (T349622) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:58 catrope@deploy2002: Started scap: Backport for Revert "mw.notify: Limit width of overlay to max-width-page-container" (T349622)
  • 21:38 catrope@deploy2002: Finished scap: Backport for Disable MobileFrontend AMC drawer temporarily while erroring (T351669) (duration: 22m 11s)
  • 21:32 catrope@deploy2002: catrope and jdlrobson: Continuing with sync
  • 21:17 catrope@deploy2002: catrope and jdlrobson: Backport for Disable MobileFrontend AMC drawer temporarily while erroring (T351669) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:16 catrope@deploy2002: Started scap: Backport for Disable MobileFrontend AMC drawer temporarily while erroring (T351669)
  • 21:12 catrope@deploy2002: Finished scap: Backport for Enable action blocks in ruwiki (T351048) (duration: 08m 52s)
  • 21:06 catrope@deploy2002: catrope and stjn: Continuing with sync
  • 21:05 catrope@deploy2002: catrope and stjn: Backport for Enable action blocks in ruwiki (T351048) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:03 catrope@deploy2002: Started scap: Backport for Enable action blocks in ruwiki (T351048)
  • 21:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1014.eqiad.wmnet
  • 21:02 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs1014.eqiad.wmnet
  • 21:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1014.eqiad.wmnet with OS bullseye
  • 20:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1014.eqiad.wmnet with reason: host reimage
  • 20:37 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1014.eqiad.wmnet with reason: host reimage
  • 20:34 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 20:33 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 20:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53645 and previous config saved to /var/cache/conftool/dbconfig/20231120-203337-arnaudb.json
  • 20:21 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1014.eqiad.wmnet with OS bullseye
  • 20:21 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1014.eqiad.wmnet with OS bullseye
  • 20:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P53644 and previous config saved to /var/cache/conftool/dbconfig/20231120-201831-arnaudb.json
  • 20:10 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1014.eqiad.wmnet with OS bullseye
  • 20:08 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1013.eqiad.wmnet with OS bullseye
  • 20:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P53643 and previous config saved to /var/cache/conftool/dbconfig/20231120-200324-arnaudb.json
  • 19:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for acmechief2001.codfw.wmnet
  • 19:59 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for acmechief2001.codfw.wmnet
  • 19:50 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1013.eqiad.wmnet with reason: host reimage
  • 19:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53642 and previous config saved to /var/cache/conftool/dbconfig/20231120-194818-arnaudb.json
  • 19:48 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1013.eqiad.wmnet with reason: host reimage
  • 19:36 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1013.eqiad.wmnet with OS bullseye
  • 19:21 sukhe: pool cp4045.ulsfo.wmnet post reboot and puppet 7 upgrade
  • 19:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4045.ulsfo.wmnet
  • 19:05 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4045.ulsfo.wmnet
  • 19:04 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 19:03 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host acmechief2001.codfw.wmnet with OS bookworm
  • 19:03 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:02 sukhe: depool cp4045 for reboot
  • 18:59 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 18:59 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 18:59 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 18:59 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 18:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4045.ulsfo.wmnet
  • 18:48 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4045.ulsfo.wmnet
  • 18:44 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief2001.codfw.wmnet with reason: host reimage
  • 18:41 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief2001.codfw.wmnet with reason: host reimage
  • 18:39 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 18:38 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 18:37 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:37 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:27 brett@cumin1001: START - Cookbook sre.hosts.reimage for host acmechief2001.codfw.wmnet with OS bookworm
  • 18:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wikidough
  • 18:18 volans: installed spicerack v8.1.0 on the cumin hosts
  • 18:13 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: wikidough
  • 18:08 ebernhardson: start test backfill of 4 days of itwiki and frwiki edits to relforge from cirrus updater
  • 18:06 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:06 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:50 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 17:47 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:39 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:37 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:32 volans: uploaded spicerack_8.1.0 to apt.wikimedia.org bullseye-wikimedia
  • 17:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: durum
  • 17:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.wikimedia.org with reason: host reimage
  • 17:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.wikimedia.org with reason: host reimage
  • 17:18 hashar: Restarting Gerrit # T351658
  • 17:15 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: durum
  • 17:10 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 16:56 ladsgroup@deploy2002: Finished scap: Backport for Set pagelinks migration to read new in testwiki, fawikiquote, cebwiki (T351237) (duration: 10m 06s)
  • 16:51 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 16:48 ladsgroup@deploy2002: ladsgroup: Backport for Set pagelinks migration to read new in testwiki, fawikiquote, cebwiki (T351237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:46 ladsgroup@deploy2002: Started scap: Backport for Set pagelinks migration to read new in testwiki, fawikiquote, cebwiki (T351237)
  • 16:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1210', diff saved to https://phabricator.wikimedia.org/P53638 and previous config saved to /var/cache/conftool/dbconfig/20231120-162648-root.json
  • 15:48 Lucas_WMDE: DONE Wikibase.Lexeme.Maintenance.FixPagePropsSortkey (T350224) in 1.079s real time :)
  • 15:48 fabfur: swapped cp1111 <-> cp1086 (T349244)
  • 15:48 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint2002:~$ mwscript Wikibase.Lexeme.Maintenance.FixPagePropsSortkey testwikidatawiki --batch-size=1000 # T350224
  • 15:47 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1111.eqiad.wmnet
  • 15:47 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1111.eqiad.wmnet
  • 15:44 fabfur: swapped cp1110 <-> cp1085 (T349244)
  • 15:44 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1110.eqiad.wmnet
  • 15:42 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 14:48 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 14:39 urbanecm: UTC afternoon B&C window done
  • 14:37 urbanecm@deploy2002: Finished scap: Backport for EditGrowthConfig: Do not provide default for levelling up threshold when disabled (T351603), Add update.php maintenance script to fix pp_sortkey (T350224) (duration: 10m 28s)
  • 14:31 urbanecm@deploy2002: urbanecm and lucaswerkmeister-wmde and cyndywikime: Continuing with sync
  • 14:28 urbanecm@deploy2002: urbanecm and lucaswerkmeister-wmde and cyndywikime: Backport for EditGrowthConfig: Do not provide default for levelling up threshold when disabled (T351603), Add update.php maintenance script to fix pp_sortkey (T350224) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:26 urbanecm@deploy2002: Started scap: Backport for EditGrowthConfig: Do not provide default for levelling up threshold when disabled (T351603), Add update.php maintenance script to fix pp_sortkey (T350224)
  • 14:26 urbanecm@deploy2002: Finished scap: Backport for Update the list of ReferenceTooltip gadget names (T351314), Update the list of NavigationPopups gadget names (T351314) (duration: 09m 48s)
  • 14:22 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 14:20 urbanecm@deploy2002: urbanecm and wmde-fisch: Continuing with sync
  • 14:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53637 and previous config saved to /var/cache/conftool/dbconfig/20231120-141857-arnaudb.json
  • 14:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 14:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 14:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53636 and previous config saved to /var/cache/conftool/dbconfig/20231120-141835-arnaudb.json
  • 14:17 urbanecm@deploy2002: urbanecm and wmde-fisch: Backport for Update the list of ReferenceTooltip gadget names (T351314), Update the list of NavigationPopups gadget names (T351314) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:16 urbanecm@deploy2002: Started scap: Backport for Update the list of ReferenceTooltip gadget names (T351314), Update the list of NavigationPopups gadget names (T351314)
  • 14:13 arnaudb@cumin1001: dbctl commit (dc=all): 'prepare reboot of es2032 for T344589', diff saved to https://phabricator.wikimedia.org/P53635 and previous config saved to /var/cache/conftool/dbconfig/20231120-141312-arnaudb.json
  • 14:12 urbanecm@deploy2002: Finished scap: Backport for Set new $wgMicroStashType setting to "mcrouter-primary-dc" (T336004) (duration: 07m 06s)
  • 14:11 arnaudb@cumin1001: dbctl commit (dc=all): 'set es2028 as es1 master for T344589', diff saved to https://phabricator.wikimedia.org/P53634 and previous config saved to /var/cache/conftool/dbconfig/20231120-141131-arnaudb.json
  • 14:07 urbanecm@deploy2002: urbanecm and d3r1ck01: Continuing with sync
  • 14:06 urbanecm@deploy2002: urbanecm and d3r1ck01: Backport for Set new $wgMicroStashType setting to "mcrouter-primary-dc" (T336004) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:05 urbanecm@deploy2002: Started scap: Backport for Set new $wgMicroStashType setting to "mcrouter-primary-dc" (T336004)
  • 14:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P53633 and previous config saved to /var/cache/conftool/dbconfig/20231120-140329-arnaudb.json
  • 13:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P53632 and previous config saved to /var/cache/conftool/dbconfig/20231120-134822-arnaudb.json
  • 13:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2178.codfw.wmnet onto db2192.codfw.wmnet
  • 13:33 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host pc1014.eqiad.wmnet
  • 13:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53631 and previous config saved to /var/cache/conftool/dbconfig/20231120-133316-arnaudb.json
  • 13:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2180.codfw.wmnet onto db2193.codfw.wmnet
  • 13:25 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host pc1014.eqiad.wmnet
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 100%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53630 and previous config saved to /var/cache/conftool/dbconfig/20231120-125655-root.json
  • 12:48 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db2178.codfw.wmnet onto db2192.codfw.wmnet
  • 12:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db2178 in db2192 for T343674', diff saved to https://phabricator.wikimedia.org/P53629 and previous config saved to /var/cache/conftool/dbconfig/20231120-124522-arnaudb.json
  • 12:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: provisionning db2192.codfw.wmnet - T343674
  • 12:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: provisionning db2192.codfw.wmnet - T343674
  • 12:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: provisionning db2192.codfw.wmnet - T343674
  • 12:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: provisionning db2192.codfw.wmnet - T343674
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 75%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53628 and previous config saved to /var/cache/conftool/dbconfig/20231120-124150-root.json
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 50%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53627 and previous config saved to /var/cache/conftool/dbconfig/20231120-122645-root.json
  • 12:22 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db2180.codfw.wmnet onto db2193.codfw.wmnet
  • 12:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1143.eqiad.wmnet onto db1243.eqiad.wmnet
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 25%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53625 and previous config saved to /var/cache/conftool/dbconfig/20231120-121140-root.json
  • 12:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db2180 in db2193 for T343674', diff saved to https://phabricator.wikimedia.org/P53624 and previous config saved to /var/cache/conftool/dbconfig/20231120-120743-arnaudb.json
  • 12:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: provisionning db2193.codfw.wmnet - T343674
  • 12:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: provisionning db2193.codfw.wmnet - T343674
  • 12:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: provisionning db2193.codfw.wmnet - T343674
  • 12:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: provisionning db2193.codfw.wmnet - T343674
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 10%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53623 and previous config saved to /var/cache/conftool/dbconfig/20231120-115635-root.json
  • 11:48 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1009.eqiad.wmnet with OS bullseye
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1210', diff saved to https://phabricator.wikimedia.org/P53622 and previous config saved to /var/cache/conftool/dbconfig/20231120-113205-root.json
  • 11:26 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
  • 11:23 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
  • 11:21 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: ml_k8s::worker
  • 11:16 klausman@cumin1001: START - Cookbook sre.puppet.migrate-role for role: ml_k8s::worker
  • 11:07 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: ml_k8s::master
  • 11:00 klausman@cumin1001: START - Cookbook sre.puppet.migrate-role for role: ml_k8s::master
  • 10:58 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host druid1009.eqiad.wmnet with OS bullseye
  • 10:57 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: etcd::v3::ml_etcd
  • 10:56 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:56 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management records for ganeti103[5-8] - T349925 - volans@cumin1001"
  • 10:55 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management records for ganeti103[5-8] - T349925 - volans@cumin1001"
  • 10:52 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 10:50 klausman@cumin1001: START - Cookbook sre.puppet.migrate-role for role: etcd::v3::ml_etcd
  • 10:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53620 and previous config saved to /var/cache/conftool/dbconfig/20231120-102327-arnaudb.json
  • 10:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1241 (re)pooling @ 100%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53619 and previous config saved to /var/cache/conftool/dbconfig/20231120-102303-arnaudb.json
  • 10:22 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 10:22 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 10:13 klausman@cumin1001: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for ml-serve1008.eqiad.wmnet: Renew puppet certificate - klausman@cumin1001
  • 10:13 klausman@cumin1001: START - Cookbook sre.puppet.renew-cert for ml-serve1008.eqiad.wmnet: Renew puppet certificate - klausman@cumin1001
  • 10:12 klausman@cumin1001: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for ml-serve1008.eqiad.wmnet: Renew puppet certificate - klausman@cumin1001
  • 10:12 klausman@cumin1001: START - Cookbook sre.puppet.renew-cert for ml-serve1008.eqiad.wmnet: Renew puppet certificate - klausman@cumin1001
  • 10:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53618 and previous config saved to /var/cache/conftool/dbconfig/20231120-100823-arnaudb.json
  • 10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1241 (re)pooling @ 90%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53617 and previous config saved to /var/cache/conftool/dbconfig/20231120-100758-arnaudb.json
  • 10:05 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1143.eqiad.wmnet onto db1243.eqiad.wmnet
  • 10:02 arnaudb@cumin1001: dbctl commit (dc=all): 'T344036 add db1243', diff saved to https://phabricator.wikimedia.org/P53616 and previous config saved to /var/cache/conftool/dbconfig/20231120-100212-arnaudb.json
  • 10:01 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: provisionning db1243.eqiad.wmnet - T344036
  • 10:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: provisionning db1243.eqiad.wmnet - T344036
  • 10:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: provisionning db1243.eqiad.wmnet - T344036
  • 10:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: provisionning db1243.eqiad.wmnet - T344036
  • 09:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 75%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53615 and previous config saved to /var/cache/conftool/dbconfig/20231120-095318-arnaudb.json
  • 09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1241 (re)pooling @ 75%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53614 and previous config saved to /var/cache/conftool/dbconfig/20231120-095253-arnaudb.json
  • 09:50 Emperor: restart swift_dispersion_stats on thanos-fe1001
  • 09:41 godog: add 50G to prometheus/k8s in codfw
  • 09:39 godog: add 50G to prometheus/services in eqiad
  • 09:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53613 and previous config saved to /var/cache/conftool/dbconfig/20231120-093813-arnaudb.json
  • 09:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db1241 (re)pooling @ 60%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53612 and previous config saved to /var/cache/conftool/dbconfig/20231120-093748-arnaudb.json
  • 09:34 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2004.codfw.wmnet with OS bullseye
  • 09:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 45%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53611 and previous config saved to /var/cache/conftool/dbconfig/20231120-092308-arnaudb.json
  • 09:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db1241 (re)pooling @ 45%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53610 and previous config saved to /var/cache/conftool/dbconfig/20231120-092243-arnaudb.json
  • 09:18 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
  • 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
  • 09:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53609 and previous config saved to /var/cache/conftool/dbconfig/20231120-090803-arnaudb.json
  • 09:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1241 (re)pooling @ 30%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53608 and previous config saved to /var/cache/conftool/dbconfig/20231120-090738-arnaudb.json
  • 09:00 jelto@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab-runner2004.codfw.wmnet with OS bullseye
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 100%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53607 and previous config saved to /var/cache/conftool/dbconfig/20231120-085636-root.json
  • 08:54 XioNoX: Refresh client certificate for central logging on pfw's - T351110
  • 08:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 15%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53606 and previous config saved to /var/cache/conftool/dbconfig/20231120-085258-arnaudb.json
  • 08:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1241 (re)pooling @ 15%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53605 and previous config saved to /var/cache/conftool/dbconfig/20231120-085233-arnaudb.json
  • 08:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53603 and previous config saved to /var/cache/conftool/dbconfig/20231120-083753-arnaudb.json
  • 08:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db1241 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53602 and previous config saved to /var/cache/conftool/dbconfig/20231120-083729-arnaudb.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 50%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53601 and previous config saved to /var/cache/conftool/dbconfig/20231120-082625-root.json
  • 08:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db1242 (re)pooling @ 5%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53600 and previous config saved to /var/cache/conftool/dbconfig/20231120-082248-arnaudb.json
  • 08:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db1241 (re)pooling @ 5%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53599 and previous config saved to /var/cache/conftool/dbconfig/20231120-082224-arnaudb.json
  • 08:18 kartik@deploy2002: Finished scap: Backport for testwiki: Enable the Unified Content Translation Dashboard (T337915) (duration: 11m 49s)
  • 08:13 kartik@deploy2002: kartik: Continuing with sync
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 25%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53598 and previous config saved to /var/cache/conftool/dbconfig/20231120-081120-root.json
  • 08:10 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 08:09 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 08:09 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 08:09 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 08:08 kartik@deploy2002: kartik: Backport for testwiki: Enable the Unified Content Translation Dashboard (T337915) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:07 kartik@deploy2002: Started scap: Backport for testwiki: Enable the Unified Content Translation Dashboard (T337915)
  • 08:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53597 and previous config saved to /var/cache/conftool/dbconfig/20231120-080541-arnaudb.json
  • 08:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 08:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 08:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T348183)', diff saved to https://phabricator.wikimedia.org/P53596 and previous config saved to /var/cache/conftool/dbconfig/20231120-080519-arnaudb.json
  • 08:00 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master" (duration: 07m 52s)
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 10%: Testing 10.4.32', diff saved to https://phabricator.wikimedia.org/P53595 and previous config saved to /var/cache/conftool/dbconfig/20231120-075615-root.json
  • 07:54 marostegui@deploy2002: marostegui: Continuing with sync
  • 07:54 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:52 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master"
  • 07:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P53594 and previous config saved to /var/cache/conftool/dbconfig/20231120-075013-arnaudb.json
  • 07:49 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 07:48 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 07:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 07:37 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 07:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1013.eqiad.wmnet with OS bookworm
  • 07:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P53593 and previous config saved to /var/cache/conftool/dbconfig/20231120-073506-arnaudb.json
  • 07:34 moritzm: installing ncurses security updates
  • 07:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T348183)', diff saved to https://phabricator.wikimedia.org/P53592 and previous config saved to /var/cache/conftool/dbconfig/20231120-072000-arnaudb.json
  • 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: host reimage
  • 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: host reimage
  • 07:15 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 07:14 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 07:05 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1013.eqiad.wmnet with OS bookworm
  • 07:04 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc3 master (T351284) (duration: 07m 58s)
  • 06:58 marostegui@deploy2002: marostegui: Continuing with sync
  • 06:58 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc3 master (T351284) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:56 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc3 master (T351284)
  • 06:54 moritzm: installing python3.7 security updates
  • 06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2013.codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Switch
  • 06:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2013.codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Switch
  • 06:52 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host apt1002.wikimedia.org
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1210 T351283', diff saved to https://phabricator.wikimedia.org/P53591 and previous config saved to /var/cache/conftool/dbconfig/20231120-064733-root.json
  • 06:42 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host apt1002.wikimedia.org
  • 06:25 moritzm: installing qemu security updates on bullseye
  • 06:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T348183)', diff saved to https://phabricator.wikimedia.org/P53590 and previous config saved to /var/cache/conftool/dbconfig/20231120-061928-arnaudb.json
  • 06:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 06:19 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 06:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T348183)', diff saved to https://phabricator.wikimedia.org/P53589 and previous config saved to /var/cache/conftool/dbconfig/20231120-061906-arnaudb.json
  • 06:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
  • 06:12 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
  • 06:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P53588 and previous config saved to /var/cache/conftool/dbconfig/20231120-060400-arnaudb.json
  • 05:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P53587 and previous config saved to /var/cache/conftool/dbconfig/20231120-054853-arnaudb.json
  • 05:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T348183)', diff saved to https://phabricator.wikimedia.org/P53586 and previous config saved to /var/cache/conftool/dbconfig/20231120-053347-arnaudb.json
  • 00:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T348183)', diff saved to https://phabricator.wikimedia.org/P53585 and previous config saved to /var/cache/conftool/dbconfig/20231120-003846-arnaudb.json
  • 00:38 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 00:38 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 00:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T348183)', diff saved to https://phabricator.wikimedia.org/P53584 and previous config saved to /var/cache/conftool/dbconfig/20231120-003824-arnaudb.json
  • 00:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P53583 and previous config saved to /var/cache/conftool/dbconfig/20231120-002317-arnaudb.json
  • 00:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P53582 and previous config saved to /var/cache/conftool/dbconfig/20231120-000811-arnaudb.json

2023-11-19

  • 23:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T348183)', diff saved to https://phabricator.wikimedia.org/P53581 and previous config saved to /var/cache/conftool/dbconfig/20231119-235305-arnaudb.json
  • 18:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T348183)', diff saved to https://phabricator.wikimedia.org/P53580 and previous config saved to /var/cache/conftool/dbconfig/20231119-183758-arnaudb.json
  • 18:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 18:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 18:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T348183)', diff saved to https://phabricator.wikimedia.org/P53579 and previous config saved to /var/cache/conftool/dbconfig/20231119-183736-arnaudb.json
  • 18:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P53578 and previous config saved to /var/cache/conftool/dbconfig/20231119-182230-arnaudb.json
  • 18:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P53577 and previous config saved to /var/cache/conftool/dbconfig/20231119-180723-arnaudb.json
  • 17:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T348183)', diff saved to https://phabricator.wikimedia.org/P53576 and previous config saved to /var/cache/conftool/dbconfig/20231119-175217-arnaudb.json
  • 12:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T348183)', diff saved to https://phabricator.wikimedia.org/P53575 and previous config saved to /var/cache/conftool/dbconfig/20231119-123433-arnaudb.json
  • 12:34 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 12:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 07:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 07:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 03:19 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 02:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 02:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance

2023-11-18

  • 21:35 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 21:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 21:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T348183)', diff saved to https://phabricator.wikimedia.org/P53574 and previous config saved to /var/cache/conftool/dbconfig/20231118-213454-arnaudb.json
  • 21:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P53573 and previous config saved to /var/cache/conftool/dbconfig/20231118-211947-arnaudb.json
  • 21:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P53572 and previous config saved to /var/cache/conftool/dbconfig/20231118-210441-arnaudb.json
  • 20:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T348183)', diff saved to https://phabricator.wikimedia.org/P53571 and previous config saved to /var/cache/conftool/dbconfig/20231118-204934-arnaudb.json
  • 14:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1221 (T348183)', diff saved to https://phabricator.wikimedia.org/P53570 and previous config saved to /var/cache/conftool/dbconfig/20231118-145043-arnaudb.json
  • 14:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 14:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 14:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T348183)', diff saved to https://phabricator.wikimedia.org/P53569 and previous config saved to /var/cache/conftool/dbconfig/20231118-145003-arnaudb.json
  • 14:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P53568 and previous config saved to /var/cache/conftool/dbconfig/20231118-143457-arnaudb.json
  • 14:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P53567 and previous config saved to /var/cache/conftool/dbconfig/20231118-141950-arnaudb.json
  • 14:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T348183)', diff saved to https://phabricator.wikimedia.org/P53566 and previous config saved to /var/cache/conftool/dbconfig/20231118-140444-arnaudb.json
  • 08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T348183)', diff saved to https://phabricator.wikimedia.org/P53565 and previous config saved to /var/cache/conftool/dbconfig/20231118-085142-arnaudb.json
  • 08:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 08:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T348183)', diff saved to https://phabricator.wikimedia.org/P53564 and previous config saved to /var/cache/conftool/dbconfig/20231118-085121-arnaudb.json
  • 08:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P53563 and previous config saved to /var/cache/conftool/dbconfig/20231118-083615-arnaudb.json
  • 08:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P53562 and previous config saved to /var/cache/conftool/dbconfig/20231118-082108-arnaudb.json
  • 08:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T348183)', diff saved to https://phabricator.wikimedia.org/P53561 and previous config saved to /var/cache/conftool/dbconfig/20231118-080602-arnaudb.json
  • 03:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T348183)', diff saved to https://phabricator.wikimedia.org/P53560 and previous config saved to /var/cache/conftool/dbconfig/20231118-030303-arnaudb.json
  • 03:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 03:02 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance

2023-11-17

  • 23:39 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:38 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 21:59 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 21:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T348183)', diff saved to https://phabricator.wikimedia.org/P53559 and previous config saved to /var/cache/conftool/dbconfig/20231117-215947-arnaudb.json
  • 21:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P53558 and previous config saved to /var/cache/conftool/dbconfig/20231117-214441-arnaudb.json
  • 21:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P53557 and previous config saved to /var/cache/conftool/dbconfig/20231117-212935-arnaudb.json
  • 21:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T348183)', diff saved to https://phabricator.wikimedia.org/P53556 and previous config saved to /var/cache/conftool/dbconfig/20231117-211428-arnaudb.json
  • 19:51 bvibber: brion regenerating .m3u8 streaming manifests for all video files on mwmaint2002 (cleanup for T350996)
  • 18:05 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:04 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:01 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:59 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:59 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:58 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:50 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:49 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:48 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:47 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1038
  • 17:47 vriley@cumin1001: START - Cookbook sre.hosts.provision for host ganeti1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:46 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1038
  • 17:46 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1037
  • 17:45 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1036
  • 17:45 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1037
  • 17:43 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1036
  • 17:42 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1035
  • 17:40 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1035
  • 17:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1142.eqiad.wmnet onto db1242.eqiad.wmnet
  • 16:46 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:46 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:26 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:26 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T348183)', diff saved to https://phabricator.wikimedia.org/P53553 and previous config saved to /var/cache/conftool/dbconfig/20231117-161806-arnaudb.json
  • 16:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 16:17 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 16:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T348183)', diff saved to https://phabricator.wikimedia.org/P53552 and previous config saved to /var/cache/conftool/dbconfig/20231117-161744-arnaudb.json
  • 16:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P53551 and previous config saved to /var/cache/conftool/dbconfig/20231117-160238-arnaudb.json
  • 15:58 bking@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:58 bking@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:58 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:57 bking@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:57 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:56 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:56 bking@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:56 bking@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P53550 and previous config saved to /var/cache/conftool/dbconfig/20231117-154731-arnaudb.json
  • 15:38 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:38 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T348183)', diff saved to https://phabricator.wikimedia.org/P53549 and previous config saved to /var/cache/conftool/dbconfig/20231117-153225-arnaudb.json
  • 15:05 XioNoX: cr1-esams> request chassis fpc slot 1 online - T351304
  • 14:45 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1142.eqiad.wmnet onto db1242.eqiad.wmnet
  • 14:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1142 in db1242 for T344036', diff saved to https://phabricator.wikimedia.org/P53547 and previous config saved to /var/cache/conftool/dbconfig/20231117-144234-arnaudb.json
  • 14:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: provisioning db1242.eqiad.wmnet - T344036
  • 14:39 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: provisioning db1242.eqiad.wmnet - T344036
  • 14:39 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: provisioning db1242.eqiad.wmnet - T344036
  • 14:39 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: provisioning db1242.eqiad.wmnet - T344036
  • 14:20 elukey@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host ml-serve1001.eqiad.wmnet
  • 14:20 elukey@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1001.eqiad.wmnet
  • 13:48 jynus: reenable puppet on dbprov2001 T351491
  • 13:47 klausman@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host ml-serve1001.eqiad.wmnet
  • 13:47 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-serve1002.eqiad.wmnet
  • 13:47 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1001.eqiad.wmnet
  • 13:46 klausman@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host ml-serve1003.eqiad.wmnet
  • 13:45 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1003.eqiad.wmnet
  • 13:45 klausman@cumin1001: END (ERROR) - Cookbook sre.puppet.migrate-host (exit_code=97) for host ml-serve1003.eqiad.wmnet
  • 13:45 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1003.eqiad.wmnet
  • 13:44 klausman@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host ml-serve1003.eqiad.wmnet
  • 13:44 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1002.eqiad.wmnet
  • 13:44 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1003.eqiad.wmnet
  • 13:42 moritzm: imported php-luasandbox 4.0.2-3+wmf2+bullseye1 to component/php74 for bullseye-wikimedia
  • 13:36 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-serve1004.eqiad.wmnet
  • 13:35 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-serve1005.eqiad.wmnet
  • 13:33 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1004.eqiad.wmnet
  • 13:32 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1005.eqiad.wmnet
  • 13:30 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-serve1006.eqiad.wmnet
  • 13:28 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1006.eqiad.wmnet
  • 13:26 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-serve1007.eqiad.wmnet
  • 13:24 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1007.eqiad.wmnet
  • 12:54 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host ldap-rw2001.wikimedia.org
  • 12:53 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host ldap-rw2001.wikimedia.org
  • 12:53 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host ldap-rw2001.wikimedia.org
  • 12:52 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host ldap-rw2001.wikimedia.org
  • 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ldap-rw1001.wikimedia.org
  • 12:51 joal@deploy2002: Finished deploy [airflow-dags/analytics@a5e5ddc]: Airflow HOTFIX [airflow-dags/analytics@a5e5ddca] (duration: 00m 28s)
  • 12:50 joal@deploy2002: Started deploy [airflow-dags/analytics@a5e5ddc]: Airflow HOTFIX [airflow-dags/analytics@a5e5ddca]
  • 12:42 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host ldap-rw1001.wikimedia.org
  • 12:10 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 12:09 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 11:36 mabualruz@deploy2002: Finished scap: Backport for Fixes AMC outreach drawer (T351362) (duration: 07m 32s)
  • 11:30 mabualruz@deploy2002: mabualruz: Continuing with sync
  • 11:29 mabualruz@deploy2002: mabualruz: Backport for Fixes AMC outreach drawer (T351362) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:28 mabualruz@deploy2002: Started scap: Backport for Fixes AMC outreach drawer (T351362)
  • 11:20 jynus: running schema change on backup1-codfw (mediabackups) T191804
  • 11:17 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 11:17 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 11:17 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 11:17 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 11:16 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 11:16 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 11:16 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 11:16 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 11:16 cgoubert@deploy2002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:16 cgoubert@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:16 cgoubert@deploy2002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 11:16 cgoubert@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 11:16 cgoubert@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:15 cgoubert@deploy2002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:15 cgoubert@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 11:15 cgoubert@deploy2002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 11:15 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:14 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:14 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 11:14 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:14 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:13 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:13 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:13 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:12 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 11:12 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 11:12 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 11:11 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:11 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:11 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:11 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:10 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:10 jynus: running schema change on backup1-eqiad (mediabackups) T191804
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:08 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:08 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:08 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:08 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:07 claime: Redeploying mw-on-k8s for T350430
  • 10:54 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-serve1008.eqiad.wmnet
  • 10:53 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:52 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:51 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve1008.eqiad.wmnet
  • 10:31 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet
  • 10:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T348183)', diff saved to https://phabricator.wikimedia.org/P53542 and previous config saved to /var/cache/conftool/dbconfig/20231117-102952-arnaudb.json
  • 10:29 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 10:29 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 10:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T348183)', diff saved to https://phabricator.wikimedia.org/P53541 and previous config saved to /var/cache/conftool/dbconfig/20231117-102931-arnaudb.json
  • 10:28 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve-ctrl1001.eqiad.wmnet
  • 10:23 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-serve-ctrl1002.eqiad.wmnet
  • 10:20 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-serve-ctrl1002.eqiad.wmnet
  • 10:19 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:19 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:17 jmm@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new crm VM - jmm@cumin1001 - T349402"
  • 10:16 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:16 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:16 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:15 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P53540 and previous config saved to /var/cache/conftool/dbconfig/20231117-101425-arnaudb.json
  • 10:12 jmm@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new crm VM - jmm@cumin1001 - T349402"
  • 10:12 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-etcd1001.eqiad.wmnet
  • 10:09 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-etcd1001.eqiad.wmnet
  • 10:08 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-etcd1002.eqiad.wmnet
  • 09:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P53539 and previous config saved to /var/cache/conftool/dbconfig/20231117-095918-arnaudb.json
  • 09:51 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-etcd1002.eqiad.wmnet
  • 09:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T348183)', diff saved to https://phabricator.wikimedia.org/P53537 and previous config saved to /var/cache/conftool/dbconfig/20231117-094412-arnaudb.json
  • 09:38 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-etcd1003.eqiad.wmnet
  • 09:31 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-etcd1003.eqiad.wmnet
  • 09:24 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Test Upgrade GitLab Replica gitlab1003 with new runners
  • 09:22 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Test Upgrade GitLab Replica gitlab1003 with new runners
  • 09:12 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
  • 09:04 moritzm: imported php-memcached 3.1.5+2.2.0-5+deb11u1+wmf1+bullseye1 to component/php74 for bullseye-wikimedia
  • 09:01 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host crm2001.codfw.wmnet
  • 09:01 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host crm2001.codfw.wmnet with OS bookworm
  • 08:45 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on crm2001.codfw.wmnet with reason: host reimage
  • 08:42 jmm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on crm2001.codfw.wmnet with reason: host reimage
  • 08:30 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
  • 08:25 jmm@cumin1001: START - Cookbook sre.hosts.reimage for host crm2001.codfw.wmnet with OS bookworm
  • 08:15 jmm@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM crm2001.codfw.wmnet - jmm@cumin1001"
  • 08:14 jmm@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM crm2001.codfw.wmnet - jmm@cumin1001"
  • 08:14 jmm@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) crm2001.codfw.wmnet on all recursors
  • 08:14 jmm@cumin1001: START - Cookbook sre.dns.wipe-cache crm2001.codfw.wmnet on all recursors
  • 08:14 jmm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:14 jmm@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM crm2001.codfw.wmnet - jmm@cumin1001"
  • 08:13 jmm@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM crm2001.codfw.wmnet - jmm@cumin1001"
  • 08:10 jmm@cumin1001: START - Cookbook sre.dns.netbox
  • 08:09 jmm@cumin1001: START - Cookbook sre.ganeti.makevm for new host crm2001.codfw.wmnet
  • 08:06 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 08:05 jmm@cumin1001: START - Cookbook sre.ganeti.resource-report
  • 07:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host debmonitor2003.codfw.wmnet
  • 07:49 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host debmonitor2003.codfw.wmnet
  • 07:34 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 07:34 jmm@cumin1001: START - Cookbook sre.ganeti.resource-report
  • 07:34 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 07:34 jmm@cumin1001: START - Cookbook sre.ganeti.resource-report
  • 07:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2133.codfw.wmnet with OS bookworm
  • 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2133.codfw.wmnet with reason: host reimage
  • 07:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2133.codfw.wmnet with reason: host reimage
  • 06:55 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2133.codfw.wmnet with OS bookworm
  • 06:48 mabualruz@deploy2002: Backport cancelled.
  • 04:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T348183)', diff saved to https://phabricator.wikimedia.org/P53535 and previous config saved to /var/cache/conftool/dbconfig/20231117-044504-arnaudb.json
  • 04:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 04:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 04:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53534 and previous config saved to /var/cache/conftool/dbconfig/20231117-044443-arnaudb.json
  • 04:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P53533 and previous config saved to /var/cache/conftool/dbconfig/20231117-042937-arnaudb.json
  • 04:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P53532 and previous config saved to /var/cache/conftool/dbconfig/20231117-041430-arnaudb.json
  • 03:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53531 and previous config saved to /var/cache/conftool/dbconfig/20231117-035924-arnaudb.json
  • 01:19 cstone: payments-wiki upgraded from eae2f35e to 56790715
  • 01:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1158.eqiad.wmnet with OS bullseye
  • 01:00 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1158']
  • 00:55 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1158']
  • 00:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1157.eqiad.wmnet with OS bullseye
  • 00:48 ejegg: fundraising civiproxy upgraded from c000fc1e to 6625c844
  • 00:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1157']
  • 00:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1157']

2023-11-16

  • 23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1158']
  • 23:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1158.eqiad.wmnet with OS bullseye
  • 23:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1158']
  • 23:43 samtar@deploy2002: Finished scap: Backport for Revert "Disable drawer temporarily while erroring" (duration: 07m 31s)
  • 23:37 samtar@deploy2002: samtar: Continuing with sync
  • 23:37 samtar@deploy2002: samtar: Backport for Revert "Disable drawer temporarily while erroring" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:35 samtar@deploy2002: Started scap: Backport for Revert "Disable drawer temporarily while erroring"
  • 23:34 samtar@deploy2002: Sync cancelled.
  • 23:33 topranks: Change VRRP IP for public1-a-codfw vlan on codfw CRs T347191
  • 23:30 topranks: Add gateway IP for public1-a-codfw Vlan to ssw in codfw T347191
  • 23:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1157.eqiad.wmnet with OS bullseye
  • 23:30 samtar@deploy2002: jdlrobson and samtar: Backport for Disable drawer temporarily while erroring (T351362) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1157.eqiad.wmnet with OS bullseye
  • 23:28 samtar@deploy2002: Started scap: Backport for Disable drawer temporarily while erroring (T351362)
  • 23:28 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:28 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old vlan 2001 entries - cmooney@cumin1001"
  • 23:27 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old vlan 2001 entries - cmooney@cumin1001"
  • 23:25 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 23:10 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr[1-2]-codfw,cr[1-2]-codfw IPv6 with reason: Move public1-a-codfw vlan GW from codfw CR routers to ssw
  • 23:10 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cr[1-2]-codfw,cr[1-2]-codfw IPv6 with reason: Move public1-a-codfw vlan GW from codfw CR routers to ssw
  • 22:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53529 and previous config saved to /var/cache/conftool/dbconfig/20231116-223915-arnaudb.json
  • 22:39 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 22:38 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 22:36 mutante: disabled puppet on miscweb*, netmon* and phab* hosts, deploying gerrit:974285, confirming noop
  • 22:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:31 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old vlan 1117 entries - cmooney@cumin1001"
  • 22:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old vlan 1117 entries - cmooney@cumin1001"
  • 22:29 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 22:09 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1157.eqiad.wmnet with OS bullseye
  • 22:00 dr0ptp4kt@deploy2002: Finished scap: Backport for Make the feed gracefully handle long snippets and urls (T347732 T351463) (duration: 09m 50s)
  • 21:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1157']
  • 21:54 dr0ptp4kt@deploy2002: dr0ptp4kt and soda: Continuing with sync
  • 21:53 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1157']
  • 21:53 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1157']
  • 21:53 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1157']
  • 21:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1157']
  • 21:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1157']
  • 21:51 dr0ptp4kt@deploy2002: dr0ptp4kt and soda: Backport for Make the feed gracefully handle long snippets and urls (T347732 T351463) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:50 dr0ptp4kt@deploy2002: Started scap: Backport for Make the feed gracefully handle long snippets and urls (T347732 T351463)
  • 21:43 topranks: Removing VRRP config for public1-b-codfw on codfw CRs (T347191)
  • 21:38 dr0ptp4kt@deploy2002: Finished scap: Backport for Conditionally render the content of header-action instead of the slot (T351121) (duration: 07m 36s)
  • 21:32 dr0ptp4kt@deploy2002: dr0ptp4kt and jforrester: Continuing with sync
  • 21:32 dr0ptp4kt@deploy2002: dr0ptp4kt and jforrester: Backport for Conditionally render the content of header-action instead of the slot (T351121) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:30 dr0ptp4kt@deploy2002: Started scap: Backport for Conditionally render the content of header-action instead of the slot (T351121)
  • 21:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:18 dr0ptp4kt@deploy2002: Finished scap: Backport for Pre-deploy Annual Plan Core Metrics survey (T351353) (duration: 11m 12s)
  • 21:12 dr0ptp4kt@deploy2002: dr0ptp4kt and dani: Continuing with sync
  • 21:08 dr0ptp4kt@deploy2002: dr0ptp4kt and dani: Backport for Pre-deploy Annual Plan Core Metrics survey (T351353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:07 dr0ptp4kt@deploy2002: Started scap: Backport for Pre-deploy Annual Plan Core Metrics survey (T351353)
  • 20:54 topranks: changing VRRP GW IP for public1-b-codfw on codfw CRs and disabling IPv6 RAs on the CRs (T347191)
  • 20:41 topranks: adding anycast GW for public1-b-codfw vlan to codfw spine switches (T347191)
  • 20:23 dr0ptp4kt@deploy2002: Finished deploy [airflow-dags/search@b00c6ca]: Deploying Airflow search WDQS graph split HDFS job (duration: 00m 27s)
  • 20:23 dr0ptp4kt@deploy2002: Started deploy [airflow-dags/search@b00c6ca]: Deploying Airflow search WDQS graph split HDFS job
  • 19:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1012.eqiad.wmnet with OS bullseye
  • 19:47 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:47 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS for IPs in public1-b-codfw vlan - cmooney@cumin1001"
  • 19:46 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS for IPs in public1-b-codfw vlan - cmooney@cumin1001"
  • 19:44 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1158.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:26 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:25 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1012.eqiad.wmnet with reason: host reimage
  • 19:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1160.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1162.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1159.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1157.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1161.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:22 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1012.eqiad.wmnet with reason: host reimage
  • 19:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:10 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.5 refs T350081
  • 19:10 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 19:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1164
  • 19:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1164
  • 19:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1163
  • 19:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1163
  • 19:08 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:07 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 19:04 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1012.eqiad.wmnet with OS bullseye
  • 19:02 bking@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 19:01 bking@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 19:01 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1164.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1161.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1162.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1163.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1160.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:00 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:59 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1159.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:59 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1158.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:59 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1157.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:56 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 18:56 bking@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 18:56 bking@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 18:55 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:55 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker11 - jclark@cumin1001"
  • 18:54 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker11 - jclark@cumin1001"
  • 18:51 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 18:44 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 18:15 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 18:14 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1012.eqiad.wmnet with OS bullseye
  • 17:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 17:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 17:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53526 and previous config saved to /var/cache/conftool/dbconfig/20231116-174800-arnaudb.json
  • 17:44 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 17:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P53525 and previous config saved to /var/cache/conftool/dbconfig/20231116-173254-arnaudb.json
  • 17:29 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1012.eqiad.wmnet with OS bullseye
  • 17:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.wikimedia.org with OS bullseye
  • 17:27 brett: Re-enabling puppet on all acme-chief clients post-bookworm upgrade - T342154
  • 17:23 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 17:20 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 17:19 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 17:18 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 17:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P53523 and previous config saved to /var/cache/conftool/dbconfig/20231116-171748-arnaudb.json
  • 17:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 17:13 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudelastic1009.wikimedia.org with reason: host reimage
  • 17:12 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.wikimedia.org with reason: host reimage
  • 17:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for acmechief1001.eqiad.wmnet
  • 17:08 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for acmechief1001.eqiad.wmnet
  • 17:07 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 17:07 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 17:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53522 and previous config saved to /var/cache/conftool/dbconfig/20231116-170241-arnaudb.json
  • 17:00 brett: Disabling puppet on all acme-chief clients for acme-chief bookworm upgrades - T342154
  • 16:58 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.wikimedia.org with OS bullseye
  • 16:52 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 16:51 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 16:50 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 16:50 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 16:49 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host acmechief1001.eqiad.wmnet with OS bookworm
  • 16:39 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 16:37 sukhe: repool cp4037
  • 16:31 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief1001.eqiad.wmnet with reason: host reimage
  • 16:30 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on 6 hosts with reason: Extending downtime for depooled cp hosts
  • 16:30 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on 6 hosts with reason: Extending downtime for depooled cp hosts
  • 16:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4037.ulsfo.wmnet
  • 16:26 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief1001.eqiad.wmnet with reason: host reimage
  • 16:26 fabfur: swapped cp1109 <-> cp1084 (T349244)
  • 16:24 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1109.eqiad.wmnet
  • 16:24 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1109.eqiad.wmnet
  • 16:23 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest1004.eqiad.wmnet
  • 16:21 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1009.wikimedia.org with OS bullseye
  • 16:21 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 16:20 fabfur: swapped cp1108 <-> cp1083 (T349244)
  • 16:18 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1108.eqiad.wmnet
  • 16:18 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1108.eqiad.wmnet
  • 16:18 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4037.ulsfo.wmnet
  • 16:17 brett@cumin1001: START - Cookbook sre.hosts.reimage for host acmechief1001.eqiad.wmnet with OS bookworm
  • 16:17 sukhe: depool cp4037 for reboot [post puppet 7 upgrade]
  • 16:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4037.ulsfo.wmnet
  • 16:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['aqs1012']
  • 16:03 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4037.ulsfo.wmnet
  • 16:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kafka::logging
  • 15:55 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1002.eqiad.wmnet with OS bullseye
  • 15:55 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: kafka::logging
  • 15:38 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1002.eqiad.wmnet with reason: host reimage
  • 15:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['aqs1012']
  • 15:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['aqs1012']
  • 15:37 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['aqs1012']
  • 15:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1008.wikimedia.org with OS bullseye
  • 15:35 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1002.eqiad.wmnet with reason: host reimage
  • 15:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1141.eqiad.wmnet onto db1241.eqiad.wmnet
  • 15:22 brouberol@cumin1001: START - Cookbook sre.hosts.reimage for host an-druid1002.eqiad.wmnet with OS bullseye
  • 15:21 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: revert logstash changes - bking@cumin2002 - T324335
  • 15:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1008.wikimedia.org with reason: host reimage
  • 15:18 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1008.wikimedia.org with reason: host reimage
  • 15:17 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: revert logstash changes - bking@cumin2002 - T324335
  • 15:15 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host sretest1004.eqiad.wmnet
  • 15:03 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1008.wikimedia.org with OS bullseye
  • 15:01 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.wikimedia.org with OS bullseye
  • 15:01 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 14:57 arnaudb@cumin1001: dbctl commit (dc=all): 'remove db1136', diff saved to https://phabricator.wikimedia.org/P53519 and previous config saved to /var/cache/conftool/dbconfig/20231116-145754-arnaudb.json
  • 14:57 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 14:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1136.eqiad.wmnet
  • 14:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1136.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 14:57 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 14:57 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 14:57 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 14:57 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 14:56 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1136.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 14:56 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 14:56 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 14:56 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 14:56 cgoubert@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:56 cgoubert@deploy2002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:56 cgoubert@deploy2002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 14:56 cgoubert@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 14:55 cgoubert@deploy2002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:55 cgoubert@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:55 cgoubert@deploy2002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 14:55 cgoubert@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 14:54 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 14:53 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:53 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:53 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 14:52 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 14:52 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:51 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:51 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:51 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:51 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:50 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:50 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:50 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:50 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:49 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:49 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:49 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1136.eqiad.wmnet
  • 14:49 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:48 claime: Redeploying mw-on-k8s for T350430
  • 14:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 141626
  • 14:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 141626
  • 14:43 jbond: re-enable puppet on puppet7 agents
  • 14:43 kartik@deploy2002: Finished scap: Backport for TranslatablePageMarker: Add patrol status for translatable page (T351273) (duration: 21m 41s)
  • 14:37 kartik@deploy2002: kartik and abi: Continuing with sync
  • 14:23 kartik@deploy2002: kartik and abi: Backport for TranslatablePageMarker: Add patrol status for translatable page (T351273) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:21 kartik@deploy2002: Started scap: Backport for TranslatablePageMarker: Add patrol status for translatable page (T351273)
  • 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kafka::monitoring_bullseye
  • 14:15 jbond: stop puppet on puppet7 agents to debug puppet performance
  • 14:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T349796)
  • 14:09 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T349796)
  • 14:08 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T349796)
  • 14:07 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: kafka::monitoring_bullseye
  • 14:07 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T349796)
  • 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: prometheus
  • 13:49 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: prometheus
  • 13:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
  • 13:44 jynus: restart bacula at backup1001
  • 13:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
  • 13:39 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup2001.codfw.wmnet
  • 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ms-be2050.codfw.wmnet
  • 13:34 sergi0: stat1008: Add `sowiki`, `stwiki`, `tgwiki` and `ugwiki` to `/srv/published/datasets/one-off/research-mwaddlink/wikis.txt` (T340944)
  • 13:33 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host backup2001.codfw.wmnet
  • 13:30 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host dbprov2001.codfw.wmnet
  • 13:29 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host ms-be2050.codfw.wmnet
  • 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host ms-be2050.codfw.wmnet
  • 13:21 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host dbprov2001.codfw.wmnet
  • 13:19 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup1001.eqiad.wmnet
  • 13:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1014.eqiad.wmnet
  • 13:10 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host backup1001.eqiad.wmnet
  • 13:09 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1133.eqiad.wmnet
  • 13:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1014.eqiad.wmnet
  • 13:02 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host db1133.eqiad.wmnet
  • 13:00 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1141.eqiad.wmnet onto db1241.eqiad.wmnet
  • 12:56 arnaudb@cumin1001: dbctl commit (dc=all): 'cloning db1141 - T350458', diff saved to https://phabricator.wikimedia.org/P53516 and previous config saved to /var/cache/conftool/dbconfig/20231116-125649-arnaudb.json
  • 12:56 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.4 - cmooney@cumin1001
  • 12:55 arnaudb@cumin1001: dbctl commit (dc=all): 'cloning db1141 - T350458', diff saved to https://phabricator.wikimedia.org/P53515 and previous config saved to /var/cache/conftool/dbconfig/20231116-125515-arnaudb.json
  • 12:55 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: provisionning db1241.eqiad.wmnet - T344036
  • 12:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: provisionning db1241.eqiad.wmnet - T344036
  • 12:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: provisionning db1241.eqiad.wmnet - T344036
  • 12:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: provisionning db1241.eqiad.wmnet - T344036
  • 12:54 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.4 - cmooney@cumin1001
  • 12:33 jmm@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cumin2002.codfw.wmnet
  • 12:33 marostegui: Install Test MariaDB 10.6.16 (Bookworm) on pc2014 T351283
  • 12:29 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 12:29 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 12:29 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 12:27 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1124.eqiad.wmnet
  • 12:27 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 12:23 jmm@cumin1001: START - Cookbook sre.puppet.migrate-host for host cumin2002.codfw.wmnet
  • 12:16 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host db1124.eqiad.wmnet
  • 12:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ms-fe1014.eqiad.wmnet
  • 11:55 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host ms-fe1014.eqiad.wmnet
  • 11:55 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host clouddb1021.eqiad.wmnet
  • 11:51 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 11:50 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 11:50 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 11:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 11:49 taavi@cumin1001: START - Cookbook sre.puppet.migrate-host for host clouddb1021.eqiad.wmnet
  • 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: insetup::serviceops
  • 11:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T348183)', diff saved to https://phabricator.wikimedia.org/P53514 and previous config saved to /var/cache/conftool/dbconfig/20231116-114511-arnaudb.json
  • 11:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 11:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 11:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T348183)', diff saved to https://phabricator.wikimedia.org/P53513 and previous config saved to /var/cache/conftool/dbconfig/20231116-114450-arnaudb.json
  • 11:34 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 11:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1004.eqiad.wmnet
  • 11:34 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 11:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: insetup::serviceops
  • 11:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P53512 and previous config saved to /var/cache/conftool/dbconfig/20231116-112942-arnaudb.json
  • 09:40 arnaudb@cumin1001: dbctl commit (dc=all): 'db1238 (re)pooling @ 15%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53502 and previous config saved to /var/cache/conftool/dbconfig/20231116-094005-arnaudb.json
  • 09:25 arnaudb@cumin1001: dbctl commit (dc=all): 'db1238 (re)pooling @ 10%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53501 and previous config saved to /var/cache/conftool/dbconfig/20231116-092500-arnaudb.json
  • 09:09 arnaudb@cumin1001: dbctl commit (dc=all): 'db1238 (re)pooling @ 5%: Post warmup repooling', diff saved to https://phabricator.wikimedia.org/P53500 and previous config saved to /var/cache/conftool/dbconfig/20231116-090955-arnaudb.json
  • 09:00 godog: bounce prometheus instances on prometheus2006 to test p7 upgrade
  • 08:59 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: kubernetes::worker
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: thanos::frontend
  • 08:37 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 08:37 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 08:34 moritzm: installing ruby-rails-html-sanitizer security updates
  • 08:30 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: thanos::frontend
  • 08:25 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host clouddumps1001.wikimedia.org
  • 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host prometheus2006.codfw.wmnet
  • 08:19 taavi@cumin1001: START - Cookbook sre.puppet.migrate-host for host clouddumps1001.wikimedia.org
  • 08:18 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcumin2001.codfw.wmnet
  • 08:17 moritzm: installing elfutils security updates
  • 08:12 taavi@cumin1001: START - Cookbook sre.puppet.migrate-host for host cloudcumin2001.codfw.wmnet
  • 08:09 moritzm: installing python-git security updates
  • 08:07 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host prometheus2006.codfw.wmnet
  • 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ncredir4001.ulsfo.wmnet
  • 07:54 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host ncredir4001.ulsfo.wmnet
  • 07:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: prometheus::pop
  • 07:30 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: prometheus::pop
  • 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2132,2160].codfw.wmnet,db[1119,1164,1217].eqiad.wmnet with reason: Switch
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2132,2160].codfw.wmnet,db[1119,1164,1217].eqiad.wmnet with reason: Switch
  • 06:07 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2004.codfw.wmnet with OS bullseye
  • 05:48 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.wikimedia.org with OS bullseye
  • 05:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T348183)', diff saved to https://phabricator.wikimedia.org/P53499 and previous config saved to /var/cache/conftool/dbconfig/20231116-053616-arnaudb.json
  • 05:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 05:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 05:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T348183)', diff saved to https://phabricator.wikimedia.org/P53498 and previous config saved to /var/cache/conftool/dbconfig/20231116-053554-arnaudb.json
  • 05:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P53497 and previous config saved to /var/cache/conftool/dbconfig/20231116-052048-arnaudb.json
  • 05:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P53496 and previous config saved to /var/cache/conftool/dbconfig/20231116-050542-arnaudb.json
  • 04:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2001-dev.codfw.wmnet with OS bookworm
  • 04:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T348183)', diff saved to https://phabricator.wikimedia.org/P53495 and previous config saved to /var/cache/conftool/dbconfig/20231116-045035-arnaudb.json
  • 04:38 cstone: payments-wiki upgraded from 6affb60a to eae2f35e
  • 04:30 cstone: payments-wiki upgraded from 084370bb to 6affb60a
  • 04:24 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.wikimedia.org with OS bullseye
  • 03:44 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
  • 03:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
  • 03:40 ejegg: fundraising civicrm upgraded from 6e53198c to 32679ea3
  • 03:19 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bookworm
  • 01:53 cstone: payments-wiki upgraded from b4465e23 to 084370bb
  • 01:34 eileen: civicrm upgraded from ec6992e0 to 6e53198c
  • 00:27 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.wikimedia.org with OS bullseye

2023-11-15

  • 23:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T348183)', diff saved to https://phabricator.wikimedia.org/P53494 and previous config saved to /var/cache/conftool/dbconfig/20231115-235044-arnaudb.json
  • 23:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 23:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 23:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T348183)', diff saved to https://phabricator.wikimedia.org/P53493 and previous config saved to /var/cache/conftool/dbconfig/20231115-235023-arnaudb.json
  • 23:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P53492 and previous config saved to /var/cache/conftool/dbconfig/20231115-233516-arnaudb.json
  • 23:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P53491 and previous config saved to /var/cache/conftool/dbconfig/20231115-232010-arnaudb.json
  • 23:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T348183)', diff saved to https://phabricator.wikimedia.org/P53490 and previous config saved to /var/cache/conftool/dbconfig/20231115-230504-arnaudb.json
  • 23:04 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1008.wikimedia.org with OS bullseye
  • 22:59 bking@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for cloudelastic1007.wikimedia.org: Renew puppet certificate - bking@cumin2002
  • 22:58 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for cloudelastic1007.wikimedia.org: Renew puppet certificate - bking@cumin2002
  • 22:57 bking@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for cloudelastic1008.wikimedia.org: Renew puppet certificate - bking@cumin2002
  • 22:57 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for cloudelastic1008.wikimedia.org: Renew puppet certificate - bking@cumin2002
  • 22:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cloudelastic[1007-1010].wikimedia.org with reason: new cloudelastic hosts TT351354
  • 22:41 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cloudelastic[1007-1010].wikimedia.org with reason: new cloudelastic hosts TT351354
  • 22:20 ryankemper: T351354 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/974693; running puppet on hosts
  • 19:39 topranks: re-enabling puppet on DNS hosts to adjust TTL setting in BIRD (T350488)
  • 19:37 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 19:36 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1009.wikimedia.org with OS bullseye
  • 19:34 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.wikimedia.org with OS bullseye
  • 19:23 jhuneidi@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.5 refs T350081 (duration: 05m 52s)
  • 19:17 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.5 refs T350081
  • 19:15 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: aphlict
  • 19:10 topranks: merging patch to remove TTL restriction on Bird Anycast BGP peerings (T350488)
  • 19:09 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-role for role: aphlict
  • 19:07 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudlb2001-dev.codfw.wmnet
  • 19:07 mutante: aphlict2001 - restart aphlict service after puppet 7 upgrade
  • 19:05 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::codfw1dev::virt_ceph
  • 19:01 taavi@cumin1001: START - Cookbook sre.puppet.migrate-host for host cloudlb2001-dev.codfw.wmnet
  • 19:00 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudgw2003-dev.codfw.wmnet
  • 18:59 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::codfw1dev::services
  • 18:59 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host aphlict2001.codfw.wmnet
  • 18:59 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::codfw1dev::virt_ceph
  • 18:58 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-role (exit_code=99) for role: wmcs::openstack::codfw1dev::virt_ceph
  • 18:56 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 18:54 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::codfw1dev::virt_ceph
  • 18:54 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::codfw1dev::net
  • 18:54 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-host for host aphlict2001.codfw.wmnet
  • 18:54 taavi@cumin1001: START - Cookbook sre.puppet.migrate-host for host cloudgw2003-dev.codfw.wmnet
  • 18:51 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::codfw1dev::services
  • 18:49 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudgw2002-dev.codfw.wmnet
  • 18:45 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::codfw1dev::net
  • 18:42 topranks: Reset BGP to lvs4010 from cr3-ulsfo to validate new config T350488
  • 18:41 taavi@cumin1001: START - Cookbook sre.puppet.migrate-host for host cloudgw2002-dev.codfw.wmnet
  • 18:36 topranks: remove TTL setting on server-facing BGP peerings on cr3-ulsfo T350488
  • 18:25 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::codfw1dev::db
  • 18:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 18:15 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.wikimedia.org with OS bullseye
  • 18:14 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::codfw1dev::db
  • 18:12 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1008.wikimedia.org with OS bullseye
  • 18:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T348183)', diff saved to https://phabricator.wikimedia.org/P53488 and previous config saved to /var/cache/conftool/dbconfig/20231115-180503-arnaudb.json
  • 18:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 18:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 18:01 jynus: All restart_daemons were successful
  • 18:01 root@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 17:57 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:57 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 17:56 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:56 root@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 17:52 inflatador: bking@wdqs1024 reboot host to hopefully reduce data reload failures T349011
  • 17:51 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 17:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T349796)
  • 17:27 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T349796)
  • 17:26 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T349796)
  • 17:23 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T349796)
  • 17:19 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T349796)
  • 17:18 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T349796)
  • 16:52 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1102.eqiad.wmnet
  • 16:52 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1102.eqiad.wmnet
  • 16:45 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1102.eqiad.wmnet
  • 16:36 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1102.eqiad.wmnet
  • 16:35 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1102.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:25 fabfur@cumin1001: START - Cookbook sre.hosts.provision for host cp1102.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:25 elukey: reload thanos-rule on titan[12]001 to pick up new pyrra generated configs
  • 16:21 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp1102.eqiad.wmnet with reason: BIOS settings fix
  • 16:21 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cp1102.eqiad.wmnet with reason: BIOS settings fix
  • 16:19 fabfur: depooling cp1102 for BIOS options fix
  • 16:16 arnaudb@cumin1001: dbctl commit (dc=all): 'depool db1130', diff saved to https://phabricator.wikimedia.org/P53486 and previous config saved to /var/cache/conftool/dbconfig/20231115-161600-arnaudb.json
  • 16:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host lvs6003.drmrs.wmnet
  • 15:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1130.eqiad.wmnet
  • 15:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1130.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:57 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1130.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:56 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host lvs6003.drmrs.wmnet
  • 15:55 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 15:49 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1130.eqiad.wmnet
  • 15:48 fabfur: swapped cp1107 <-> cp1082 (T349244)
  • 15:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host doh6001.wikimedia.org
  • 15:46 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1107.eqiad.wmnet
  • 15:46 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1107.eqiad.wmnet
  • 15:44 fabfur: swapped cp1106 <-> cp1081 (T349244)
  • 15:43 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1106.eqiad.wmnet
  • 15:43 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1106.eqiad.wmnet
  • 15:41 godog: bounce prometheus-blackbox-exporter on prometheus4002
  • 15:40 godog: bounce prometheus@ops on prometheus4002
  • 15:39 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host doh6001.wikimedia.org
  • 15:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host durum1001.eqiad.wmnet
  • 15:28 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 15:28 arnaudb@cumin1001: dbctl commit (dc=all): 'depool db1127', diff saved to https://phabricator.wikimedia.org/P53485 and previous config saved to /var/cache/conftool/dbconfig/20231115-152836-arnaudb.json
  • 15:26 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 15:25 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host durum1001.eqiad.wmnet
  • 15:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host restbase1024.eqiad.wmnet
  • 15:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1127.eqiad.wmnet
  • 15:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1127.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:21 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1127.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:19 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 15:16 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1012.eqiad.wmnet with OS bullseye
  • 15:13 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1127.eqiad.wmnet
  • 15:12 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host restbase1024.eqiad.wmnet
  • 15:09 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 15:08 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2004.codfw.wmnet with reason: host reimage
  • 15:05 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: vrts
  • 15:05 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2004.codfw.wmnet with reason: host reimage
  • 15:00 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: vrts
  • 14:50 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest2004.codfw.wmnet with OS bullseye
  • 14:47 awight@deploy2002: Finished scap: Backport for GrowthExperiments: enable AddLink backend for 16,17th rounds of wikis (T308142 T308143) (duration: 08m 16s)
  • 14:47 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1003.eqiad.wmnet with OS bullseye
  • 14:45 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-role (exit_code=99) for role: wmcs::openstack::codfw1dev::control
  • 14:42 awight@deploy2002: sgimeno and awight: Continuing with sync
  • 14:41 awight@deploy2002: sgimeno and awight: Backport for GrowthExperiments: enable AddLink backend for 16,17th rounds of wikis (T308142 T308143) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:39 awight@deploy2002: Started scap: Backport for GrowthExperiments: enable AddLink backend for 16,17th rounds of wikis (T308142 T308143)
  • 14:37 awight@deploy2002: Finished scap: Backport for prod: Enable $wgCampaignEventsEnableParticipantQuestions (T347607) (duration: 16m 09s)
  • 14:35 claime: Raised mw-on-k8s to 20% of external traffic, rollout will happen over the next half hour - T348122
  • 14:31 awight@deploy2002: daimona and awight: Continuing with sync
  • 14:31 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 14:30 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1012.eqiad.wmnet with OS bullseye
  • 14:26 joal@deploy2002: Finished deploy [analytics/refinery@3e9df5d] (hadoop-test): Regular analytics weekly train - TEST - HOTFIX [analytics/refinery@3e9df5d8] (duration: 03m 13s)
  • 14:23 joal@deploy2002: Started deploy [analytics/refinery@3e9df5d] (hadoop-test): Regular analytics weekly train - TEST - HOTFIX [analytics/refinery@3e9df5d8]
  • 14:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host kubernetes2054.codfw.wmnet
  • 14:23 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage
  • 14:23 joal@deploy2002: Finished deploy [analytics/refinery@3e9df5d] (thin): Regular analytics weekly train - THIN - HOTFIX [analytics/refinery@3e9df5d8] (duration: 00m 07s)
  • 14:23 joal@deploy2002: Started deploy [analytics/refinery@3e9df5d] (thin): Regular analytics weekly train - THIN - HOTFIX [analytics/refinery@3e9df5d8]
  • 14:22 awight@deploy2002: daimona and awight: Backport for prod: Enable $wgCampaignEventsEnableParticipantQuestions (T347607) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:21 awight@deploy2002: Started scap: Backport for prod: Enable $wgCampaignEventsEnableParticipantQuestions (T347607)
  • 14:20 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage
  • 14:18 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host kubernetes2054.codfw.wmnet
  • 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host thanos-be2001.codfw.wmnet
  • 14:08 sukhe: running authdns-update to depool esams
  • 14:03 brouberol@cumin1001: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bullseye
  • 14:03 joal@deploy2002: Finished deploy [analytics/refinery@3e9df5d]: Regular analytics weekly train - HOTFIX [analytics/refinery@3e9df5d8] (duration: 00m 06s)
  • 14:03 joal@deploy2002: Started deploy [analytics/refinery@3e9df5d]: Regular analytics weekly train - HOTFIX [analytics/refinery@3e9df5d8]
  • 14:03 XioNoX: reboot fpc0 on cr1-esams - T346779
  • 14:00 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 13:59 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host thanos-be2001.codfw.wmnet
  • 13:59 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::codfw1dev::control
  • 13:55 XioNoX: disable peering/transit on cr1-esams for linecard reboot - T346779
  • 13:52 joal@deploy2002: Finished deploy [analytics/refinery@3e9df5d]: Regular analytics weekly train - HOTFIX [analytics/refinery@3e9df5d8] (duration: 08m 16s)
  • 13:50 taavi: deploy https://gerrit.wikimedia.org/r/c/operations/homer/public/+/973769/ core routers
  • 13:44 joal@deploy2002: Started deploy [analytics/refinery@3e9df5d]: Regular analytics weekly train - HOTFIX [analytics/refinery@3e9df5d8]
  • 13:42 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:41 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:40 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: etcd::v3::kubernetes
  • 13:38 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:31 sfaci@deploy2002: Finished deploy [airflow-dags/analytics_test@5a47584]: Regular analytics weekly train [airflow/analytics_test@5a475842] (duration: 00m 14s)
  • 13:31 sfaci@deploy2002: Started deploy [airflow-dags/analytics_test@5a47584]: Regular analytics weekly train [airflow/analytics_test@5a475842]
  • 13:29 sfaci@deploy2002: Finished deploy [airflow-dags/analytics@5a47584]: Regular analytics weekly train [airflow/analytics@5a475842] (duration: 00m 27s)
  • 13:29 sfaci@deploy2002: Started deploy [airflow-dags/analytics@5a47584]: Regular analytics weekly train [airflow/analytics@5a475842]
  • 13:28 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: etcd::v3::kubernetes
  • 13:22 sfaci@deploy2002: Finished deploy [airflow-dags/analytics_test@be05071]: Regular analytics weekly train [airflow/analytics_test@c203642a] (duration: 00m 06s)
  • 13:21 sfaci@deploy2002: Started deploy [airflow-dags/analytics_test@be05071]: Regular analytics weekly train [airflow/analytics_test@c203642a]
  • 13:18 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 13:18 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 13:17 topranks: resetting FPC1 card in cr1-esams which has a major error and gone offline (T351304)
  • 13:14 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2003.codfw.wmnet with OS bullseye
  • 13:14 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 13:10 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 13:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 13:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 12:57 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 12:57 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 12:57 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 12:57 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 12:57 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 12:57 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 12:57 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:56 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 12:56 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 12:55 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:54 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 12:54 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 12:52 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
  • 12:49 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
  • 12:33 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bullseye
  • 11:57 stevemunene@deploy2002: Finished deploy [airflow-dags/wmde@91810bc]: (no justification provided) (duration: 00m 10s)
  • 11:56 stevemunene@deploy2002: Started deploy [airflow-dags/wmde@91810bc]: (no justification provided)
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: insetup::unowned
  • 11:48 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: insetup::unowned
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host thanos-fe2001.codfw.wmnet
  • 11:24 taavi: update cr*-{codfw,eqiad} firewall policy via homer to update cloudcontrol1006 addressing
  • 11:24 btullis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 11:21 btullis@deploy2002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 11:20 btullis@cumin1001: END (ERROR) - Cookbook sre.druid.roll-restart-workers (exit_code=97) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 11:18 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 11:17 btullis@deploy2002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 11:15 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host thanos-fe2001.codfw.wmnet
  • 11:14 btullis@deploy2002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: miscweb
  • 10:44 tchanders@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:42 tchanders@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:41 tchanders@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:40 tchanders@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:39 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: sync
  • 10:39 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: sync
  • 10:39 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: sync
  • 10:39 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: sync
  • 10:39 _joe_: roll restart of mobileapps in codfw and eqiad
  • 10:34 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 10:31 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:31 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 10:30 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 10:22 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: miscweb
  • 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:cassandra-dev
  • 09:37 moritzm: imported php-igbinary 3.2.1+2.0.8-2+wmf1+bullseye1 to component/php74 for bullseye-wikimedia
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: insetup_noferm
  • 09:19 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: insetup_noferm
  • 09:09 jmm@cumin2002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:cassandra-dev
  • 08:37 moritzm: rolling restart of Cassandra in cassandra-dev following migration to Puppet 7
  • 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: cassandra_dev
  • 08:02 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: cassandra_dev
  • 08:01 marostegui@deploy2002: Finished scap: Backport for Revert "Revert "Revert "ProductionServices.php: Promote pc2014 to pc3 master""" (duration: 06m 54s)
  • 08:00 arnaudb@cumin1001: dbctl commit (dc=all): 'depool db1127', diff saved to https://phabricator.wikimedia.org/P53483 and previous config saved to /var/cache/conftool/dbconfig/20231115-080033-arnaudb.json
  • 07:55 marostegui@deploy2002: marostegui: Continuing with sync
  • 07:55 marostegui@deploy2002: marostegui: Backport for Revert "Revert "Revert "ProductionServices.php: Promote pc2014 to pc3 master""" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:54 marostegui@deploy2002: Started scap: Backport for Revert "Revert "Revert "ProductionServices.php: Promote pc2014 to pc3 master"""
  • 07:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2013.codfw.wmnet with OS bookworm
  • 07:47 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: pybaltest
  • 07:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2013.codfw.wmnet with reason: host reimage
  • 07:35 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: pybaltest
  • 07:34 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-role (exit_code=99) for role: mariadb::misc::analytics::backup
  • 07:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2013.codfw.wmnet with reason: host reimage
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2013.codfw.wmnet with OS bookworm
  • 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2013-2014].codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Reimage
  • 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2013-2014].codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Reimage
  • 07:15 marostegui@deploy2002: Finished scap: Backport for Revert "Revert "ProductionServices.php: Promote pc2014 to pc3 master"" (duration: 06m 53s)
  • 07:10 marostegui@deploy2002: marostegui: Continuing with sync
  • 07:10 marostegui@deploy2002: marostegui: Backport for Revert "Revert "ProductionServices.php: Promote pc2014 to pc3 master"" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 40934
  • 07:08 marostegui@deploy2002: Started scap: Backport for Revert "Revert "ProductionServices.php: Promote pc2014 to pc3 master""
  • 07:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 40934
  • 07:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 983
  • 07:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 983
  • 01:22 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1012.eqiad.wmnet with OS bullseye
  • 01:11 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 00:22 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1012.eqiad.wmnet with OS bullseye
  • 00:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T348183)', diff saved to https://phabricator.wikimedia.org/P53482 and previous config saved to /var/cache/conftool/dbconfig/20231115-000545-arnaudb.json

2023-11-14

  • 23:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P53481 and previous config saved to /var/cache/conftool/dbconfig/20231114-235039-arnaudb.json
  • 23:37 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 23:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P53480 and previous config saved to /var/cache/conftool/dbconfig/20231114-233532-arnaudb.json
  • 23:26 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1012.eqiad.wmnet with OS bullseye
  • 23:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T348183)', diff saved to https://phabricator.wikimedia.org/P53479 and previous config saved to /var/cache/conftool/dbconfig/20231114-232026-arnaudb.json
  • 22:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T348183)', diff saved to https://phabricator.wikimedia.org/P53478 and previous config saved to /var/cache/conftool/dbconfig/20231114-225258-arnaudb.json
  • 22:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 22:52 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 22:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T348183)', diff saved to https://phabricator.wikimedia.org/P53477 and previous config saved to /var/cache/conftool/dbconfig/20231114-225236-arnaudb.json
  • 22:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P53476 and previous config saved to /var/cache/conftool/dbconfig/20231114-223730-arnaudb.json
  • 22:33 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 22:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P53474 and previous config saved to /var/cache/conftool/dbconfig/20231114-222224-arnaudb.json
  • 22:19 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1012.eqiad.wmnet with OS bullseye
  • 22:07 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 22:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T348183)', diff saved to https://phabricator.wikimedia.org/P53473 and previous config saved to /var/cache/conftool/dbconfig/20231114-220717-arnaudb.json
  • 22:05 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 22:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T348183)', diff saved to https://phabricator.wikimedia.org/P53472 and previous config saved to /var/cache/conftool/dbconfig/20231114-220241-arnaudb.json
  • 22:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 22:02 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 22:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T348183)', diff saved to https://phabricator.wikimedia.org/P53471 and previous config saved to /var/cache/conftool/dbconfig/20231114-220220-arnaudb.json
  • 22:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 21:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1056.eqiad.wmnet with OS bookworm
  • 21:48 urbanecm@deploy2002: Finished scap: Backport for [Vector] enable Zebra CSS module on test wikis (T347711), PageRerenderSerializer: Match stream name with conventions (duration: 07m 36s)
  • 21:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P53470 and previous config saved to /var/cache/conftool/dbconfig/20231114-214713-arnaudb.json
  • 21:42 urbanecm@deploy2002: urbanecm and jdrewniak and ebernhardson: Continuing with sync
  • 21:42 urbanecm@deploy2002: urbanecm and jdrewniak and ebernhardson: Backport for [Vector] enable Zebra CSS module on test wikis (T347711), PageRerenderSerializer: Match stream name with conventions synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:40 urbanecm@deploy2002: Started scap: Backport for [Vector] enable Zebra CSS module on test wikis (T347711), PageRerenderSerializer: Match stream name with conventions
  • 21:39 urbanecm@deploy2002: Finished scap: Backport for [Zebra] Remove underline from pages with blank title (T351119) (duration: 09m 59s)
  • 21:34 urbanecm@deploy2002: urbanecm and jdrewniak: Continuing with sync
  • 21:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
  • 21:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P53469 and previous config saved to /var/cache/conftool/dbconfig/20231114-213207-arnaudb.json
  • 21:31 urbanecm@deploy2002: urbanecm and jdrewniak: Backport for [Zebra] Remove underline from pages with blank title (T351119) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:30 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
  • 21:29 urbanecm@deploy2002: Started scap: Backport for [Zebra] Remove underline from pages with blank title (T351119)
  • 21:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
  • 21:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 21:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
  • 21:21 urbanecm@deploy2002: Finished scap: Backport for Deploy Reader Demographics 2 survey on enwiki (T344393), throttle.php: Cleanup old rules, add new one (T351002) (duration: 06m 49s)
  • 21:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T348183)', diff saved to https://phabricator.wikimedia.org/P53468 and previous config saved to /var/cache/conftool/dbconfig/20231114-211700-arnaudb.json
  • 21:16 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 21:16 urbanecm@deploy2002: dani and urbanecm and zoranzoki21: Continuing with sync
  • 21:15 urbanecm@deploy2002: dani and urbanecm and zoranzoki21: Backport for Deploy Reader Demographics 2 survey on enwiki (T344393), throttle.php: Cleanup old rules, add new one (T351002) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:14 urbanecm@deploy2002: Started scap: Backport for Deploy Reader Demographics 2 survey on enwiki (T344393), throttle.php: Cleanup old rules, add new one (T351002)
  • 21:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T348183)', diff saved to https://phabricator.wikimedia.org/P53467 and previous config saved to /var/cache/conftool/dbconfig/20231114-211231-arnaudb.json
  • 21:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 21:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 21:12 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1043.eqiad.wmnet with OS bullseye
  • 21:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T348183)', diff saved to https://phabricator.wikimedia.org/P53466 and previous config saved to /var/cache/conftool/dbconfig/20231114-211209-arnaudb.json
  • 21:09 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1055.eqiad.wmnet with OS bookworm
  • 21:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1056.eqiad.wmnet with OS bookworm
  • 21:03 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1054.eqiad.wmnet with OS bookworm
  • 20:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P53465 and previous config saved to /var/cache/conftool/dbconfig/20231114-205703-arnaudb.json
  • 20:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bookworm
  • 20:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts phab-test1001.eqiad.wmnet
  • 20:51 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:51 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: phab-test1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
  • 20:49 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: phab-test1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
  • 20:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
  • 20:47 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 20:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 20:46 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
  • 20:44 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts phab-test1001.eqiad.wmnet
  • 20:42 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 20:42 mutante: destroying phab-test1001.eqiad.wmnet - T351115
  • 20:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P53464 and previous config saved to /var/cache/conftool/dbconfig/20231114-204156-arnaudb.json
  • 20:41 mutante: doc2002 - systemctl start rsync-doc-host-data-sync - failed unit after maintenance reboot
  • 20:39 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 20:33 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bullseye
  • 20:32 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1043.eqiad.wmnet with OS bullseye
  • 20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doc1003.eqiad.wmnet with reason: maintenance
  • 20:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on doc1003.eqiad.wmnet with reason: maintenance
  • 20:30 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1055.eqiad.wmnet with OS bookworm
  • 20:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doc2002.codfw.wmnet with reason: maintenance
  • 20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on doc2002.codfw.wmnet with reason: maintenance
  • 20:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 20:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T348183)', diff saved to https://phabricator.wikimedia.org/P53463 and previous config saved to /var/cache/conftool/dbconfig/20231114-202650-arnaudb.json
  • 20:25 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1054.eqiad.wmnet with OS bookworm
  • 20:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on people2003.codfw.wmnet with reason: maintenance
  • 20:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 20:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on people2003.codfw.wmnet with reason: maintenance
  • 20:24 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bookworm
  • 20:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on people1004.eqiad.wmnet with reason: maintenance
  • 20:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on people1004.eqiad.wmnet with reason: maintenance
  • 20:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T348183)', diff saved to https://phabricator.wikimedia.org/P53462 and previous config saved to /var/cache/conftool/dbconfig/20231114-202232-arnaudb.json
  • 20:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 20:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 20:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 20:21 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 20:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T348183)', diff saved to https://phabricator.wikimedia.org/P53461 and previous config saved to /var/cache/conftool/dbconfig/20231114-202154-arnaudb.json
  • 20:21 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: doc
  • 20:21 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:21 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:17 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-role for role: doc
  • 20:11 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bookworm
  • 20:09 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bullseye
  • 20:08 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host doc2002.codfw.wmnet
  • 20:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P53460 and previous config saved to /var/cache/conftool/dbconfig/20231114-200648-arnaudb.json
  • 20:04 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1043']
  • 20:03 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043']
  • 20:02 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-host for host doc2002.codfw.wmnet
  • 20:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
  • 19:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1050.eqiad.wmnet with OS bookworm
  • 19:57 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: etherpad
  • 19:57 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
  • 19:52 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-role for role: etherpad
  • 19:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P53459 and previous config saved to /var/cache/conftool/dbconfig/20231114-195141-arnaudb.json
  • 19:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bookworm
  • 19:40 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1049.eqiad.wmnet with OS bookworm
  • 19:39 sfaci@deploy2002: Finished deploy [analytics/refinery@2f94afe] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2f94afe0] (duration: 03m 14s)
  • 19:36 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
  • 19:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T348183)', diff saved to https://phabricator.wikimedia.org/P53458 and previous config saved to /var/cache/conftool/dbconfig/20231114-193635-arnaudb.json
  • 19:36 sfaci@deploy2002: Started deploy [analytics/refinery@2f94afe] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2f94afe0]
  • 19:35 sfaci@deploy2002: Finished deploy [analytics/refinery@2f94afe] (thin): Regular analytics weekly train THIN [analytics/refinery@2f94afe0] (duration: 00m 06s)
  • 19:35 sfaci@deploy2002: Started deploy [analytics/refinery@2f94afe] (thin): Regular analytics weekly train THIN [analytics/refinery@2f94afe0]
  • 19:33 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
  • 19:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T348183)', diff saved to https://phabricator.wikimedia.org/P53457 and previous config saved to /var/cache/conftool/dbconfig/20231114-193217-arnaudb.json
  • 19:32 sfaci@deploy2002: Finished deploy [analytics/refinery@2f94afe]: Regular analytics weekly train [analytics/refinery@2f94afe0] (duration: 07m 04s)
  • 19:32 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 19:32 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 19:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T348183)', diff saved to https://phabricator.wikimedia.org/P53456 and previous config saved to /var/cache/conftool/dbconfig/20231114-193156-arnaudb.json
  • 19:25 sfaci@deploy2002: Started deploy [analytics/refinery@2f94afe]: Regular analytics weekly train [analytics/refinery@2f94afe0]
  • 19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on moscovium.eqiad.wmnet with reason: maintenance
  • 19:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on moscovium.eqiad.wmnet with reason: maintenance
  • 19:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1050.eqiad.wmnet with OS bookworm
  • 19:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P53455 and previous config saved to /var/cache/conftool/dbconfig/20231114-191649-arnaudb.json
  • 19:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1011.eqiad.wmnet with OS bullseye
  • 19:15 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1049.eqiad.wmnet with reason: host reimage
  • 19:14 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.5 refs T350081
  • 19:13 ejegg: fundraising civicrm upgraded from 88361167 to ec6992e0
  • 19:12 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1049.eqiad.wmnet with reason: host reimage
  • 19:04 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: stewards
  • 19:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P53454 and previous config saved to /var/cache/conftool/dbconfig/20231114-190143-arnaudb.json
  • 18:58 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-role for role: stewards
  • 18:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1049.eqiad.wmnet with OS bookworm
  • 18:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1048.eqiad.wmnet with OS bookworm
  • 18:53 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1011.eqiad.wmnet with reason: host reimage
  • 18:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1047.eqiad.wmnet with OS bookworm
  • 18:50 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1011.eqiad.wmnet with reason: host reimage
  • 18:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T348183)', diff saved to https://phabricator.wikimedia.org/P53453 and previous config saved to /var/cache/conftool/dbconfig/20231114-184637-arnaudb.json
  • 18:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T348183)', diff saved to https://phabricator.wikimedia.org/P53452 and previous config saved to /var/cache/conftool/dbconfig/20231114-184204-arnaudb.json
  • 18:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 18:41 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 18:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T348183)', diff saved to https://phabricator.wikimedia.org/P53451 and previous config saved to /var/cache/conftool/dbconfig/20231114-184142-arnaudb.json
  • 18:36 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1011.eqiad.wmnet with OS bullseye
  • 18:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 18:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1048.eqiad.wmnet with reason: host reimage
  • 18:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1048.eqiad.wmnet with reason: host reimage
  • 18:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P53450 and previous config saved to /var/cache/conftool/dbconfig/20231114-182636-arnaudb.json
  • 18:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
  • 18:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
  • 18:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P53449 and previous config saved to /var/cache/conftool/dbconfig/20231114-181130-arnaudb.json
  • 18:11 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1048.eqiad.wmnet with OS bookworm
  • 18:04 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bookworm
  • 17:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T348183)', diff saved to https://phabricator.wikimedia.org/P53448 and previous config saved to /var/cache/conftool/dbconfig/20231114-175623-arnaudb.json
  • 17:55 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 17:54 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-role (exit_code=99) for role: wmcs::openstack::codfw1dev::control
  • 17:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T348183)', diff saved to https://phabricator.wikimedia.org/P53447 and previous config saved to /var/cache/conftool/dbconfig/20231114-175202-arnaudb.json
  • 17:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 17:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 17:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T348183)', diff saved to https://phabricator.wikimedia.org/P53446 and previous config saved to /var/cache/conftool/dbconfig/20231114-175140-arnaudb.json
  • 17:45 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 17:43 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::codfw1dev::control
  • 17:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P53445 and previous config saved to /var/cache/conftool/dbconfig/20231114-173634-arnaudb.json
  • 17:21 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 17:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P53444 and previous config saved to /var/cache/conftool/dbconfig/20231114-172127-arnaudb.json
  • 17:12 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 17:12 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 17:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T348183)', diff saved to https://phabricator.wikimedia.org/P53442 and previous config saved to /var/cache/conftool/dbconfig/20231114-170621-arnaudb.json
  • 17:03 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 17:02 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 17:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T348183)', diff saved to https://phabricator.wikimedia.org/P53441 and previous config saved to /var/cache/conftool/dbconfig/20231114-170158-arnaudb.json
  • 17:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 17:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 17:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T348183)', diff saved to https://phabricator.wikimedia.org/P53440 and previous config saved to /var/cache/conftool/dbconfig/20231114-170136-arnaudb.json
  • 16:50 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@0ae1184]: make cirrus index imports world readable in hdfs (duration: 00m 28s)
  • 16:50 ebernhardson@deploy2002: Started deploy [airflow-dags/search@0ae1184]: make cirrus index imports world readable in hdfs
  • 16:47 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 16:47 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 16:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P53438 and previous config saved to /var/cache/conftool/dbconfig/20231114-164630-arnaudb.json
  • 16:44 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 16:37 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 16:35 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@017fbf1]: search: clean wcqs revision map (duration: 00m 29s)
  • 16:34 ebernhardson@deploy2002: Started deploy [airflow-dags/search@017fbf1]: search: clean wcqs revision map
  • 16:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P53437 and previous config saved to /var/cache/conftool/dbconfig/20231114-163123-arnaudb.json
  • 16:30 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1002.eqiad.wmnet
  • 16:26 aokoth@cumin1001: START - Cookbook sre.hosts.reboot-single for host vrts1002.eqiad.wmnet
  • 16:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 16:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T348183)', diff saved to https://phabricator.wikimedia.org/P53436 and previous config saved to /var/cache/conftool/dbconfig/20231114-161617-arnaudb.json
  • 16:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 16:14 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 16:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T348183)', diff saved to https://phabricator.wikimedia.org/P53435 and previous config saved to /var/cache/conftool/dbconfig/20231114-161157-arnaudb.json
  • 16:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 16:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 16:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: insetup::serviceops_collab
  • 16:11 brennen@deploy2002: Finished deploy [phabricator/deployment@0b76984]: deploy to phab1004 for T350876 (duration: 01m 04s)
  • 16:09 brennen@deploy2002: Started deploy [phabricator/deployment@0b76984]: deploy to phab1004 for T350876
  • 16:09 brennen@deploy2002: Finished deploy [phabricator/deployment@0b76984]: test deploy to phab2002 for T350876 (duration: 00m 32s)
  • 16:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 16:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 16:08 brennen@deploy2002: Started deploy [phabricator/deployment@0b76984]: test deploy to phab2002 for T350876
  • 16:06 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 16:06 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 16:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T348183)', diff saved to https://phabricator.wikimedia.org/P53434 and previous config saved to /var/cache/conftool/dbconfig/20231114-160356-arnaudb.json
  • 16:01 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: insetup::serviceops_collab
  • 16:00 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 16:00 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 15:59 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 15:59 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 15:53 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 15:53 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 15:50 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 15:50 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host vrts1002.eqiad.wmnet
  • 15:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P53433 and previous config saved to /var/cache/conftool/dbconfig/20231114-154850-arnaudb.json
  • 15:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 15:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 15:47 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:46 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1044']
  • 15:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1046']
  • 15:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1043']
  • 15:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043']
  • 15:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1044']
  • 15:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1046']
  • 15:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1043']
  • 15:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1044']
  • 15:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043']
  • 15:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1046']
  • 15:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1044']
  • 15:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1043']
  • 15:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1044']
  • 15:37 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043']
  • 15:37 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1043']
  • 15:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1044']
  • 15:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1044']
  • 15:34 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host vrts1002.eqiad.wmnet
  • 15:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Host failed to be depooled properly', diff saved to https://phabricator.wikimedia.org/P53432 and previous config saved to /var/cache/conftool/dbconfig/20231114-153355-arnaudb.json
  • 15:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P53431 and previous config saved to /var/cache/conftool/dbconfig/20231114-153344-arnaudb.json
  • 15:32 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 15:32 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 15:29 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:29 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043']
  • 15:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1043']
  • 15:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1044']
  • 15:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1044']
  • 15:23 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 15:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: mariadb::analytics_replica
  • 15:22 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 15:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1046']
  • 15:21 btullis@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 15:20 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043']
  • 15:18 arnaudb@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 90%: Host failed to be depooled properly', diff saved to https://phabricator.wikimedia.org/P53430 and previous config saved to /var/cache/conftool/dbconfig/20231114-151850-arnaudb.json
  • 15:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1044']
  • 15:17 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudrabbit1003']
  • 15:17 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudrabbit1003']
  • 15:17 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudrabbit1003']
  • 15:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudrabbit1003']
  • 15:16 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudrabbit1003']
  • 15:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudrabbit1003']
  • 15:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1236 (T348183)', diff saved to https://phabricator.wikimedia.org/P53428 and previous config saved to /var/cache/conftool/dbconfig/20231114-151602-arnaudb.json
  • 15:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 15:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 15:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T348183)', diff saved to https://phabricator.wikimedia.org/P53427 and previous config saved to /var/cache/conftool/dbconfig/20231114-151541-arnaudb.json
  • 15:10 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: mariadb::analytics_replica
  • 15:03 arnaudb@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: Host failed to be depooled properly', diff saved to https://phabricator.wikimedia.org/P53426 and previous config saved to /var/cache/conftool/dbconfig/20231114-150345-arnaudb.json
  • 15:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P53425 and previous config saved to /var/cache/conftool/dbconfig/20231114-150034-arnaudb.json
  • 14:58 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::codfw1dev::backups
  • 14:53 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:52 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:52 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:51 kamila@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:50 btullis@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 14:50 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::codfw1dev::backups
  • 14:48 arnaudb@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 60%: Host failed to be depooled properly', diff saved to https://phabricator.wikimedia.org/P53423 and previous config saved to /var/cache/conftool/dbconfig/20231114-144840-arnaudb.json
  • 14:46 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 14:46 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 14:45 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P53421 and previous config saved to /var/cache/conftool/dbconfig/20231114-144528-arnaudb.json
  • 14:45 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:44 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:44 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:42 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bullseye
  • 14:38 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:38 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 45%: Host failed to be depooled properly', diff saved to https://phabricator.wikimedia.org/P53420 and previous config saved to /var/cache/conftool/dbconfig/20231114-143335-arnaudb.json
  • 14:32 fabfur: swapped cp1105 <-> cp1080 (T349244)
  • 14:32 urbanecm@deploy2002: Finished scap: Backport for IP Masking: Expire temporary accounts in 1 year (T344695), TempUser: Fix unchecked array access for optional key, IP Masking: Add expireTemporaryAccounts.php (T344695) (duration: 07m 03s)
  • 14:31 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1105.eqiad.wmnet
  • 14:31 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1105.eqiad.wmnet
  • 14:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T348183)', diff saved to https://phabricator.wikimedia.org/P53418 and previous config saved to /var/cache/conftool/dbconfig/20231114-143021-arnaudb.json
  • 14:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts search-loader2001.codfw.wmnet,search-loader1001.eqiad.wmnet
  • 14:30 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:30 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: search-loader2001.codfw.wmnet,search-loader1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 14:29 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: search-loader2001.codfw.wmnet,search-loader1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 14:28 fabfur: swapped cp1104 <-> cp1079 (T349244)
  • 14:27 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 14:26 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:26 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 14:26 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1104.eqiad.wmnet
  • 14:26 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1104.eqiad.wmnet
  • 14:26 urbanecm@deploy2002: urbanecm: Backport for IP Masking: Expire temporary accounts in 1 year (T344695), TempUser: Fix unchecked array access for optional key, IP Masking: Add expireTemporaryAccounts.php (T344695) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1227 (T348183)', diff saved to https://phabricator.wikimedia.org/P53417 and previous config saved to /var/cache/conftool/dbconfig/20231114-142608-arnaudb.json
  • 14:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 14:25 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 14:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T348183)', diff saved to https://phabricator.wikimedia.org/P53416 and previous config saved to /var/cache/conftool/dbconfig/20231114-142547-arnaudb.json
  • 14:24 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage
  • 14:22 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage
  • 14:20 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host sretest1004.eqiad.wmnet
  • 14:20 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts search-loader2001.codfw.wmnet,search-loader1001.eqiad.wmnet
  • 14:20 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest1004.eqiad.wmnet
  • 14:18 arnaudb@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 30%: Host failed to be depooled properly', diff saved to https://phabricator.wikimedia.org/P53415 and previous config saved to /var/cache/conftool/dbconfig/20231114-141830-arnaudb.json
  • 14:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P53414 and previous config saved to /var/cache/conftool/dbconfig/20231114-141041-arnaudb.json
  • 14:04 brouberol@cumin1001: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bullseye
  • 14:03 arnaudb@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 15%: Host failed to be depooled properly', diff saved to https://phabricator.wikimedia.org/P53413 and previous config saved to /var/cache/conftool/dbconfig/20231114-140325-arnaudb.json
  • 13:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P53412 and previous config saved to /var/cache/conftool/dbconfig/20231114-135534-arnaudb.json
  • 13:48 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: ml_cache::storage
  • 13:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1138.eqiad.wmnet onto db1238.eqiad.wmnet
  • 13:43 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 13:43 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 13:42 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 13:42 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 13:41 klausman@cumin1001: START - Cookbook sre.puppet.migrate-role for role: ml_cache::storage
  • 13:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T348183)', diff saved to https://phabricator.wikimedia.org/P53411 and previous config saved to /var/cache/conftool/dbconfig/20231114-134028-arnaudb.json
  • 13:38 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-cache1003.eqiad.wmnet
  • 13:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T348183)', diff saved to https://phabricator.wikimedia.org/P53410 and previous config saved to /var/cache/conftool/dbconfig/20231114-133755-arnaudb.json
  • 13:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 13:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 13:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T348183)', diff saved to https://phabricator.wikimedia.org/P53409 and previous config saved to /var/cache/conftool/dbconfig/20231114-133734-arnaudb.json
  • 13:34 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-cache1003.eqiad.wmnet
  • 13:30 taavi@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudcontrol2005-dev.codfw.wmnet
  • 13:29 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: releases
  • 13:26 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-cache2003.codfw.wmnet
  • 13:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P53408 and previous config saved to /var/cache/conftool/dbconfig/20231114-132227-arnaudb.json
  • 13:20 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-cache2003.codfw.wmnet
  • 13:20 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host sretest1004.eqiad.wmnet
  • 13:19 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest1004.eqiad.wmnet
  • 13:19 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: releases
  • 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::mariadb
  • 13:10 taavi@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2005-dev.codfw.wmnet
  • 13:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P53407 and previous config saved to /var/cache/conftool/dbconfig/20231114-130721-arnaudb.json
  • 13:06 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host sretest1004.eqiad.wmnet
  • 13:05 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 13:02 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::mariadb
  • 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: mariadb::misc::analytics::backup
  • 12:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T348183)', diff saved to https://phabricator.wikimedia.org/P53406 and previous config saved to /var/cache/conftool/dbconfig/20231114-125214-arnaudb.json
  • 12:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T348183)', diff saved to https://phabricator.wikimedia.org/P53405 and previous config saved to /var/cache/conftool/dbconfig/20231114-124942-arnaudb.json
  • 12:49 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 12:49 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 12:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T348183)', diff saved to https://phabricator.wikimedia.org/P53404 and previous config saved to /var/cache/conftool/dbconfig/20231114-124921-arnaudb.json
  • 12:48 hashar@deploy2002: Finished deploy [gerrit/gerrit@a087269]: Plugin to process Puppet Catalog Compiler results - https://gerrit.wikimedia.org/r/969981 (duration: 00m 07s)
  • 12:48 hashar@deploy2002: Started deploy [gerrit/gerrit@a087269]: Plugin to process Puppet Catalog Compiler results - https://gerrit.wikimedia.org/r/969981
  • 12:46 hashar@deploy2002: Finished deploy [gerrit/gerrit@a087269]: Plugin to process Puppet Catalog Compiler results - https://gerrit.wikimedia.org/r/969981 (duration: 00m 04s)
  • 12:46 hashar@deploy2002: Started deploy [gerrit/gerrit@a087269]: Plugin to process Puppet Catalog Compiler results - https://gerrit.wikimedia.org/r/969981
  • 12:42 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:42 kamila@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:41 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: mariadb::misc::analytics::backup
  • 12:37 kamila@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:37 kamila@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P53403 and previous config saved to /var/cache/conftool/dbconfig/20231114-123414-arnaudb.json
  • 12:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: mariadb::misc::analytics::backup
  • 12:20 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2002.codfw.wmnet with OS bullseye
  • 12:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P53402 and previous config saved to /var/cache/conftool/dbconfig/20231114-121908-arnaudb.json
  • 12:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: gitlab
  • 12:08 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 12:08 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 12:07 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 12:06 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: gitlab
  • 12:06 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 12:06 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 12:05 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 12:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T348183)', diff saved to https://phabricator.wikimedia.org/P53401 and previous config saved to /var/cache/conftool/dbconfig/20231114-120401-arnaudb.json
  • 12:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T348183)', diff saved to https://phabricator.wikimedia.org/P53400 and previous config saved to /var/cache/conftool/dbconfig/20231114-120129-arnaudb.json
  • 12:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::presto::server
  • 12:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 12:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T348183)', diff saved to https://phabricator.wikimedia.org/P53399 and previous config saved to /var/cache/conftool/dbconfig/20231114-120108-arnaudb.json
  • 11:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P53398 and previous config saved to /var/cache/conftool/dbconfig/20231114-114602-arnaudb.json
  • 11:45 moritzm: imported xdebug 3.0.3+2.9.8+2.8.1+2.5.5-0+deb11u1+wmf1+bullseye1 to component/php74 for bullseye-wikimedia
  • 11:40 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::presto::server
  • 11:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host gitlab1003.wikimedia.org
  • 11:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P53397 and previous config saved to /var/cache/conftool/dbconfig/20231114-113055-arnaudb.json
  • 11:25 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host gitlab1003.wikimedia.org
  • 11:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host an-presto1001.eqiad.wmnet
  • 11:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T348183)', diff saved to https://phabricator.wikimedia.org/P53396 and previous config saved to /var/cache/conftool/dbconfig/20231114-111549-arnaudb.json
  • 11:15 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: ml_k8s::staging::worker
  • 11:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T348183)', diff saved to https://phabricator.wikimedia.org/P53395 and previous config saved to /var/cache/conftool/dbconfig/20231114-111316-arnaudb.json
  • 11:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:10 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T348183)', diff saved to https://phabricator.wikimedia.org/P53394 and previous config saved to /var/cache/conftool/dbconfig/20231114-111037-arnaudb.json
  • 11:09 klausman@cumin1001: START - Cookbook sre.puppet.migrate-role for role: ml_k8s::staging::worker
  • 10:58 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-staging2001.codfw.wmnet
  • 10:57 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host an-presto1001.eqiad.wmnet
  • 10:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P53393 and previous config saved to /var/cache/conftool/dbconfig/20231114-105530-arnaudb.json
  • 10:55 moritzm: imported php-msgpack 2.1.2+0.5.7-2+wmf1+bullseye1 to component/php74 for bullseye-wikimedia
  • 10:54 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1138.eqiad.wmnet onto db1238.eqiad.wmnet
  • 10:50 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-staging2001.codfw.wmnet
  • 10:48 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: ml_k8s::staging::master
  • 10:46 arnaudb@cumin1001: dbctl commit (dc=all): 'migrate db1138 to db1238 - T344036', diff saved to https://phabricator.wikimedia.org/P53392 and previous config saved to /var/cache/conftool/dbconfig/20231114-104603-arnaudb.json
  • 10:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P53391 and previous config saved to /var/cache/conftool/dbconfig/20231114-104024-arnaudb.json
  • 10:40 klausman@cumin1001: START - Cookbook sre.puppet.migrate-role for role: ml_k8s::staging::master
  • 10:39 arnaudb@cumin1001: dbctl commit (dc=all): 'T351184 - weight mirror', diff saved to https://phabricator.wikimedia.org/P53390 and previous config saved to /var/cache/conftool/dbconfig/20231114-103941-arnaudb.json
  • 10:38 moritzm: imported php-redis 5.3.2+4.3.0-2+deb11u1+wmf2+bullseye1 to component/php74 for bullseye-wikimedia
  • 10:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Promote db1160 to s4 primary T351184', diff saved to https://phabricator.wikimedia.org/P53389 and previous config saved to /var/cache/conftool/dbconfig/20231114-103601-arnaudb.json
  • 10:34 arnaudb: Starting s4 eqiad failover from db1138 to db1160 - T351184
  • 10:33 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-staging-ctrl2002.codfw.wmnet
  • 10:26 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-staging-ctrl2002.codfw.wmnet
  • 10:26 jnuche@deploy2002: Pruned MediaWiki: 1.42.0-wmf.3 (duration: 02m 06s)
  • 10:25 moritzm: imported 5.1.19+4.0.11-3+wmf2+bullseye1 to component/php74 for bullseye-wikimedia
  • 10:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T348183)', diff saved to https://phabricator.wikimedia.org/P53388 and previous config saved to /var/cache/conftool/dbconfig/20231114-102517-arnaudb.json
  • 10:24 jnuche@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.5 refs T350081 (duration: 20m 19s)
  • 10:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T348183)', diff saved to https://phabricator.wikimedia.org/P53387 and previous config saved to /var/cache/conftool/dbconfig/20231114-102206-arnaudb.json
  • 10:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:21 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T348183)', diff saved to https://phabricator.wikimedia.org/P53386 and previous config saved to /var/cache/conftool/dbconfig/20231114-102145-arnaudb.json
  • 10:15 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: etcd::v3::ml_etcd::staging
  • 10:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Set db1160 with weight 0 T351184', diff saved to https://phabricator.wikimedia.org/P53385 and previous config saved to /var/cache/conftool/dbconfig/20231114-100843-arnaudb.json
  • 10:08 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T351184
  • 10:07 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T351184
  • 10:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P53384 and previous config saved to /var/cache/conftool/dbconfig/20231114-100638-arnaudb.json
  • 10:05 klausman@cumin1001: START - Cookbook sre.puppet.migrate-role for role: etcd::v3::ml_etcd::staging
  • 10:03 jnuche@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.5 refs T350081
  • 09:54 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
  • 09:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: provisionning db1238.eqiad.wmnet - T344036
  • 09:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: provisionning db1238.eqiad.wmnet - T344036
  • 09:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: provisionning db1238.eqiad.wmnet - T344036
  • 09:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: provisionning db1238.eqiad.wmnet - T344036
  • 09:53 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc3 master" (duration: 07m 26s)
  • 09:52 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
  • 09:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P53383 and previous config saved to /var/cache/conftool/dbconfig/20231114-095132-arnaudb.json
  • 09:47 marostegui@deploy2002: marostegui: Continuing with sync
  • 09:47 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc2014 to pc3 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:45 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc3 master"
  • 09:45 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ml-staging-etcd2003.codfw.wmnet
  • 09:39 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc2014 to pc3 master (duration: 07m 11s)
  • 09:38 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 09:38 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 09:36 jayme: reimaging kubestage2002 to verify with puppet7
  • 09:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T348183)', diff saved to https://phabricator.wikimedia.org/P53380 and previous config saved to /var/cache/conftool/dbconfig/20231114-093625-arnaudb.json
  • 09:34 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2002.codfw.wmnet with OS bullseye
  • 09:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T348183)', diff saved to https://phabricator.wikimedia.org/P53379 and previous config saved to /var/cache/conftool/dbconfig/20231114-093353-arnaudb.json
  • 09:34 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:34 marostegui@deploy2002: marostegui: Continuing with sync
  • 09:34 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 09:34 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc2014 to pc3 master synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:33 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 09:32 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 09:32 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 09:32 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc2014 to pc3 master
  • 09:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc[2013-2014].codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Switch
  • 09:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on pc[2013-2014].codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Switch
  • 09:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 09:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 09:28 klausman@cumin1001: START - Cookbook sre.puppet.migrate-host for host ml-staging-etcd2003.codfw.wmnet
  • 09:26 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master" (duration: 07m 02s)
  • 09:26 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 09:25 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 09:21 marostegui@deploy2002: marostegui: Continuing with sync
  • 09:20 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:19 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master"
  • 09:12 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc3 master (duration: 07m 24s)
  • 09:07 marostegui@deploy2002: marostegui: Continuing with sync
  • 09:06 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc3 master synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:05 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc3 master
  • 09:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2013-2014].codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Switch
  • 09:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2013-2014].codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Switch
  • 08:56 godog: add 80g to prometheus/k8s-ml-serve in eqiad
  • 08:56 godog: add 80g to prometheus/ops in eqiad
  • 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1164.eqiad.wmnet with OS bookworm
  • 08:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1164.eqiad.wmnet with reason: host reimage
  • 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1164.eqiad.wmnet with reason: host reimage
  • 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1164.eqiad.wmnet with OS bookworm
  • 08:06 moritzm: installing nghttp2 security updates
  • 08:04 marostegui: Failover m1 from db1164 to db1119 - T350022
  • 07:59 moritzm: installing dbus security updates on bullseye
  • 07:39 jynus: stop bacula dir (and puppet) at backup1001 T350022
  • 07:27 vgutierrez: include golang-github-mmatczuk-anyflag_0.0~git20231026.5f42d2f in apt.wm.org (bookworm)
  • 07:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2132,2160].codfw.wmnet,db[1119,1164,1217].eqiad.wmnet with reason: Switch
  • 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2132,2160].codfw.wmnet,db[1119,1164,1217].eqiad.wmnet with reason: Switch
  • 05:45 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 05:42 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1045.eqiad.wmnet with OS bookworm
  • 05:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
  • 05:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 05:18 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 05:17 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
  • 05:05 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 05:03 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1045.eqiad.wmnet with OS bookworm
  • 04:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 04:54 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.5 refs T350081 (duration: 51m 15s)
  • 04:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
  • 04:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.5 refs T350081
  • 04:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
  • 03:49 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 03:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1041.eqiad.wmnet with OS bookworm
  • 03:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 03:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 03:43 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1040.eqiad.wmnet with OS bookworm
  • 03:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 03:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
  • 03:22 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
  • 03:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
  • 03:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
  • 03:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1041.eqiad.wmnet with OS bookworm
  • 03:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1039.eqiad.wmnet with OS bookworm
  • 02:59 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1040.eqiad.wmnet with OS bookworm
  • 02:46 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
  • 02:43 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
  • 02:26 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1039.eqiad.wmnet with OS bookworm
  • 00:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host stewards1001.eqiad.wmnet
  • 00:31 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-host for host stewards1001.eqiad.wmnet
  • 00:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host stewards1001.eqiad.wmnet with OS bookworm
  • 00:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stewards1001.eqiad.wmnet with reason: host reimage
  • 00:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on stewards1001.eqiad.wmnet with reason: host reimage
  • 00:03 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host stewards1001.eqiad.wmnet with OS bookworm

2023-11-13

  • 23:57 dzahn@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host stewards1001.eqiad.wmnet
  • 23:57 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host stewards1001.eqiad.wmnet with OS bookworm
  • 23:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1021.eqiad.wmnet
  • 23:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host clouddb1021.eqiad.wmnet
  • 23:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stewards1001.eqiad.wmnet with reason: host reimage
  • 23:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on stewards1001.eqiad.wmnet with reason: host reimage
  • 23:13 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host stewards1001.eqiad.wmnet with OS bookworm
  • 23:13 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host stewards1001.eqiad.wmnet with OS bookworm
  • 23:12 mutante: wmf-reimage for stewards1001 failed with [self-signed certificate in certificate chain
  • 23:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1038.eqiad.wmnet with OS bookworm
  • 23:10 tgr: UTC late deploys done
  • 22:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1037.eqiad.wmnet with OS bookworm
  • 22:55 tgr@deploy2002: Finished scap: Backport for session: Remove incorrect warning (T348852) (duration: 08m 03s)
  • 22:52 root@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts search-loader2001.codfw.wmnet,search-loader1001.eqiad.wmnet
  • 22:49 root@cumin2002: START - Cookbook sre.hosts.decommission for hosts search-loader2001.codfw.wmnet,search-loader1001.eqiad.wmnet
  • 22:49 tgr@deploy2002: tgr: Continuing with sync
  • 22:48 tgr@deploy2002: tgr: Backport for session: Remove incorrect warning (T348852) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:47 tgr@deploy2002: Started scap: Backport for session: Remove incorrect warning (T348852)
  • 22:42 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
  • 22:41 tgr@deploy2002: Finished scap: Backport for Remove support for HTTPS-only sessions on HTTP/HTTPS wikis (T348852) (duration: 18m 17s)
  • 22:39 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
  • 22:35 tgr@deploy2002: tgr: Continuing with sync
  • 22:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
  • 22:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
  • 22:24 tgr@deploy2002: tgr: Backport for Remove support for HTTPS-only sessions on HTTP/HTTPS wikis (T348852) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1038.eqiad.wmnet with OS bookworm
  • 22:22 tgr@deploy2002: Started scap: Backport for Remove support for HTTPS-only sessions on HTTP/HTTPS wikis (T348852)
  • 22:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1036.eqiad.wmnet with OS bookworm
  • 22:12 bvibber: brion halting requeueTranscode jobs to let queues even out before continuing with lighter load
  • 22:10 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1037.eqiad.wmnet with OS bookworm
  • 22:05 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1035.eqiad.wmnet with OS bookworm
  • 21:54 urbanecm@deploy2002: Finished scap: Backport for mobile: Add MobileUrlCallback (T257852), Parsoid-VE-MCR hack: Always return main slot output if useParsoid is set (T351026 T351113) (duration: 18m 34s)
  • 21:49 urbanecm@deploy2002: urbanecm and ssastry and tgr: Continuing with sync
  • 21:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
  • 21:48 bking@deploy2002: Finished deploy [search/mjolnir/deploy@0f8bb60]: (no justification provided) (duration: 00m 35s)
  • 21:47 bking@deploy2002: Started deploy [search/mjolnir/deploy@0f8bb60]: (no justification provided)
  • 21:46 inflatador: bking@deploy2002 deploy mjolnir 2.4.0 on newly-built bullseye hosts T346039
  • 21:45 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
  • 21:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T348183)', diff saved to https://phabricator.wikimedia.org/P53376 and previous config saved to /var/cache/conftool/dbconfig/20231113-214411-arnaudb.json
  • 21:40 btullis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 21:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
  • 21:37 urbanecm@deploy2002: urbanecm and ssastry and tgr: Backport for mobile: Add MobileUrlCallback (T257852), Parsoid-VE-MCR hack: Always return main slot output if useParsoid is set (T351026 T351113) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:36 btullis@deploy2002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 21:36 btullis@deploy2002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 21:36 urbanecm@deploy2002: Started scap: Backport for mobile: Add MobileUrlCallback (T257852), Parsoid-VE-MCR hack: Always return main slot output if useParsoid is set (T351026 T351113)
  • 21:34 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
  • 21:33 btullis@deploy2002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 21:32 btullis@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 21:32 urbanecm@deploy2002: Finished scap: Backport for Undeploy pilot survey on metawiki (T349854), Don't change transcode rows during read operations (T152851), Fixes to requeueTranscodes to make it easier to batch-fill (T68722), Only include completed transcodes in .m3u8 playlist (T350996) (duration: 10m 37s)
  • 21:30 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1036.eqiad.wmnet with OS bookworm
  • 21:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P53375 and previous config saved to /var/cache/conftool/dbconfig/20231113-212904-arnaudb.json
  • 21:28 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 21:26 urbanecm@deploy2002: urbanecm and brion and dani: Continuing with sync
  • 21:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1034.eqiad.wmnet with OS bookworm
  • 21:22 urbanecm@deploy2002: urbanecm and brion and dani: Backport for Undeploy pilot survey on metawiki (T349854), Don't change transcode rows during read operations (T152851), Fixes to requeueTranscodes to make it easier to batch-fill (T68722), Only include completed transcodes in .m3u8 playlist (T350996) synced to the testservers (https://wikitech.wiki
  • 21:21 urbanecm@deploy2002: Started scap: Backport for Undeploy pilot survey on metawiki (T349854), Don't change transcode rows during read operations (T152851), Fixes to requeueTranscodes to make it easier to batch-fill (T68722), Only include completed transcodes in .m3u8 playlist (T350996)
  • 21:20 urbanecm@deploy2002: Sync cancelled.
  • 21:19 urbanecm@deploy2002: urbanecm and dani: Backport for Undeploy pilot survey on metawiki (T349854) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:19 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1030.eqiad.wmnet
  • 21:18 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:18 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1030.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 21:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1035.eqiad.wmnet with OS bookworm
  • 21:18 urbanecm@deploy2002: Started scap: Backport for Undeploy pilot survey on metawiki (T349854)
  • 21:17 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1030.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 21:16 urbanecm@deploy2002: Finished scap: Backport for Enable edit check on swwiki (T350921), Fix Reader Demographics 2 survey (T345951) (duration: 10m 15s)
  • 21:15 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 21:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P53374 and previous config saved to /var/cache/conftool/dbconfig/20231113-211358-arnaudb.json
  • 21:11 urbanecm@deploy2002: dani and kemayo and urbanecm: Continuing with sync
  • 21:11 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1030.eqiad.wmnet
  • 21:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1029.eqiad.wmnet
  • 21:10 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:10 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1029.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 21:09 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1029.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 21:07 urbanecm@deploy2002: dani and kemayo and urbanecm: Backport for Enable edit check on swwiki (T350921), Fix Reader Demographics 2 survey (T345951) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:06 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 21:06 urbanecm@deploy2002: Started scap: Backport for Enable edit check on swwiki (T350921), Fix Reader Demographics 2 survey (T345951)
  • 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
  • 21:02 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1029.eqiad.wmnet
  • 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1028.eqiad.wmnet
  • 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:01 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1028.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 21:00 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1028.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 20:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
  • 20:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T348183)', diff saved to https://phabricator.wikimedia.org/P53373 and previous config saved to /var/cache/conftool/dbconfig/20231113-205852-arnaudb.json
  • 20:58 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:53 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1028.eqiad.wmnet
  • 20:52 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wdqs1023.eqiad.wmnet with reason: T347504
  • 20:52 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on wdqs1023.eqiad.wmnet with reason: T347504
  • 20:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1027.eqiad.wmnet
  • 20:52 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:52 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1027.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:52 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wdqs1024.eqiad.wmnet with reason: T347504
  • 20:52 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on wdqs1024.eqiad.wmnet with reason: T347504
  • 20:51 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wdqs1022.eqiad.wmnet with reason: T347504
  • 20:51 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on wdqs1022.eqiad.wmnet with reason: T347504
  • 20:51 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1027.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T348183)', diff saved to https://phabricator.wikimedia.org/P53372 and previous config saved to /var/cache/conftool/dbconfig/20231113-205032-arnaudb.json
  • 20:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 20:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 20:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53371 and previous config saved to /var/cache/conftool/dbconfig/20231113-205010-arnaudb.json
  • 20:49 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:47 urbanecm: mwmaint2002: `mwscript extensions/GrowthExperiments/maintenance/reassignMentees.php --wiki=arwiki --all --performer='Martin Urbanec (WMF)'` (T330071)
  • 20:44 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1027.eqiad.wmnet
  • 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1026.eqiad.wmnet
  • 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:43 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1026.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:43 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1034.eqiad.wmnet with OS bookworm
  • 20:42 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1026.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:41 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 20:40 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:36 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 20:36 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
  • 20:36 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1026.eqiad.wmnet
  • 20:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1025.eqiad.wmnet
  • 20:35 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:35 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1025.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P53370 and previous config saved to /var/cache/conftool/dbconfig/20231113-203504-arnaudb.json
  • 20:34 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1025.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:32 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
  • 20:32 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:27 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 20:27 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 20:27 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1025.eqiad.wmnet
  • 20:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P53369 and previous config saved to /var/cache/conftool/dbconfig/20231113-201957-arnaudb.json
  • 20:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 20:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
  • 20:15 ebernhardson: start reindex of enwiki indexes in cloudelastic search cluster from mwmaint2002
  • 20:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
  • 20:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 20:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53368 and previous config saved to /var/cache/conftool/dbconfig/20231113-200451-arnaudb.json
  • 20:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stewards1001.eqiad.wmnet with reason: host reimage
  • 20:00 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 19:59 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 19:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53367 and previous config saved to /var/cache/conftool/dbconfig/20231113-195934-arnaudb.json
  • 19:59 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 19:59 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 19:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T348183)', diff saved to https://phabricator.wikimedia.org/P53366 and previous config saved to /var/cache/conftool/dbconfig/20231113-195913-arnaudb.json
  • 19:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on stewards1001.eqiad.wmnet with reason: host reimage
  • 19:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 19:55 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 19:55 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 19:55 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 19:55 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 19:53 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 19:48 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host stewards1001.eqiad.wmnet with OS bookworm
  • 19:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P53365 and previous config saved to /var/cache/conftool/dbconfig/20231113-194406-arnaudb.json
  • 19:38 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 19:38 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 19:37 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 19:35 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 19:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P53364 and previous config saved to /var/cache/conftool/dbconfig/20231113-192900-arnaudb.json
  • 19:25 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-host for host stewards1001.eqiad.wmnet
  • 19:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host stewards1001.eqiad.wmnet
  • 19:20 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 19:19 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 19:17 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-host for host stewards1001.eqiad.wmnet
  • 19:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 19:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T348183)', diff saved to https://phabricator.wikimedia.org/P53363 and previous config saved to /var/cache/conftool/dbconfig/20231113-191354-arnaudb.json
  • 19:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T348183)', diff saved to https://phabricator.wikimedia.org/P53362 and previous config saved to /var/cache/conftool/dbconfig/20231113-190849-arnaudb.json
  • 19:08 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 19:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 19:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53361 and previous config saved to /var/cache/conftool/dbconfig/20231113-190827-arnaudb.json
  • 19:00 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 18:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P53360 and previous config saved to /var/cache/conftool/dbconfig/20231113-185321-arnaudb.json
  • 18:50 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:50 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:49 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:49 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:49 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:49 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:42 sukhe: pool cp4052 as first cp host for bookworm testing: T342154
  • 18:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P53359 and previous config saved to /var/cache/conftool/dbconfig/20231113-183814-arnaudb.json
  • 18:28 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: microsites::peopleweb
  • 18:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53358 and previous config saved to /var/cache/conftool/dbconfig/20231113-182308-arnaudb.json
  • 18:20 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-role for role: microsites::peopleweb
  • 18:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53357 and previous config saved to /var/cache/conftool/dbconfig/20231113-181751-arnaudb.json
  • 18:17 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 18:17 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 18:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T348183)', diff saved to https://phabricator.wikimedia.org/P53356 and previous config saved to /var/cache/conftool/dbconfig/20231113-181729-arnaudb.json
  • 18:16 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host people2003.codfw.wmnet
  • 18:09 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:09 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:08 bking@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 18:07 bking@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 18:07 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 18:07 bking@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 18:06 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P53355 and previous config saved to /var/cache/conftool/dbconfig/20231113-180222-arnaudb.json
  • 17:59 dzahn@cumin1001: START - Cookbook sre.puppet.migrate-host for host people2003.codfw.wmnet
  • 17:59 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 17:59 bking@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:59 bking@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P53354 and previous config saved to /var/cache/conftool/dbconfig/20231113-174716-arnaudb.json
  • 17:36 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 17:35 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 17:34 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 17:34 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 17:34 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 17:34 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 17:33 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 17:33 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 17:32 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 17:32 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 17:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T348183)', diff saved to https://phabricator.wikimedia.org/P53353 and previous config saved to /var/cache/conftool/dbconfig/20231113-173209-arnaudb.json
  • 17:31 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 17:31 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 17:31 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 17:28 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 17:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T348183)', diff saved to https://phabricator.wikimedia.org/P53352 and previous config saved to /var/cache/conftool/dbconfig/20231113-172748-arnaudb.json
  • 17:27 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 17:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 17:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 17:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T348183)', diff saved to https://phabricator.wikimedia.org/P53351 and previous config saved to /var/cache/conftool/dbconfig/20231113-172712-arnaudb.json
  • 17:27 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 17:26 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 17:21 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:21 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:20 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:17 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:17 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:16 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:15 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:14 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 17:13 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P53350 and previous config saved to /var/cache/conftool/dbconfig/20231113-171205-arnaudb.json
  • 17:10 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 17:09 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:09 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 17:09 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:08 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 17:06 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:06 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 17:05 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:05 ottomata: deploying eventgates to pick up change to use mw-api-int-async-ro with retries - T326002
  • 17:04 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 17:04 otto@deploy2002: Finished deploy [analytics/refinery@25ef91f]: deploying refinery with refinery-source 0.2.25 jars for T321854 [analytics/refinery@25ef91f2] (duration: 06m 36s)
  • 16:57 otto@deploy2002: Started deploy [analytics/refinery@25ef91f]: deploying refinery with refinery-source 0.2.25 jars for T321854 [analytics/refinery@25ef91f2]
  • 16:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P53349 and previous config saved to /var/cache/conftool/dbconfig/20231113-165659-arnaudb.json
  • 16:51 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 05m 42s)
  • 16:46 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 14s)
  • 16:43 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 16:43 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 16:43 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 16:42 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 16:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T348183)', diff saved to https://phabricator.wikimedia.org/P53348 and previous config saved to /var/cache/conftool/dbconfig/20231113-164152-arnaudb.json
  • 16:40 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1031.eqiad.wmnet with OS bookworm
  • 16:39 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest1004.eqiad.wmnet
  • 16:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T348183)', diff saved to https://phabricator.wikimedia.org/P53347 and previous config saved to /var/cache/conftool/dbconfig/20231113-163730-arnaudb.json
  • 16:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 16:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 16:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T348183)', diff saved to https://phabricator.wikimedia.org/P53346 and previous config saved to /var/cache/conftool/dbconfig/20231113-163709-arnaudb.json
  • 16:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P53345 and previous config saved to /var/cache/conftool/dbconfig/20231113-162202-arnaudb.json
  • 16:14 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 16:13 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 16:12 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
  • 16:11 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 16:11 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 16:09 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
  • 16:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P53344 and previous config saved to /var/cache/conftool/dbconfig/20231113-160656-arnaudb.json
  • 15:55 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1031.eqiad.wmnet with OS bookworm
  • 15:54 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 15:52 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 15:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T348183)', diff saved to https://phabricator.wikimedia.org/P53343 and previous config saved to /var/cache/conftool/dbconfig/20231113-155149-arnaudb.json
  • 15:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T348183)', diff saved to https://phabricator.wikimedia.org/P53342 and previous config saved to /var/cache/conftool/dbconfig/20231113-154641-arnaudb.json
  • 15:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 15:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 15:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 15:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 15:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T348183)', diff saved to https://phabricator.wikimedia.org/P53341 and previous config saved to /var/cache/conftool/dbconfig/20231113-154044-arnaudb.json
  • 15:39 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:38 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:31 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:31 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P53340 and previous config saved to /var/cache/conftool/dbconfig/20231113-152537-arnaudb.json
  • 15:14 fabfur: swapped cp1103 <-> cp1078 (T349244)
  • 15:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1020.eqiad.wmnet
  • 15:13 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1103.eqiad.wmnet
  • 15:13 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1103.eqiad.wmnet
  • 15:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P53339 and previous config saved to /var/cache/conftool/dbconfig/20231113-151031-arnaudb.json
  • 15:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host clouddb1020.eqiad.wmnet
  • 15:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1019.eqiad.wmnet
  • 15:07 fabfur: swapped cp1102 <-> cp1077 (T349244)
  • 15:04 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1102.eqiad.wmnet
  • 15:04 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1102.eqiad.wmnet
  • 15:01 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:00 kamila@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:00 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host clouddb1019.eqiad.wmnet
  • 14:59 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:59 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:58 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:58 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 14:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1018.eqiad.wmnet
  • 14:56 kamila@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:56 kamila@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T348183)', diff saved to https://phabricator.wikimedia.org/P53338 and previous config saved to /var/cache/conftool/dbconfig/20231113-145524-arnaudb.json
  • 14:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1230 (T348183)', diff saved to https://phabricator.wikimedia.org/P53337 and previous config saved to /var/cache/conftool/dbconfig/20231113-145223-arnaudb.json
  • 14:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 14:52 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 14:51 urbanecm: mwmaint2002: stop `extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki frwiki` again, memory leak didn't stop (T315510)
  • 14:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 14:49 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 14:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53336 and previous config saved to /var/cache/conftool/dbconfig/20231113-144947-arnaudb.json
  • 14:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host clouddb1018.eqiad.wmnet
  • 14:43 urbanecm: mwmaint2002: foreachwiki extensions/WikimediaMaintenance/createExtensionTables.php MediaModeration (T350321)
  • 14:41 bblack: cp2027: varnish-frontend-restart to test tcp listen port changes
  • 14:40 urbanecm@deploy2002: Finished scap: Backport for Deploy Reader Demographics 2 survey (T345951), Add mediamoderation_scan table (T350321) (duration: 09m 13s)
  • 14:38 urbanecm: mwmaint2002: Start several instances of `extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php` (T315510)
  • 14:35 urbanecm@deploy2002: urbanecm and dani: Continuing with sync
  • 14:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P53335 and previous config saved to /var/cache/conftool/dbconfig/20231113-143440-arnaudb.json
  • 14:34 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host clouddb1017.eqiad.wmnet
  • 14:32 urbanecm@deploy2002: urbanecm and dani: Backport for Deploy Reader Demographics 2 survey (T345951), Add mediamoderation_scan table (T350321) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:31 urbanecm@deploy2002: Started scap: Backport for Deploy Reader Demographics 2 survey (T345951), Add mediamoderation_scan table (T350321)
  • 14:30 urbanecm@deploy2002: Finished scap: Backport for ParserOutputAccess: Limit local cache size (T315510) (duration: 06m 42s)
  • 14:30 moritzm: installing debianutils bugfix updates from Bookworm point release
  • 14:25 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 14:25 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 14:25 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 14:25 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 14:24 urbanecm@deploy2002: Started scap: Backport for ParserOutputAccess: Limit local cache size (T315510)
  • 14:22 urbanecm@deploy2002: Finished scap: Backport for Add MediaModeration to addWiki.php (T350321), Add MediaModeration to createExtensionTables.php (T350321) (duration: 06m 58s)
  • 14:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P53334 and previous config saved to /var/cache/conftool/dbconfig/20231113-141934-arnaudb.json
  • 14:16 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 14:16 urbanecm@deploy2002: urbanecm: Backport for Add MediaModeration to addWiki.php (T350321), Add MediaModeration to createExtensionTables.php (T350321) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:15 urbanecm@deploy2002: Started scap: Backport for Add MediaModeration to addWiki.php (T350321), Add MediaModeration to createExtensionTables.php (T350321)
  • 14:06 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host clouddb1017.eqiad.wmnet
  • 14:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53333 and previous config saved to /var/cache/conftool/dbconfig/20231113-140427-arnaudb.json
  • 14:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53332 and previous config saved to /var/cache/conftool/dbconfig/20231113-140136-arnaudb.json
  • 14:01 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 14:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 14:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T348183)', diff saved to https://phabricator.wikimedia.org/P53331 and previous config saved to /var/cache/conftool/dbconfig/20231113-140115-arnaudb.json
  • 13:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: requesttracker
  • 13:53 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2003.codfw.wmnet with OS bullseye
  • 13:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P53330 and previous config saved to /var/cache/conftool/dbconfig/20231113-134608-arnaudb.json
  • 13:46 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: requesttracker
  • 13:45 moritzm: restarting FPM/Apache on mw canaries
  • 13:42 moritzm: installing nghttp2 security updates
  • 13:38 moritzm: installing tomcat9 security updates
  • 13:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P53329 and previous config saved to /var/cache/conftool/dbconfig/20231113-133102-arnaudb.json
  • 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kafka::jumbo::broker
  • 13:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T348183)', diff saved to https://phabricator.wikimedia.org/P53328 and previous config saved to /var/cache/conftool/dbconfig/20231113-131556-arnaudb.json
  • 13:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1210 (T348183)', diff saved to https://phabricator.wikimedia.org/P53327 and previous config saved to /var/cache/conftool/dbconfig/20231113-131207-arnaudb.json
  • 13:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 13:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 13:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T348183)', diff saved to https://phabricator.wikimedia.org/P53326 and previous config saved to /var/cache/conftool/dbconfig/20231113-131146-arnaudb.json
  • 13:10 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: kafka::jumbo::broker
  • 12:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P53325 and previous config saved to /var/cache/conftool/dbconfig/20231113-125640-arnaudb.json
  • 12:55 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host sretest1004.eqiad.wmnet
  • 12:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:43 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bullseye
  • 12:42 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2004.codfw.wmnet with OS bullseye
  • 12:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P53324 and previous config saved to /var/cache/conftool/dbconfig/20231113-124133-arnaudb.json
  • 12:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host centrallog2002.codfw.wmnet
  • 12:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 12:34 effie: restarting memcached on mc2038
  • 12:32 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest2004.codfw.wmnet with OS bullseye
  • 12:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 12:31 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest2003.codfw.wmnet
  • 12:29 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 12:28 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1004.eqiad.wmnet with OS bullseye
  • 12:28 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 12:28 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 12:27 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 12:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T348183)', diff saved to https://phabricator.wikimedia.org/P53323 and previous config saved to /var/cache/conftool/dbconfig/20231113-122627-arnaudb.json
  • 12:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T348183)', diff saved to https://phabricator.wikimedia.org/P53322 and previous config saved to /var/cache/conftool/dbconfig/20231113-122332-arnaudb.json
  • 12:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 12:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 12:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T348183)', diff saved to https://phabricator.wikimedia.org/P53321 and previous config saved to /var/cache/conftool/dbconfig/20231113-122310-arnaudb.json
  • 12:21 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host centrallog2002.codfw.wmnet
  • 12:19 cmooney@cumin1001: START - Cookbook sre.hosts.dhcp for host sretest2003.codfw.wmnet
  • 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host kafka-jumbo1007.eqiad.wmnet
  • 12:16 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:15 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1004.eqiad.wmnet with OS bullseye
  • 12:08 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host kafka-jumbo1007.eqiad.wmnet
  • 12:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P53320 and previous config saved to /var/cache/conftool/dbconfig/20231113-120803-arnaudb.json
  • 12:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1016.eqiad.wmnet
  • 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host prometheus4002.ulsfo.wmnet
  • 11:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P53319 and previous config saved to /var/cache/conftool/dbconfig/20231113-115257-arnaudb.json
  • 11:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1015.eqiad.wmnet
  • 11:40 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
  • 11:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host clouddb1015.eqiad.wmnet
  • 11:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T348183)', diff saved to https://phabricator.wikimedia.org/P53318 and previous config saved to /var/cache/conftool/dbconfig/20231113-113751-arnaudb.json
  • 11:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1014.eqiad.wmnet
  • 11:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T348183)', diff saved to https://phabricator.wikimedia.org/P53317 and previous config saved to /var/cache/conftool/dbconfig/20231113-113458-arnaudb.json
  • 11:34 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 11:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 11:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T348183)', diff saved to https://phabricator.wikimedia.org/P53316 and previous config saved to /var/cache/conftool/dbconfig/20231113-113437-arnaudb.json
  • 11:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host clouddb1014.eqiad.wmnet
  • 11:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1013.eqiad.wmnet
  • 11:30 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
  • 11:28 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host prometheus4002.ulsfo.wmnet
  • 11:20 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 11:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P53315 and previous config saved to /var/cache/conftool/dbconfig/20231113-111930-arnaudb.json
  • 11:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: webperf
  • 11:07 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host clouddb1013.eqiad.wmnet
  • 11:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P53314 and previous config saved to /var/cache/conftool/dbconfig/20231113-110424-arnaudb.json
  • 11:02 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: webperf
  • 11:01 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudmetrics[1003-1004].eqiad.wmnet
  • 11:01 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:01 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudmetrics[1003-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
  • 11:00 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudmetrics[1003-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
  • 10:57 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 10:50 jbond: roll restart pybal after failed etcd cr
  • 10:49 taavi@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudmetrics[1003-1004].eqiad.wmnet
  • 10:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T348183)', diff saved to https://phabricator.wikimedia.org/P53313 and previous config saved to /var/cache/conftool/dbconfig/20231113-104917-arnaudb.json
  • 10:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T348183)', diff saved to https://phabricator.wikimedia.org/P53312 and previous config saved to /var/cache/conftool/dbconfig/20231113-104534-arnaudb.json
  • 10:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:45 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 10:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 10:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 10:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 10:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53311 and previous config saved to /var/cache/conftool/dbconfig/20231113-104245-arnaudb.json
  • 10:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 983
  • 10:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P53309 and previous config saved to /var/cache/conftool/dbconfig/20231113-102739-arnaudb.json
  • 10:27 arnaudb@cumin1001: dbctl commit (dc=all): 'depool T350458', diff saved to https://phabricator.wikimedia.org/P53308 and previous config saved to /var/cache/conftool/dbconfig/20231113-102730-arnaudb.json
  • 10:24 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: graphite::production
  • 09:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P53307 and previous config saved to /var/cache/conftool/dbconfig/20231113-095725-arnaudb.json
  • 09:44 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: graphite::production
  • 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: arclamp
  • 09:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53306 and previous config saved to /var/cache/conftool/dbconfig/20231113-094218-arnaudb.json
  • 09:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T348183)', diff saved to https://phabricator.wikimedia.org/P53305 and previous config saved to /var/cache/conftool/dbconfig/20231113-093824-arnaudb.json
  • 09:38 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 09:38 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 09:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T348183)', diff saved to https://phabricator.wikimedia.org/P53304 and previous config saved to /var/cache/conftool/dbconfig/20231113-093802-arnaudb.json
  • 09:36 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: arclamp
  • 09:31 moritzm: installing dbus security updates on bullseye
  • 09:31 jnuche@deploy2002: rebuilt and synchronized wikiversions files: labswiki to 1.42.0-wmf.4 (T350836 T350080)
  • 09:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P53303 and previous config saved to /var/cache/conftool/dbconfig/20231113-092256-arnaudb.json
  • 09:17 hashar@deploy2002: Finished deploy [integration/docroot@9bf1967]: Replace WikimediaUI Base with Codex design tokens T331403 T334934 (duration: 00m 07s)
  • 09:16 hashar@deploy2002: Started deploy [integration/docroot@9bf1967]: Replace WikimediaUI Base with Codex design tokens T331403 T334934
  • 09:14 jnuche@deploy2002: Finished scap: Backport for Fix BlockDisablesLogin recursion (T350836 T350080) (duration: 07m 49s)
  • 09:08 jnuche@deploy2002: bd808 and jnuche: Continuing with sync
  • 09:08 jnuche@deploy2002: bd808 and jnuche: Backport for Fix BlockDisablesLogin recursion (T350836 T350080) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P53302 and previous config saved to /var/cache/conftool/dbconfig/20231113-090750-arnaudb.json
  • 09:06 jnuche@deploy2002: Started scap: Backport for Fix BlockDisablesLogin recursion (T350836 T350080)
  • 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host webperf2003.codfw.wmnet
  • 08:55 godog: bounce prometheus eqiad for k8s / k8s-aux - T343529
  • 08:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T348183)', diff saved to https://phabricator.wikimedia.org/P53301 and previous config saved to /var/cache/conftool/dbconfig/20231113-085243-arnaudb.json
  • 08:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T348183)', diff saved to https://phabricator.wikimedia.org/P53300 and previous config saved to /var/cache/conftool/dbconfig/20231113-084945-arnaudb.json
  • 08:49 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 08:49 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 08:45 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host webperf2003.codfw.wmnet
  • 08:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host graphite2004.codfw.wmnet
  • 08:34 hashar@deploy2002: Finished deploy [integration/docroot@bc8aaba]: Add more libraries to doc.wikimedia.org homepage - T327604 (duration: 00m 06s)
  • 08:34 hashar@deploy2002: Started deploy [integration/docroot@bc8aaba]: Add more libraries to doc.wikimedia.org homepage - T327604
  • 08:30 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host graphite2004.codfw.wmnet
  • 08:29 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host arclamp2001.codfw.wmnet
  • 08:20 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host arclamp2001.codfw.wmnet
  • 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: search::loader
  • 07:42 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: search::loader

2023-11-12

  • 21:28 jiji@cumin2002: END (PASS) - Cookbook sre.mediawiki.restart-appservers (exit_code=0)
  • 21:27 jiji@cumin2002: START - Cookbook sre.mediawiki.restart-appservers
  • 21:26 effie: restart php-fpm on jobrunners

2023-11-11

  • 01:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1058.eqiad.wmnet with OS bookworm
  • 01:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
  • 01:17 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
  • 01:03 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1058.eqiad.wmnet with OS bookworm
  • 00:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1061.eqiad.wmnet with OS bookworm

2023-11-10

  • 23:51 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
  • 23:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
  • 23:34 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1061.eqiad.wmnet with OS bookworm
  • 21:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 20:51 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 20:22 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 20:04 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 18:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 18:04 bvibber: brion adding more VP9 backfill to the transcode runs on mwmaint2002 (requeueTranscodes -> job queue runners). This should increase load on the transcode scaler job runners but not elsewhere
  • 17:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1066.eqiad.wmnet with OS bookworm
  • 17:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 17:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1067.eqiad.wmnet with OS bookworm
  • 17:51 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1065.eqiad.wmnet with OS bookworm
  • 17:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
  • 17:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 17:48 topranks: withdrawing IPv6 prefixes announced to AS1299 in esams to troubleshoot a reported connectivity problem
  • 17:47 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
  • 17:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 17:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 17:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 17:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 17:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
  • 16:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
  • 16:59 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
  • 16:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 16:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
  • 16:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
  • 16:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
  • 16:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
  • 16:53 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 16:53 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
  • 16:51 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.4 - cmooney@cumin1001
  • 16:49 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.4 - cmooney@cumin1001
  • 16:39 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 16:39 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 16:39 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1066.eqiad.wmnet with OS bookworm
  • 16:39 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bookworm
  • 16:38 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 16:38 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bookworm
  • 16:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1066.eqiad.wmnet with OS bookworm
  • 16:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1065.eqiad.wmnet with OS bookworm
  • 16:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 16:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1067.eqiad.wmnet with OS bookworm
  • 16:36 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudvirt1062.private.eqiad.wikimedia.cloud on all recursors
  • 16:36 cmooney@cumin1001: START - Cookbook sre.dns.wipe-cache cloudvirt1062.private.eqiad.wikimedia.cloud on all recursors
  • 16:34 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:34 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add cloud-private subnet entries for new cloudvirt hosts - cmooney@cumin1001"
  • 16:33 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add cloud-private subnet entries for new cloudvirt hosts - cmooney@cumin1001"
  • 16:31 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:31 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
  • 16:29 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
  • 16:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
  • 16:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 16:24 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
  • 16:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
  • 16:22 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 16:22 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
  • 16:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
  • 16:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
  • 16:12 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bookworm
  • 16:11 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1066.eqiad.wmnet with OS bookworm
  • 16:10 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bookworm
  • 16:10 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 16:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 15:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1062.eqiad.wmnet with OS bookworm
  • 15:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1062.eqiad.wmnet with reason: host reimage
  • 15:36 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1062.eqiad.wmnet with reason: host reimage
  • 15:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1062.eqiad.wmnet with OS bookworm
  • 14:15 denisse@deploy2002: Finished deploy [librenms/librenms@f049593]: Upgrade LibreNMS to v23.10.0 - T349492 (duration: 00m 10s)
  • 14:15 denisse@deploy2002: Started deploy [librenms/librenms@f049593]: Upgrade LibreNMS to v23.10.0 - T349492
  • 14:11 denisse: upgrading LibreNMS to 23.10
  • 13:46 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 13:45 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 13:45 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 13:45 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 13:45 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 13:44 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 13:23 moritzm: imported php-geoip 1.1.1-7+wmf2+bullseye1 to component/php74 for bullseye-wikimedia
  • 13:05 moritzm: imported php-yaml 2.2.1+2.1.0+2.0.4+1.3.2-2+wmf1~bullseye1 to component/php74 for bullseye-wikimedia
  • 12:59 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1115.eqiad.wmnet with OS bullseye
  • 12:37 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1115.eqiad.wmnet with reason: host reimage
  • 12:34 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1115.eqiad.wmnet with reason: host reimage
  • 12:25 moritzm: imported php-pcov 1.0.6-4+wmf1~bullseye1 to component/php74 for bullseye-wikimedia
  • 12:12 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 12:08 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS bullseye
  • 12:04 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 12:03 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS bullseye
  • 11:59 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 11:58 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS bullseye
  • 11:56 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 11:56 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS bullseye
  • 11:56 moritzm: imported php-wmerrors 2.0.0~git20190628.183ef7d-3+wmf1+bullseye1 to component/php74 for bullseye-wikimedia
  • 11:50 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 11:50 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS bullseye
  • 11:46 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 11:18 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 11:17 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 11:16 moritzm: imported tideways 5.0.4-2+wmf1+bullseye1 to component/php74 for bullseye-wikimedia
  • 11:05 moritzm: imported php-imagick 3.4.4+php8.0+3.4.4-2+deb11u2+wmf1+bullseye1 to component/php74 for bullseye-wikimedia
  • 10:53 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1109.eqiad.wmnet with OS bullseye
  • 10:41 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:41 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:38 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:38 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:35 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1109.eqiad.wmnet with reason: host reimage
  • 10:32 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1109.eqiad.wmnet with reason: host reimage
  • 10:25 moritzm: imported dh-php 0.35+wmf1+bullseye1 to component/php74 for bullseye-wikimedia
  • 10:16 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
  • 10:16 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1109.eqiad.wmnet with OS bullseye
  • 10:10 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
  • 10:09 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1109.eqiad.wmnet with OS bullseye
  • 10:05 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:05 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:02 moritzm: imported php-excimer 1.0.2-1+wmf3+bullseye1 to component/php74 for bullseye-wikimedia
  • 10:02 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
  • 10:01 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1109.eqiad.wmnet with OS bullseye
  • 09:57 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 09:57 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 09:54 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
  • 09:29 moritzm: imported wikidiff2 1.14.1-0+wmf1+bullseye1 to component/php74 for bullseye-wikimedia
  • 09:12 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1001.eqiad.wmnet
  • 09:09 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 09:09 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 09:07 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host vrts1001.eqiad.wmnet
  • 08:35 moritzm: imported php-defaults 2:7.4+76+wmf1~bullseye1 to component/php74 for bullseye-wikimedia
  • 08:35 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 08:34 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 08:01 moritzm: imported php7.4 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf11u1 to component/php74 for bullseye-wikimedia
  • 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: insetup::search_platform
  • 07:01 vgutierrez: cleaning up digicert-2022 update-ocsp config bits from cp servers
  • 06:29 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: insetup::search_platform
  • 03:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T348183)', diff saved to https://phabricator.wikimedia.org/P53289 and previous config saved to /var/cache/conftool/dbconfig/20231110-032053-arnaudb.json
  • 03:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P53288 and previous config saved to /var/cache/conftool/dbconfig/20231110-030547-arnaudb.json
  • 02:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P53287 and previous config saved to /var/cache/conftool/dbconfig/20231110-025041-arnaudb.json
  • 02:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T348183)', diff saved to https://phabricator.wikimedia.org/P53286 and previous config saved to /var/cache/conftool/dbconfig/20231110-023534-arnaudb.json
  • 02:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T348183)', diff saved to https://phabricator.wikimedia.org/P53285 and previous config saved to /var/cache/conftool/dbconfig/20231110-022351-arnaudb.json
  • 02:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 02:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 02:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T348183)', diff saved to https://phabricator.wikimedia.org/P53284 and previous config saved to /var/cache/conftool/dbconfig/20231110-022330-arnaudb.json
  • 02:15 tzatziki: removing 3 files for legal compliance
  • 02:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P53283 and previous config saved to /var/cache/conftool/dbconfig/20231110-020823-arnaudb.json
  • 01:58 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1114.eqiad.wmnet with OS bullseye
  • 01:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P53282 and previous config saved to /var/cache/conftool/dbconfig/20231110-015317-arnaudb.json
  • 01:42 tzatziki: removing 16 files for legal compliance
  • 01:39 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
  • 01:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T348183)', diff saved to https://phabricator.wikimedia.org/P53281 and previous config saved to /var/cache/conftool/dbconfig/20231110-013810-arnaudb.json
  • 01:36 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
  • 01:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1112.eqiad.wmnet with OS bullseye
  • 01:21 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 01:20 sukhe@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1114.eqiad.wmnet with OS bullseye
  • 01:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T348183)', diff saved to https://phabricator.wikimedia.org/P53280 and previous config saved to /var/cache/conftool/dbconfig/20231110-011712-arnaudb.json
  • 01:17 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 01:17 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 01:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T348183)', diff saved to https://phabricator.wikimedia.org/P53279 and previous config saved to /var/cache/conftool/dbconfig/20231110-011701-arnaudb.json
  • 01:15 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 01:15 sukhe@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1114.eqiad.wmnet with OS bullseye
  • 01:13 bd808: SAL test (T343157)
  • 01:10 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 01:10 sukhe@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1114.eqiad.wmnet with OS bullseye
  • 01:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1112.eqiad.wmnet with reason: host reimage
  • 01:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1112.eqiad.wmnet with reason: host reimage
  • 01:02 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 01:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P53278 and previous config saved to /var/cache/conftool/dbconfig/20231110-010154-arnaudb.json
  • 01:00 wfan: update fraud filter, config revision changed from 4cfbb04b to 39a846b3
  • 00:50 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
  • 00:50 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1112.eqiad.wmnet with OS bullseye
  • 00:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P53277 and previous config saved to /var/cache/conftool/dbconfig/20231110-004647-arnaudb.json
  • 00:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
  • 00:44 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1112.eqiad.wmnet with OS bullseye
  • 00:37 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
  • 00:31 tzatziki: removing 1 file for legal compliance
  • 00:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T348183)', diff saved to https://phabricator.wikimedia.org/P53276 and previous config saved to /var/cache/conftool/dbconfig/20231110-003141-arnaudb.json
  • 00:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T348183)', diff saved to https://phabricator.wikimedia.org/P53275 and previous config saved to /var/cache/conftool/dbconfig/20231110-002747-arnaudb.json
  • 00:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 00:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 00:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T348183)', diff saved to https://phabricator.wikimedia.org/P53274 and previous config saved to /var/cache/conftool/dbconfig/20231110-002725-arnaudb.json
  • 00:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P53273 and previous config saved to /var/cache/conftool/dbconfig/20231110-001219-arnaudb.json
  • 00:09 ejegg: fundraising python tools upgraded from a4cbbbe7 to 117e1f9c
  • 00:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P53272 and previous config saved to /var/cache/conftool/dbconfig/20231110-000322-root.json

2023-11-09

  • 23:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P53271 and previous config saved to /var/cache/conftool/dbconfig/20231109-235712-arnaudb.json
  • 23:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P53270 and previous config saved to /var/cache/conftool/dbconfig/20231109-234817-root.json
  • 23:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T348183)', diff saved to https://phabricator.wikimedia.org/P53269 and previous config saved to /var/cache/conftool/dbconfig/20231109-234206-arnaudb.json
  • 23:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T348183)', diff saved to https://phabricator.wikimedia.org/P53268 and previous config saved to /var/cache/conftool/dbconfig/20231109-233816-arnaudb.json
  • 23:38 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 23:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 23:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 23:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 23:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T348183)', diff saved to https://phabricator.wikimedia.org/P53267 and previous config saved to /var/cache/conftool/dbconfig/20231109-233728-arnaudb.json
  • 23:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P53266 and previous config saved to /var/cache/conftool/dbconfig/20231109-233312-root.json
  • 23:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P53265 and previous config saved to /var/cache/conftool/dbconfig/20231109-232221-arnaudb.json
  • 23:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P53264 and previous config saved to /var/cache/conftool/dbconfig/20231109-231807-root.json
  • 23:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P53263 and previous config saved to /var/cache/conftool/dbconfig/20231109-230715-arnaudb.json
  • 23:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P53262 and previous config saved to /var/cache/conftool/dbconfig/20231109-230302-root.json
  • 22:59 ejegg: payments-wiki upgraded from 6f27bf65 to 2018a390
  • 22:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T348183)', diff saved to https://phabricator.wikimedia.org/P53261 and previous config saved to /var/cache/conftool/dbconfig/20231109-225208-arnaudb.json
  • 22:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T348183)', diff saved to https://phabricator.wikimedia.org/P53260 and previous config saved to /var/cache/conftool/dbconfig/20231109-224818-arnaudb.json
  • 22:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 22:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 22:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T348183)', diff saved to https://phabricator.wikimedia.org/P53259 and previous config saved to /var/cache/conftool/dbconfig/20231109-224757-arnaudb.json
  • 22:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P53258 and previous config saved to /var/cache/conftool/dbconfig/20231109-223250-arnaudb.json
  • 22:28 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 22:27 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 22:27 bking@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 22:27 bking@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 22:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P53257 and previous config saved to /var/cache/conftool/dbconfig/20231109-221744-arnaudb.json
  • 22:08 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:08 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 22:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T348183)', diff saved to https://phabricator.wikimedia.org/P53256 and previous config saved to /var/cache/conftool/dbconfig/20231109-220238-arnaudb.json
  • 21:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T348183)', diff saved to https://phabricator.wikimedia.org/P53255 and previous config saved to /var/cache/conftool/dbconfig/20231109-215741-arnaudb.json
  • 21:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 21:57 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 21:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T348183)', diff saved to https://phabricator.wikimedia.org/P53254 and previous config saved to /var/cache/conftool/dbconfig/20231109-215719-arnaudb.json
  • 21:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P53253 and previous config saved to /var/cache/conftool/dbconfig/20231109-214213-arnaudb.json
  • 21:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2146.codfw.wmnet with OS bookworm
  • 21:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P53252 and previous config saved to /var/cache/conftool/dbconfig/20231109-212707-arnaudb.json
  • 21:24 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 21:24 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 21:19 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:18 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2146.codfw.wmnet with reason: host reimage
  • 21:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2146.codfw.wmnet with reason: host reimage
  • 21:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T348183)', diff saved to https://phabricator.wikimedia.org/P53251 and previous config saved to /var/cache/conftool/dbconfig/20231109-211200-arnaudb.json
  • 21:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T348183)', diff saved to https://phabricator.wikimedia.org/P53250 and previous config saved to /var/cache/conftool/dbconfig/20231109-210806-arnaudb.json
  • 21:08 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 21:07 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 21:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T348183)', diff saved to https://phabricator.wikimedia.org/P53249 and previous config saved to /var/cache/conftool/dbconfig/20231109-210744-arnaudb.json
  • 21:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:04 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old crX-codfw sandbox int IPs - cmooney@cumin1001"
  • 21:03 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old crX-codfw sandbox int IPs - cmooney@cumin1001"
  • 21:02 brennen: no patches for UTC late backport & config
  • 21:01 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 21:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 20:56 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2146.codfw.wmnet with OS bookworm
  • 20:55 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 20:55 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 20:55 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 20:55 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 20:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2146 T350916', diff saved to https://phabricator.wikimedia.org/P53248 and previous config saved to /var/cache/conftool/dbconfig/20231109-205445-root.json
  • 20:54 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1005.eqiad.wmnet with OS bullseye
  • 20:54 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1110.eqiad.wmnet with OS bullseye
  • 20:54 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 20:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P53247 and previous config saved to /var/cache/conftool/dbconfig/20231109-205238-arnaudb.json
  • 20:45 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.3.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 20:45 cmooney@cumin1001: START - Cookbook sre.dns.wipe-cache 2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.3.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 20:41 topranks: change anycast gw type to single-IP on ssw1-aX-codfw for sandbox1-a-codfw vlan (T350579)
  • 20:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P53246 and previous config saved to /var/cache/conftool/dbconfig/20231109-203731-arnaudb.json
  • 20:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1110.eqiad.wmnet with reason: host reimage
  • 20:32 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1110.eqiad.wmnet with reason: host reimage
  • 20:32 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1108.eqiad.wmnet with OS bullseye
  • 20:32 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1005.eqiad.wmnet with reason: host reimage
  • 20:29 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:29 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for ssw1-aX-codfw xlink IPs. - cmooney@cumin1001"
  • 20:28 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for ssw1-aX-codfw xlink IPs. - cmooney@cumin1001"
  • 20:28 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1005.eqiad.wmnet with reason: host reimage
  • 20:25 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 20:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T348183)', diff saved to https://phabricator.wikimedia.org/P53245 and previous config saved to /var/cache/conftool/dbconfig/20231109-202225-arnaudb.json
  • 20:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T348183)', diff saved to https://phabricator.wikimedia.org/P53244 and previous config saved to /var/cache/conftool/dbconfig/20231109-201830-arnaudb.json
  • 20:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 20:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 20:17 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 20:17 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1110.eqiad.wmnet with OS bullseye
  • 20:16 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 20:16 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 20:15 topranks: resetting asw-a-codfw et-2/0/52 to shift traffic away from ssw1-a8-codfw (T347191)
  • 20:14 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 20:14 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 20:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T348183)', diff saved to https://phabricator.wikimedia.org/P53243 and previous config saved to /var/cache/conftool/dbconfig/20231109-201409-arnaudb.json
  • 20:13 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1108.eqiad.wmnet with reason: host reimage
  • 20:13 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 20:12 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1110.eqiad.wmnet with OS bullseye
  • 20:12 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw-a-codfw,ssw1-a8-codfw,ssw1-a8-codfw.mgmt with reason: Adjust vlans trunked to asw-a-codfw from ssw1-a8-codfw T347191
  • 20:12 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on asw-a-codfw,ssw1-a8-codfw,ssw1-a8-codfw.mgmt with reason: Adjust vlans trunked to asw-a-codfw from ssw1-a8-codfw T347191
  • 20:10 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1108.eqiad.wmnet with reason: host reimage
  • 20:08 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1107.eqiad.wmnet with OS bullseye
  • 20:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 20:06 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1110.eqiad.wmnet with OS bullseye
  • 19:59 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 19:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P53242 and previous config saved to /var/cache/conftool/dbconfig/20231109-195903-arnaudb.json
  • 19:55 volans@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
  • 19:54 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1060.eqiad.wmnet with OS bookworm
  • 19:52 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1057.eqiad.wmnet with OS bookworm
  • 19:51 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs2012.codfw.wmnet: Applying JVM security upgrade - eevans@cumin1001
  • 19:50 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
  • 19:50 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1026.eqiad.wmnet with OS bookworm
  • 19:50 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1059.eqiad.wmnet with OS bookworm
  • 19:48 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1051.eqiad.wmnet with OS bookworm
  • 19:47 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
  • 19:47 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs2012.codfw.wmnet: Applying JVM security upgrade - eevans@cumin1001
  • 19:45 urbanecm@deploy2002: Finished scap: Backport for wikimaniawiki: Revert wordmark and tagline back (T350640) (duration: 07m 22s)
  • 19:44 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1108.eqiad.wmnet with OS bullseye
  • 19:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P53241 and previous config saved to /var/cache/conftool/dbconfig/20231109-194357-arnaudb.json
  • 19:43 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1027.eqiad.wmnet with OS bookworm
  • 19:41 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:41 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:38 volans@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
  • 19:38 urbanecm@deploy2002: Started scap: Backport for wikimaniawiki: Revert wordmark and tagline back (T350640)
  • 19:34 urbanecm@deploy2002: Finished scap: Backport for wikimaniawiki: Switch back to standard logo (T350640) (duration: 07m 11s)
  • 19:33 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:33 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:32 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1108.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 19:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1105.eqiad.wmnet with OS bullseye
  • 19:32 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 19:32 sukhe@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1107.eqiad.wmnet with OS bullseye
  • 19:30 volans@cumin1001: START - Cookbook sre.hosts.provision for host cp1108.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 19:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T348183)', diff saved to https://phabricator.wikimedia.org/P53240 and previous config saved to /var/cache/conftool/dbconfig/20231109-192850-arnaudb.json
  • 19:28 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
  • 19:28 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 19:28 urbanecm@deploy2002: urbanecm: Backport for wikimaniawiki: Switch back to standard logo (T350640) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:27 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
  • 19:26 urbanecm@deploy2002: Started scap: Backport for wikimaniawiki: Switch back to standard logo (T350640)
  • 19:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1231 (T348183)', diff saved to https://phabricator.wikimedia.org/P53239 and previous config saved to /var/cache/conftool/dbconfig/20231109-192621-arnaudb.json
  • 19:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 19:26 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 19:25 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 19:25 topranks: shutting down et-1/1/5.2201 (sandbox1-a-codfw) interfaces on crX-codfw (T348159)
  • 19:25 sukhe@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1107.eqiad.wmnet with OS bullseye
  • 19:24 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
  • 19:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 19:24 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 19:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T348183)', diff saved to https://phabricator.wikimedia.org/P53238 and previous config saved to /var/cache/conftool/dbconfig/20231109-192416-arnaudb.json
  • 19:22 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1026.eqiad.wmnet with reason: host reimage
  • 19:22 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1051.eqiad.wmnet with reason: host reimage
  • 19:20 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1051.eqiad.wmnet with reason: host reimage
  • 19:20 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 19:20 sukhe@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1107.eqiad.wmnet with OS bullseye
  • 19:19 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1027.eqiad.wmnet with reason: host reimage
  • 19:18 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
  • 19:18 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
  • 19:18 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
  • 19:17 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1026.eqiad.wmnet with reason: host reimage
  • 19:16 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1027.eqiad.wmnet with reason: host reimage
  • 19:15 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 19:15 sukhe@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1107.eqiad.wmnet with OS bullseye
  • 19:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1105.eqiad.wmnet with reason: host reimage
  • 19:12 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 19:11 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1105.eqiad.wmnet with reason: host reimage
  • 19:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P53237 and previous config saved to /var/cache/conftool/dbconfig/20231109-190910-arnaudb.json
  • 19:07 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 19:06 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1103.eqiad.wmnet with OS bullseye
  • 19:06 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 19:06 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:06 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for sandbox1-codfw IPs - cmooney@cumin1001"
  • 19:05 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 19:05 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for sandbox1-codfw IPs - cmooney@cumin1001"
  • 19:05 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 19:05 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1060.eqiad.wmnet with OS bookworm
  • 19:05 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1059.eqiad.wmnet with OS bookworm
  • 19:05 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 19:04 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1051.eqiad.wmnet with OS bookworm
  • 19:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host acmechief-test2001.codfw.wmnet with OS bookworm
  • 19:04 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 19:04 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1057.eqiad.wmnet with OS bookworm
  • 19:04 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1027.eqiad.wmnet with OS bookworm
  • 19:03 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:02 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bookworm
  • 18:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
  • 18:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1105.eqiad.wmnet with OS bullseye
  • 18:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P53236 and previous config saved to /var/cache/conftool/dbconfig/20231109-185403-arnaudb.json
  • 18:52 topranks: renumber VRRP GW VIP on crX-codfw for sandbox1-a-codfw (T348159)
  • 18:52 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
  • 18:51 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1105.eqiad.wmnet with OS bullseye
  • 18:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief-test2001.codfw.wmnet with reason: host reimage
  • 18:49 topranks: Adding anycast gw config to ssw*codfw for vlan sandbox1-a-codfw (T348159)
  • 18:48 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
  • 18:46 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief-test2001.codfw.wmnet with reason: host reimage
  • 18:45 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
  • 18:40 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
  • 18:40 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1105.eqiad.wmnet with OS bullseye
  • 18:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T348183)', diff saved to https://phabricator.wikimedia.org/P53235 and previous config saved to /var/cache/conftool/dbconfig/20231109-183857-arnaudb.json
  • 18:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T348183)', diff saved to https://phabricator.wikimedia.org/P53234 and previous config saved to /var/cache/conftool/dbconfig/20231109-183626-arnaudb.json
  • 18:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 18:36 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 18:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 18:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T348183)', diff saved to https://phabricator.wikimedia.org/P53233 and previous config saved to /var/cache/conftool/dbconfig/20231109-183603-arnaudb.json
  • 18:35 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 18:34 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 18:33 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 18:33 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 18:32 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
  • 18:32 bd808@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 18:31 brett@cumin2002: START - Cookbook sre.hosts.reimage for host acmechief-test2001.codfw.wmnet with OS bookworm
  • 18:30 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 18:30 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 18:30 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:30 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:29 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:29 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:29 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:29 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 18:29 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:28 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:24 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 18:24 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 18:23 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching aqs20[09-12].codfw.wmnet: Applying JVM security upgrade - eevans@cumin1001
  • 18:23 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:23 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P53232 and previous config saved to /var/cache/conftool/dbconfig/20231109-182057-arnaudb.json
  • 18:18 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 18:15 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:15 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:15 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:15 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:12 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs20[09-12].codfw.wmnet: Applying JVM security upgrade - eevans@cumin1001
  • 18:07 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:07 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P53230 and previous config saved to /var/cache/conftool/dbconfig/20231109-180551-arnaudb.json
  • 18:03 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:03 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 17:51 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wdqs::test
  • 17:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T348183)', diff saved to https://phabricator.wikimedia.org/P53229 and previous config saved to /var/cache/conftool/dbconfig/20231109-175044-arnaudb.json
  • 17:48 fabfur: depooled service ats-be for cp1101 (T349244)
  • 17:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T348183)', diff saved to https://phabricator.wikimedia.org/P53228 and previous config saved to /var/cache/conftool/dbconfig/20231109-174801-arnaudb.json
  • 17:47 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 17:47 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 17:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T348183)', diff saved to https://phabricator.wikimedia.org/P53227 and previous config saved to /var/cache/conftool/dbconfig/20231109-174740-arnaudb.json
  • 17:45 fabfur: pooled cp1101 into upload cluster (both cdn and ats-be): T349244
  • 17:41 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: wdqs::test
  • 17:38 fabfur: removed cp1076 from HAProxy/Varnish pool (NOT ats-be pool) for T349244
  • 17:37 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1101.eqiad.wmnet
  • 17:37 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1101.eqiad.wmnet
  • 17:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1101.eqiad.wmnet with OS bullseye
  • 17:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P53226 and previous config saved to /var/cache/conftool/dbconfig/20231109-173233-arnaudb.json
  • 17:25 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: elasticsearch::relforge
  • 17:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P53225 and previous config saved to /var/cache/conftool/dbconfig/20231109-171727-arnaudb.json
  • 17:16 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: elasticsearch::relforge
  • 17:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1101.eqiad.wmnet with reason: host reimage
  • 17:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1101.eqiad.wmnet with reason: host reimage
  • 17:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T348183)', diff saved to https://phabricator.wikimedia.org/P53224 and previous config saved to /var/cache/conftool/dbconfig/20231109-170220-arnaudb.json
  • 16:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T348183)', diff saved to https://phabricator.wikimedia.org/P53223 and previous config saved to /var/cache/conftool/dbconfig/20231109-165947-arnaudb.json
  • 16:59 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 16:59 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 16:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T348183)', diff saved to https://phabricator.wikimedia.org/P53222 and previous config saved to /var/cache/conftool/dbconfig/20231109-165925-arnaudb.json
  • 16:58 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1101.eqiad.wmnet with OS bullseye
  • 16:57 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1101.eqiad.wmnet with OS bullseye
  • 16:55 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching A:aqs-codfw: Applying JVM security upgrade - eevans@cumin1001
  • 16:52 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1101.eqiad.wmnet with OS bullseye
  • 16:51 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1101.eqiad.wmnet with OS bullseye
  • 16:48 ladsgroup@deploy2002: Finished scap: Backport for Enable pagelinks write both on enwiki (T345732) (duration: 08m 09s)
  • 16:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P53221 and previous config saved to /var/cache/conftool/dbconfig/20231109-164419-arnaudb.json
  • 16:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1101.eqiad.wmnet with OS bullseye
  • 16:42 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 16:41 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 16:41 ladsgroup@deploy2002: ladsgroup: Backport for Enable pagelinks write both on enwiki (T345732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:41 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 16:40 ladsgroup@deploy2002: Started scap: Backport for Enable pagelinks write both on enwiki (T345732)
  • 16:31 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 16:31 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 16:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P53220 and previous config saved to /var/cache/conftool/dbconfig/20231109-162913-arnaudb.json
  • 16:26 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 16:26 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 16:23 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Applying JVM security upgrade - eevans@cumin1001
  • 16:20 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Applying JVM security upgrade - eevans@cumin1001
  • 16:14 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 16:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T348183)', diff saved to https://phabricator.wikimedia.org/P53219 and previous config saved to /var/cache/conftool/dbconfig/20231109-161406-arnaudb.json
  • 16:13 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 16:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T348183)', diff saved to https://phabricator.wikimedia.org/P53218 and previous config saved to /var/cache/conftool/dbconfig/20231109-161134-arnaudb.json
  • 16:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 16:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 16:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T348183)', diff saved to https://phabricator.wikimedia.org/P53217 and previous config saved to /var/cache/conftool/dbconfig/20231109-161112-arnaudb.json
  • 16:08 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:07 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:07 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:06 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:06 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:06 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:58 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1025.eqiad.wmnet with OS bookworm
  • 15:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P53216 and previous config saved to /var/cache/conftool/dbconfig/20231109-155606-arnaudb.json
  • 15:47 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 15:46 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 15:46 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 15:45 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 15:45 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 15:45 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 15:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P53215 and previous config saved to /var/cache/conftool/dbconfig/20231109-154100-arnaudb.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 100%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53214 and previous config saved to /var/cache/conftool/dbconfig/20231109-153321-root.json
  • 15:32 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1025.eqiad.wmnet with reason: host reimage
  • 15:29 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Applying JVM security upgrade - eevans@cumin1001
  • 15:29 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1025.eqiad.wmnet with reason: host reimage
  • 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 100%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53213 and previous config saved to /var/cache/conftool/dbconfig/20231109-152856-root.json
  • 15:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T348183)', diff saved to https://phabricator.wikimedia.org/P53212 and previous config saved to /var/cache/conftool/dbconfig/20231109-152553-arnaudb.json
  • 15:25 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1100.eqiad.wmnet
  • 15:25 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp1100.eqiad.wmnet
  • 15:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53211 and previous config saved to /var/cache/conftool/dbconfig/20231109-152438-root.json
  • 15:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T348183)', diff saved to https://phabricator.wikimedia.org/P53210 and previous config saved to /var/cache/conftool/dbconfig/20231109-152320-arnaudb.json
  • 15:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:23 fabfur: cp1100 inserted into cluster_text pool
  • 15:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T348183)', diff saved to https://phabricator.wikimedia.org/P53209 and previous config saved to /var/cache/conftool/dbconfig/20231109-152259-arnaudb.json
  • 15:21 fabfur: removed cp1075 from HAProxy/Varnish pool (NOT ats-be pool) for T349244
  • 15:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbproxy1017.eqiad.wmnet
  • 15:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1017.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:19 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1017.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 75%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53208 and previous config saved to /var/cache/conftool/dbconfig/20231109-151816-root.json
  • 15:17 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 75%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53207 and previous config saved to /var/cache/conftool/dbconfig/20231109-151351-root.json
  • 15:12 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1017.eqiad.wmnet
  • 15:12 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bookworm
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53206 and previous config saved to /var/cache/conftool/dbconfig/20231109-150933-root.json
  • 15:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P53205 and previous config saved to /var/cache/conftool/dbconfig/20231109-150752-arnaudb.json
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 50%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53204 and previous config saved to /var/cache/conftool/dbconfig/20231109-150311-root.json
  • 15:00 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 15:00 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 14:59 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 14:59 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 50%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53203 and previous config saved to /var/cache/conftool/dbconfig/20231109-145846-root.json
  • 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53202 and previous config saved to /var/cache/conftool/dbconfig/20231109-145428-root.json
  • 14:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P53201 and previous config saved to /var/cache/conftool/dbconfig/20231109-145246-arnaudb.json
  • 14:52 kostajh: UTC afternoon deploys done
  • 14:50 kharlan@deploy2002: Finished scap: Backport for MediaModeration: Define virtual domains mapping config (T350321) (duration: 07m 07s)
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 25%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53200 and previous config saved to /var/cache/conftool/dbconfig/20231109-144806-root.json
  • 14:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: ceph::server
  • 14:46 brouberol@cumin1001: START - Cookbook sre.hosts.reimage for host an-druid1005.eqiad.wmnet with OS bullseye
  • 14:44 kharlan@deploy2002: kharlan and dreamyjazz: Continuing with sync
  • 14:44 kharlan@deploy2002: kharlan and dreamyjazz: Backport for MediaModeration: Define virtual domains mapping config (T350321) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 25%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53199 and previous config saved to /var/cache/conftool/dbconfig/20231109-144342-root.json
  • 14:43 kharlan@deploy2002: Started scap: Backport for MediaModeration: Define virtual domains mapping config (T350321)
  • 14:41 kharlan@deploy2002: Finished scap: Backport for Revert "CheckUser: Set 'debug' log level" (T345591) (duration: 07m 43s)
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53198 and previous config saved to /var/cache/conftool/dbconfig/20231109-143924-root.json
  • 14:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T348183)', diff saved to https://phabricator.wikimedia.org/P53197 and previous config saved to /var/cache/conftool/dbconfig/20231109-143739-arnaudb.json
  • 14:36 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: ceph::server
  • 14:36 kharlan@deploy2002: kharlan and dreamyjazz: Continuing with sync
  • 14:35 kharlan@deploy2002: kharlan and dreamyjazz: Backport for Revert "CheckUser: Set 'debug' log level" (T345591) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T348183)', diff saved to https://phabricator.wikimedia.org/P53196 and previous config saved to /var/cache/conftool/dbconfig/20231109-143508-arnaudb.json
  • 14:35 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 14:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 14:34 kharlan@deploy2002: Started scap: Backport for Revert "CheckUser: Set 'debug' log level" (T345591)
  • 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 10%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53195 and previous config saved to /var/cache/conftool/dbconfig/20231109-143301-root.json
  • 14:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::datahub::opensearch
  • 14:32 arnaudb@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53194 and previous config saved to /var/cache/conftool/dbconfig/20231109-143254-arnaudb.json
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2171:3315 to test puppet changes', diff saved to https://phabricator.wikimedia.org/P53193 and previous config saved to /var/cache/conftool/dbconfig/20231109-143051-root.json
  • 14:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 10%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53191 and previous config saved to /var/cache/conftool/dbconfig/20231109-142837-root.json
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1224 to test puppet changes', diff saved to https://phabricator.wikimedia.org/P53190 and previous config saved to /var/cache/conftool/dbconfig/20231109-142621-root.json
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: Puppet changes', diff saved to https://phabricator.wikimedia.org/P53189 and previous config saved to /var/cache/conftool/dbconfig/20231109-142419-root.json
  • 14:22 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::datahub::opensearch
  • 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 to test puppet changes', diff saved to https://phabricator.wikimedia.org/P53188 and previous config saved to /var/cache/conftool/dbconfig/20231109-142139-root.json
  • 14:21 moritzm: restarting turnilo on an-tool1007
  • 14:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: schema update via T343198
  • 14:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: schema update via T343198
  • 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::turnilo
  • 14:17 arnaudb@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 90%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53187 and previous config saved to /var/cache/conftool/dbconfig/20231109-141749-arnaudb.json
  • 14:10 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::turnilo
  • 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: karapace
  • 14:02 arnaudb@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53185 and previous config saved to /var/cache/conftool/dbconfig/20231109-140245-arnaudb.json
  • 13:55 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: karapace
  • 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host stat1009.eqiad.wmnet
  • 13:47 arnaudb@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 60%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53184 and previous config saved to /var/cache/conftool/dbconfig/20231109-134740-arnaudb.json
  • 13:41 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host stat1009.eqiad.wmnet
  • 13:32 arnaudb@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 45%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53183 and previous config saved to /var/cache/conftool/dbconfig/20231109-133235-arnaudb.json
  • 13:17 arnaudb@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 30%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53180 and previous config saved to /var/cache/conftool/dbconfig/20231109-131730-arnaudb.json
  • 13:02 arnaudb@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 15%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53179 and previous config saved to /var/cache/conftool/dbconfig/20231109-130225-arnaudb.json
  • 12:44 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T348183)', diff saved to https://phabricator.wikimedia.org/P53178 and previous config saved to /var/cache/conftool/dbconfig/20231109-124404-arnaudb.json
  • 12:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 12:43 kamila@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 12:42 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:42 kamila@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:38 moritzm: installing qemu security updates
  • 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::generation::server::misccrons
  • 12:23 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 12:23 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 12:21 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dumps::generation::server::misccrons
  • 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::generation::server::xmldumps
  • 12:09 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dumps::generation::server::xmldumps
  • 12:08 moritzm: installing python-reportlab security updates
  • 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::generation::server::xmlfallback
  • 11:43 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dumps::generation::server::xmlfallback
  • 11:38 _joe_: disabled requestctl cache-text/wikifeeds_featured T350645 T346657
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: url_downloader
  • 11:20 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: url_downloader
  • 11:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host urldownloader2003.wikimedia.org
  • 11:09 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 11:09 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 11:09 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 11:08 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 11:08 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 11:08 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 11:07 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 11:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 11:06 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 11:06 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 11:06 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 11:05 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 11:05 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 11:05 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:05 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 11:04 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:04 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:03 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:32 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:31 jiji@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:28 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:27 jiji@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:55 btullis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 09:52 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host urldownloader2003.wikimedia.org
  • 09:51 btullis@deploy2002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
  • 09:51 btullis@deploy2002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 09:41 btullis@deploy2002: helmfile [codfw] START helmfile.d/services/datahub: sync on main
  • 09:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: Deploy 1.42.0-wmf.4 to group2 (labswiki staying at 1.42.0-wmf.3 due to T350836)
  • 09:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: insetup::infrastructure_foundations
  • 09:24 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: insetup::infrastructure_foundations
  • 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: insetup::data_engineering
  • 09:08 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: insetup::data_engineering
  • 08:47 godog: add 50G to prometheus/ml-serve in codfw
  • 08:35 Emperor: restart vopsbot.service on alert1001
  • 08:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
  • 08:25 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
  • 08:21 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: etcd::v3::dse_k8s_etcd
  • 08:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd2001.codfw.wmnet
  • 08:19 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 08:19 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host kubestagetcd2001.codfw.wmnet
  • 08:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: etcd::v3::kubernetes::staging
  • 08:14 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 08:13 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 08:07 oblivian@deploy2002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 08:07 oblivian@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 08:07 oblivian@deploy2002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 08:07 oblivian@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 07:55 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: etcd::v3::kubernetes::staging
  • 07:35 moritzm: installing openjdk-8 security updates
  • 07:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 07:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 07:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Depool db2112 T350142', diff saved to https://phabricator.wikimedia.org/P53177 and previous config saved to /var/cache/conftool/dbconfig/20231109-070936-arnaudb.json
  • 07:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Promote db2103 to s1 primary and set section read-write T350142', diff saved to https://phabricator.wikimedia.org/P53176 and previous config saved to /var/cache/conftool/dbconfig/20231109-070410-arnaudb.json
  • 07:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Set s1 codfw as read-only for maintenance - T350142', diff saved to https://phabricator.wikimedia.org/P53175 and previous config saved to /var/cache/conftool/dbconfig/20231109-070012-arnaudb.json
  • 07:00 arnaudb: Starting s1 codfw failover from db2112 to db2103 - T350142
  • 06:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Set db2103 with weight 0 T350142', diff saved to https://phabricator.wikimedia.org/P53174 and previous config saved to /var/cache/conftool/dbconfig/20231109-062725-arnaudb.json
  • 06:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 35 hosts with reason: Primary switchover s1 T350142
  • 06:26 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 35 hosts with reason: Primary switchover s1 T350142

2023-11-08

  • 23:31 wfan: civicrm upgraded from 81bd4c7d to 88361167
  • 23:28 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart (java 11 sec updates) - ryankemper@cumin1001 - T350703
  • 22:24 milimetric@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 22:24 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[25-27,30,33].eqiad.wmnet: Applying JVM security upgrade (row A) - eevans@cumin1001
  • 22:24 milimetric@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 22:24 milimetric@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 22:23 milimetric@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 22:23 milimetric@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 22:23 milimetric@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 22:14 milimetric@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 22:13 milimetric@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 22:13 milimetric@deploy2002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 22:13 milimetric@deploy2002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 22:13 milimetric@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 22:13 milimetric@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 22:13 milimetric@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 22:12 milimetric@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 22:12 milimetric@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 22:12 milimetric@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 22:08 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart (java 11 sec updates) - ryankemper@cumin1001 - T350703
  • 21:50 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[25-27,30,33].eqiad.wmnet: Applying JVM security upgrade (row A) - eevans@cumin1001
  • 21:48 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[22-24,29,32].eqiad.wmnet: Applying JVM security upgrade (row A) - eevans@cumin1001
  • 20:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host acmechief-test1001.eqiad.wmnet with OS bookworm
  • 20:26 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:25 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:23 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*.codfw.wmnet: Applying JVM security upgrade - eevans@cumin1001
  • 20:21 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:21 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:19 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*.eqiad.wmnet: Applying JVM security upgrade - eevans@cumin1001
  • 20:10 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*.eqiad.wmnet: Applying JVM security upgrade - eevans@cumin1001
  • 20:08 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Applying JVM security upgrade - eevans@cumin1001
  • 20:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief-test1001.eqiad.wmnet with reason: host reimage
  • 20:02 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief-test1001.eqiad.wmnet with reason: host reimage
  • 20:01 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:01 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:57 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Applying JVM security upgrade - eevans@cumin1001
  • 19:49 brett@cumin2002: START - Cookbook sre.hosts.reimage for host acmechief-test1001.eqiad.wmnet with OS bookworm
  • 16:54 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:54 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:48 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad
  • 16:47 jforrester@deploy2002: Finished scap: Backport for Skip PerformanceBudgetTest::testTotalModulesSize (T350338), Modify regex to reflect updated DOM (T350777) (duration: 07m 29s)
  • 16:41 jforrester@deploy2002: jforrester: Continuing with sync
  • 16:40 jforrester@deploy2002: jforrester: Backport for Skip PerformanceBudgetTest::testTotalModulesSize (T350338), Modify regex to reflect updated DOM (T350777) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:39 jforrester@deploy2002: Started scap: Backport for Skip PerformanceBudgetTest::testTotalModulesSize (T350338), Modify regex to reflect updated DOM (T350777)
  • 16:38 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 16:34 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@869cca4]: Set group ownership of processed sparql queries (duration: 00m 27s)
  • 16:33 ebernhardson@deploy2002: Started deploy [airflow-dags/search@869cca4]: Set group ownership of processed sparql queries
  • 16:31 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 16:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dse_k8s::master
  • 16:23 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
  • 16:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1014.eqiad.wmnet with OS bookworm
  • 16:11 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dse_k8s::master
  • 16:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dse_k8s::worker
  • 16:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: host reimage
  • 16:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: host reimage
  • 15:57 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dse_k8s::worker
  • 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 15:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2001.codfw.wmnet
  • 15:48 bvibber: brion running requeueTranscodes.php on mwmaint2002 to continue backfill for iOS-compatible low-res video (throttled)
  • 15:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host kubestage2001.codfw.wmnet
  • 15:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster2001.codfw.wmnet
  • 15:41 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 15:41 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 15:33 bvibber: brion running requeueTranscodes.php to batch-remove old low-res VP9 WebM transcodes (should be low impact)
  • 15:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
  • 15:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2014.codfw.wmnet with reason: host reimage
  • 15:27 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:27 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:26 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 15:26 jiji@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 15:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2014.codfw.wmnet with reason: host reimage
  • 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kubernetes::staging::master
  • 15:08 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2014.codfw.wmnet with OS bookworm
  • 15:07 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: kubernetes::staging::master
  • 15:04 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc2 master" (duration: 06m 51s)
  • 14:59 marostegui@deploy2002: marostegui: Continuing with sync
  • 14:59 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc2014 to pc2 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kubernetes::staging::worker
  • 14:58 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc2 master"
  • 14:53 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:53 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:51 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: kubernetes::staging::worker
  • 14:51 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc2014 to pc2 master (duration: 08m 41s)
  • 14:46 marostegui@deploy2002: marostegui: Continuing with sync
  • 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::zookeeper
  • 14:44 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc2014 to pc2 master synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:42 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc2014 to pc2 master
  • 14:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2012,2014].codfw.wmnet,pc1012.eqiad.wmnet with reason: Upgrade
  • 14:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2012,2014].codfw.wmnet,pc1012.eqiad.wmnet with reason: Upgrade
  • 14:40 taavi@deploy2002: Finished scap: Backport for [bnwikisource] Change the wordmark (T350482), [plwiki] Add 'abusefilter-log-private' flag to sysops (T350509) (duration: 07m 45s)
  • 14:35 _joe_: Running puppet on cp-text to pick up the increase in traffic to mw on k8s
  • 14:35 taavi@deploy2002: taavi and superpes: Continuing with sync
  • 14:34 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::zookeeper
  • 14:34 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:34 taavi@deploy2002: taavi and superpes: Backport for [bnwikisource] Change the wordmark (T350482), [plwiki] Add 'abusefilter-log-private' flag to sysops (T350509) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:33 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:32 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:32 taavi@deploy2002: Started scap: Backport for [bnwikisource] Change the wordmark (T350482), [plwiki] Add 'abusefilter-log-private' flag to sysops (T350509)
  • 14:32 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:32 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:32 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:32 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:31 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1006.eqiad.wmnet with reason: host reimage
  • 14:30 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:28 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:28 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:27 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1006.eqiad.wmnet with reason: host reimage
  • 14:26 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:26 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:23 taavi@deploy2002: Finished scap: Backport for Remove feature flag for email (T347067), Remove feature flag for email (T347067), prod: Stop setting $wgCampaignEventsEnableEmail, unused (T347067) (duration: 12m 19s)
  • 14:17 taavi@deploy2002: taavi and daimona: Continuing with sync
  • 14:12 taavi@deploy2002: taavi and daimona: Backport for Remove feature flag for email (T347067), Remove feature flag for email (T347067), prod: Stop setting $wgCampaignEventsEnableEmail, unused (T347067) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:10 taavi@deploy2002: Started scap: Backport for Remove feature flag for email (T347067), Remove feature flag for email (T347067), prod: Stop setting $wgCampaignEventsEnableEmail, unused (T347067)
  • 14:10 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bookworm
  • 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kafka::test::broker
  • 14:04 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 14:04 jiji@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 14:04 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 14:03 jiji@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 13:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 15 hosts with reason: not pooled, reimaging in progress
  • 13:59 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 15 hosts with reason: not pooled, reimaging in progress
  • 13:55 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: kafka::test::broker
  • 13:55 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::hadoop::worker
  • 13:34 moritzm: installing libxpm security updates
  • 13:19 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: openldap::replica
  • 13:14 taavi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:14 taavi@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: free up nfs-maps IPs T350259 - taavi@cumin1001"
  • 13:12 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: free up nfs-maps IPs T350259 - taavi@cumin1001"
  • 13:10 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.4 refs T350080
  • 13:10 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 13:08 stevemunene@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 13:04 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: openldap::replica
  • 11:49 ladsgroup@deploy2002: Finished scap: Backport for Only take one field in fetchFieldValues (T350726) (duration: 07m 00s)
  • 11:43 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 11:43 ladsgroup@deploy2002: ladsgroup: Backport for Only take one field in fetchFieldValues (T350726) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:42 ladsgroup@deploy2002: Started scap: Backport for Only take one field in fetchFieldValues (T350726)
  • 11:37 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 11:37 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 11:37 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 11:33 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 11:32 effie: stopping puppet from mc2038
  • 11:15 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-role (exit_code=99) for role: analytics_cluster::hadoop::worker
  • 11:12 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 11:12 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 11:12 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 11:11 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 11:11 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 11:11 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 11:11 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 11:11 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 11:11 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:10 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:10 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 11:09 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:09 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:09 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:09 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:08 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:08 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 11:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-esams and A:cp
  • 11:07 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 11:07 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 11:07 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:06 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:05 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:05 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:05 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:04 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 11:04 btullis@cumin1001: Added views for new wiki: fonwiki T347938
  • 10:43 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::hadoop::worker
  • 10:40 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:40 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:40 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 10:40 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::web::htmldumps
  • 10:30 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 10:25 brouberol@deploy2002: Finished deploy [airflow-dags/analytics@af7f4e5]: (no justification provided) (duration: 00m 31s)
  • 10:24 brouberol@deploy2002: Started deploy [airflow-dags/analytics@af7f4e5]: (no justification provided)
  • 10:24 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dumps::web::htmldumps
  • 10:24 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-esams and A:cp
  • 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host an-worker1111.eqiad.wmnet
  • 10:06 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host an-worker1111.eqiad.wmnet
  • 09:57 arnaudb@cumin1001: dbctl commit (dc=all): 'db1236 (re)pooling @ 100%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53170 and previous config saved to /var/cache/conftool/dbconfig/20231108-095701-arnaudb.json
  • 09:41 arnaudb@cumin1001: dbctl commit (dc=all): 'db1236 (re)pooling @ 90%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53169 and previous config saved to /var/cache/conftool/dbconfig/20231108-094156-arnaudb.json
  • 09:11 arnaudb@cumin1001: dbctl commit (dc=all): 'db1236 (re)pooling @ 60%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53167 and previous config saved to /var/cache/conftool/dbconfig/20231108-091146-arnaudb.json
  • 09:02 oblivian@deploy2002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 09:02 oblivian@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 09:02 oblivian@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 09:02 oblivian@deploy2002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 08:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db1236 (re)pooling @ 45%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53166 and previous config saved to /var/cache/conftool/dbconfig/20231108-085641-arnaudb.json
  • 08:55 moritzm: restarting archiva to pick up Java security updates
  • 08:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45899
  • 08:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45899
  • 08:51 moritzm: installing openjdk-8 security updates
  • 08:49 oblivian@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 08:49 oblivian@deploy2002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 08:49 oblivian@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 08:49 oblivian@deploy2002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 08:41 arnaudb@cumin1001: dbctl commit (dc=all): 'db1236 (re)pooling @ 30%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53165 and previous config saved to /var/cache/conftool/dbconfig/20231108-084136-arnaudb.json
  • 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: druid::test_analytics::worker
  • 08:26 arnaudb@cumin1001: dbctl commit (dc=all): 'db1236 (re)pooling @ 15%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53164 and previous config saved to /var/cache/conftool/dbconfig/20231108-082631-arnaudb.json
  • 08:16 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: druid::test_analytics::worker
  • 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::turnilo::staging
  • 07:58 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::turnilo::staging
  • 07:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: zookeeper::test
  • 07:42 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: zookeeper::test
  • 00:16 urbanecm: mwmaint2002: Stop T315510#9312431 instances of extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php (T315510)

2023-11-07

  • 23:09 ladsgroup@deploy2002: Finished scap: Backport for styles: Fix stylesheet validation issues (duration: 07m 14s)
  • 23:04 ladsgroup@deploy2002: ladsgroup and volker-e: Continuing with sync
  • 23:03 ladsgroup@deploy2002: ladsgroup and volker-e: Backport for styles: Fix stylesheet validation issues synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:02 ladsgroup@deploy2002: Started scap: Backport for styles: Fix stylesheet validation issues
  • 22:19 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:19 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 22:19 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:19 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 22:18 ladsgroup@deploy2002: Finished scap: Backport for Replace WikimediaUI Base with Codex design tokens (T331403 T334934) (duration: 09m 15s)
  • 22:13 ladsgroup@deploy2002: ladsgroup and volker-e: Continuing with sync
  • 22:10 ladsgroup@deploy2002: ladsgroup and volker-e: Backport for Replace WikimediaUI Base with Codex design tokens (T331403 T334934) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:09 ladsgroup@deploy2002: Started scap: Backport for Replace WikimediaUI Base with Codex design tokens (T331403 T334934)
  • 22:00 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T350703
  • 21:58 tgr@deploy2002: Finished scap: Backport for Fix centralauthtoken key schema migration (T347223 T350723) (duration: 13m 17s)
  • 21:53 tgr@deploy2002: tgr: Continuing with sync
  • 21:46 tgr@deploy2002: tgr: Backport for Fix centralauthtoken key schema migration (T347223 T350723) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:45 tgr@deploy2002: Started scap: Backport for Fix centralauthtoken key schema migration (T347223 T350723)
  • 21:37 tgr@deploy2002: Finished scap: Backport for CentralAuth: Clear domain cookie when setting non-domain cookie (T350695) (duration: 20m 27s)
  • 21:36 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T350703
  • 21:35 tzatziki: changing email for User:Rlayton-WMF
  • 21:32 tgr@deploy2002: tgr: Continuing with sync
  • 21:18 tgr@deploy2002: tgr: Backport for CentralAuth: Clear domain cookie when setting non-domain cookie (T350695) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:17 tgr@deploy2002: Started scap: Backport for CentralAuth: Clear domain cookie when setting non-domain cookie (T350695)
  • 21:14 tgr@deploy2002: Finished scap: Backport for Enable edit check on fonwiki (T350634) (duration: 09m 45s)
  • 21:09 tgr@deploy2002: tgr and kemayo: Continuing with sync
  • 21:06 tgr@deploy2002: tgr and kemayo: Backport for Enable edit check on fonwiki (T350634) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:05 tgr@deploy2002: Started scap: Backport for Enable edit check on fonwiki (T350634)
  • 20:47 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host stewards1001.eqiad.wmnet
  • 20:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host stewards1001.eqiad.wmnet with OS bookworm
  • 20:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stewards1001.eqiad.wmnet with reason: host reimage
  • 20:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on stewards1001.eqiad.wmnet with reason: host reimage
  • 20:20 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host stewards1001.eqiad.wmnet with OS bookworm
  • 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM stewards1001.eqiad.wmnet - dzahn@cumin1001"
  • 20:18 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM stewards1001.eqiad.wmnet - dzahn@cumin1001"
  • 20:18 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) stewards1001.eqiad.wmnet on all recursors
  • 20:18 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache stewards1001.eqiad.wmnet on all recursors
  • 20:18 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:18 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM stewards1001.eqiad.wmnet - dzahn@cumin1001"
  • 20:16 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM stewards1001.eqiad.wmnet - dzahn@cumin1001"
  • 19:58 wfan: payments-wiki change from 1d66a20f to 8c073c23, config revision changed from c841729a to 5c00d761
  • 19:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 100%: Change binlog format', diff saved to https://phabricator.wikimedia.org/P53158 and previous config saved to /var/cache/conftool/dbconfig/20231107-195420-root.json
  • 19:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 75%: Change binlog format', diff saved to https://phabricator.wikimedia.org/P53157 and previous config saved to /var/cache/conftool/dbconfig/20231107-193915-root.json
  • 19:32 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 19:32 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 19:26 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: mail::mx
  • 19:25 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 19:25 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 19:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 50%: Change binlog format', diff saved to https://phabricator.wikimedia.org/P53156 and previous config saved to /var/cache/conftool/dbconfig/20231107-192410-root.json
  • 19:22 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 19:22 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host stewards1001.eqiad.wmnet
  • 19:18 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 19:17 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: mail::mx
  • 19:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host stewards2001.codfw.wmnet with OS bookworm
  • 19:16 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:13 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: kerberos::kdc
  • 19:12 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 25%: Change binlog format', diff saved to https://phabricator.wikimedia.org/P53155 and previous config saved to /var/cache/conftool/dbconfig/20231107-190905-root.json
  • 19:06 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: kerberos::kdc
  • 19:04 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: apt_staging
  • 19:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stewards2001.codfw.wmnet with reason: host reimage
  • 19:01 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc2 master" (duration: 06m 40s)
  • 18:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on stewards2001.codfw.wmnet with reason: host reimage
  • 18:57 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: apt_staging
  • 18:56 marostegui@deploy2002: marostegui: Continuing with sync
  • 18:55 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc2 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:54 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc2 master"
  • 18:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 10%: Change binlog format', diff saved to https://phabricator.wikimedia.org/P53154 and previous config saved to /var/cache/conftool/dbconfig/20231107-185400-root.json
  • 18:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1192 T346454', diff saved to https://phabricator.wikimedia.org/P53153 and previous config saved to /var/cache/conftool/dbconfig/20231107-185033-root.json
  • 18:44 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc2 master (duration: 06m 47s)
  • 18:39 marostegui@deploy2002: marostegui: Continuing with sync
  • 18:38 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc2 master synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:37 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc2 master
  • 18:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2012,2014].codfw.wmnet,pc[1012,1014].eqiad.wmnet with reason: Upgrade
  • 18:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2012,2014].codfw.wmnet,pc[1012,1014].eqiad.wmnet with reason: Upgrade
  • 18:33 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:30 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host stewards2001.codfw.wmnet with OS bookworm
  • 18:30 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:30 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:30 herron: performing rolling memory increase on logstash collector VMs T350434
  • 18:29 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:27 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:27 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:27 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:26 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:26 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:25 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: netmon
  • 18:16 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: netmon
  • 18:13 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: mirrors
  • 18:08 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:08 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:06 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: mirrors
  • 17:21 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1005.eqiad.wmnet with OS bookworm
  • 17:20 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices1005.eqiad.wmnet
  • 17:13 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 17:13 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 17:13 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 17:13 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 17:13 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 17:12 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 17:12 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 17:12 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 17:12 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 17:11 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 17:11 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 17:11 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 17:10 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:10 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 17:09 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:09 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 17:08 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.eqiad.wmnet
  • 17:08 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:07 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 17:03 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:02 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:02 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:02 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:01 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:01 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 17:00 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:58 urbanecm@deploy2002: Finished scap: Backport for changeWikiConfig: Add --touch option (T347157), changeWikiConfig: Add --touch option (T347157) (duration: 07m 08s)
  • 16:58 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:57 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:52 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 16:52 urbanecm@deploy2002: urbanecm: Backport for changeWikiConfig: Add --touch option (T347157), changeWikiConfig: Add --touch option (T347157) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:51 urbanecm@deploy2002: Started scap: Backport for changeWikiConfig: Add --touch option (T347157), changeWikiConfig: Add --touch option (T347157)
  • 16:47 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 16:47 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:47 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 16:47 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:42 ottomata: increasing eventgate cpu limits 1000m -> 1500m hopefully to reduce throttling, also setting stream_config_retries: 3 to avoid stream config refetch failures for eventgate-analytics-external.
  • 16:42 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 16:12 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:07 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::ui::superset
  • 15:58 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:58 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:58 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:57 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:55 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
  • 15:49 moritzm: importing openjdk-8 8u392-ga-1~deb10u1 for buster-wikimedia to apt.wikimedia.org (latest Java 8 security fixes)
  • 15:48 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 15:48 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 15:48 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 15:47 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 15:46 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 15:42 bvibber: brion halting requeueTranscodes.php media backfill job insertions for a bit while the queue catches up
  • 15:39 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 15:39 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 15:38 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:38 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:36 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 15:34 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 15:30 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:30 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:29 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:29 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 15:28 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:28 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1005.eqiad.wmnet with reason: host reimage
  • 15:28 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:28 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:28 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:28 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:28 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:26 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1005.eqiad.wmnet with reason: host reimage
  • 15:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:25 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:25 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:24 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:24 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:24 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:24 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:21 urbanecm@deploy2002: Finished scap: Backport for [Languages] Add namespaces names for dga and bbc-latn, [Languages] Add namespaces names for dga and bbc-latn (duration: 07m 37s)
  • 15:21 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 15:20 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 15:16 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 15:15 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:15 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:15 urbanecm@deploy2002: urbanecm: Backport for [Languages] Add namespaces names for dga and bbc-latn, [Languages] Add namespaces names for dga and bbc-latn synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:13 urbanecm@deploy2002: Started scap: Backport for [Languages] Add namespaces names for dga and bbc-latn, [Languages] Add namespaces names for dga and bbc-latn
  • 15:13 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1005.eqiad.wmnet with OS bookworm
  • 15:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 15:00 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 14:56 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 14:55 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::ui::superset
  • 14:52 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 14:51 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 14:34 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 14:10 jforrester@deploy2002: Finished scap: Backport for [wikifunctions] Alter site to General Availability (T349054 T349061 T349063 T349080 T349082) (duration: 07m 00s)
  • 14:09 urbanecm: mwmaint2002: Start multiple instances of extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php (T315510#9312431)
  • 14:09 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:09 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:04 jforrester@deploy2002: jforrester: Continuing with sync
  • 14:04 jforrester@deploy2002: jforrester: Backport for [wikifunctions] Alter site to General Availability (T349054 T349061 T349063 T349080 T349082) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:03 jforrester@deploy2002: Started scap: Backport for [wikifunctions] Alter site to General Availability (T349054 T349061 T349063 T349080 T349082)
  • 13:49 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:49 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 13:49 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:48 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 13:48 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:42 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: sync
  • 13:30 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:24 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:23 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: netbox::database
  • 13:20 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 13:19 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 13:19 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 13:19 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 13:18 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 13:18 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 13:09 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: netbox::database
  • 13:09 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: netbox::frontend
  • 12:49 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: netbox::frontend
  • 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::ui::superset::staging
  • 12:24 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::ui::superset::staging
  • 12:14 btullis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 12:11 btullis@deploy2002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 12:09 btullis@deploy2002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 12:05 btullis@deploy2002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 11:53 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 11:52 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 11:48 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 11:48 btullis@cumin1001: Added views for new wiki: dgawiki T350228
  • 11:48 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_test_cluster::client
  • 11:47 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 11:47 btullis@cumin1001: Added views for new wiki: bjnwikiquote T350234
  • 11:47 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 11:47 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
  • 11:46 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
  • 11:46 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
  • 11:46 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
  • 11:45 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 11:45 btullis@cumin1001: Added views for new wiki: bbcwiki T350372
  • 11:45 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 11:44 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 11:43 jayme@deploy2002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 11:42 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp[1075-1090].eqiad.wmnet} and A:cp
  • 11:37 btullis@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:35 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_test_cluster::client
  • 11:34 btullis@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:33 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 11:33 btullis@cumin1001: Added views for new wiki: zghwiki T350240
  • 11:32 topranks: reset PIC in cr1-eqiad slot 1/1 to enable port et-1/1/2 at 100G for new transport (T350504)
  • 11:27 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_test_cluster::presto::server
  • 11:26 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:21 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 11:20 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 11:18 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:18 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_test_cluster::presto::server
  • 11:18 jayme@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_test_cluster::hadoop::worker
  • 11:13 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 11:04 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_test_cluster::hadoop::worker
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_test_cluster::hadoop::standby
  • 11:03 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 11:03 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp[1075-1090].eqiad.wmnet} and A:cp
  • 11:02 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 11:02 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 10:59 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 10:54 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_test_cluster::hadoop::standby
  • 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_test_cluster::hadoop::master
  • 10:36 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 10:35 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_test_cluster::hadoop::master
  • 10:33 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 10:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_test_cluster::coordinator
  • 10:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-drmrs and A:cp
  • 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
  • 10:12 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 10:11 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 10:11 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_test_cluster::coordinator
  • 10:11 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-eqiad
  • 10:10 moritzm: installing dbus security updates on bookworm
  • 10:09 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
  • 10:04 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 10:04 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 10:03 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-codfw
  • 10:03 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 10:02 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 09:59 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 09:53 moritzm: installing nss security updates
  • 09:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove drmrs-esams IPs - ayounsi@cumin1001"
  • 09:34 dcausse: restarting blazegraph on wdqs1007 (stuck for 10+hours)
  • 09:34 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove drmrs-esams IPs - ayounsi@cumin1001"
  • 09:32 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:27 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-drmrs and A:cp
  • 09:23 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 09:23 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 09:22 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.4 refs T350080
  • 09:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-codfw and A:cp
  • 09:12 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 09:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 09:07 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 09:00 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 09:00 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 08:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 08:44 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 08:35 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-codfw and A:cp
  • 06:16 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 35 hosts with reason: Primary switchover s1 T350142
  • 06:16 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 35 hosts with reason: Primary switchover s1 T350142
  • 05:48 kart_: Updated cxserver to 2023-11-06-060744-production (T333969, T350229, T350241, T350373)
  • 05:46 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:45 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:44 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:44 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:32 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:32 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:07 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:07 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 04:55 mwpresync@deploy2002: Pruned MediaWiki: 1.42.0-wmf.2 (duration: 02m 12s)
  • 04:53 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.4 refs T350080 (duration: 51m 04s)
  • 04:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.4 refs T350080
  • 00:32 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=96) for new host stewards2001.codfw.wmnet
  • 00:32 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM stewards2001.codfw.wmnet - dzahn@cumin1001"
  • 00:31 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM stewards2001.codfw.wmnet - dzahn@cumin1001"
  • 00:31 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) stewards2001.codfw.wmnet on all recursors
  • 00:31 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache stewards2001.codfw.wmnet on all recursors
  • 00:31 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:31 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM stewards2001.codfw.wmnet - dzahn@cumin1001"
  • 00:30 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM stewards2001.codfw.wmnet - dzahn@cumin1001"
  • 00:27 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 00:27 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host stewards2001.codfw.wmnet

2023-11-06

  • 23:05 ejegg: fundraising civicrm upgraded from 5be02f1b to f1d49e66
  • 23:02 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 23:02 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 23:01 cjming: end of UTC late backport window
  • 22:58 cjming@deploy2002: Finished scap: Backport for [Languages] Add namespace translations for zgh (duration: 11m 28s)
  • 22:52 cjming@deploy2002: cjming and jhsoby: Continuing with sync
  • 22:52 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:52 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 22:48 cjming@deploy2002: cjming and jhsoby: Backport for [Languages] Add namespace translations for zgh synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:46 cjming@deploy2002: Started scap: Backport for [Languages] Add namespace translations for zgh
  • 22:41 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:41 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 22:34 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:34 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 22:24 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:23 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 22:05 cjming@deploy2002: Finished scap: Backport for mznwiki: add project namespace (T350397) (duration: 09m 09s)
  • 22:00 cjming@deploy2002: cjming and anzx: Continuing with sync
  • 21:57 cjming@deploy2002: cjming and anzx: Backport for mznwiki: add project namespace (T350397) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:56 cjming@deploy2002: Started scap: Backport for mznwiki: add project namespace (T350397)
  • 21:48 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 21:47 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 21:47 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 21:46 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 21:46 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 21:45 cjming@deploy2002: Finished scap: Backport for Avoid nullish coalescing operators (T350519) (duration: 15m 09s)
  • 21:45 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 21:42 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 21:42 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 21:40 cjming@deploy2002: jdlrobson and cjming: Continuing with sync
  • 21:32 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 21:31 cjming@deploy2002: jdlrobson and cjming: Backport for Avoid nullish coalescing operators (T350519) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:30 cjming@deploy2002: Started scap: Backport for Avoid nullish coalescing operators (T350519)
  • 21:30 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 21:29 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 21:28 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 21:27 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 21:27 ottomata: eventgate-analytics-external - deploy change to remove 'dynamic' stream config support, instead just re-cache stream configs every 60s - https://phabricator.wikimedia.org/T326002
  • 21:24 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 21:23 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 21:17 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 21:17 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 21:12 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 21:12 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 21:04 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 21:00 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 20:54 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:54 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:49 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 20:48 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 20:46 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 20:46 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 20:41 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 20:40 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 20:38 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 20:32 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 20:29 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 20:29 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 20:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 20:12 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 20:12 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 20:10 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 20:10 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 20:09 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 20:09 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 20:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
  • 19:57 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1003.wikimedia.org with reason: host reimage
  • 19:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 19:43 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 19:41 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 19:41 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:49 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:49 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:49 ladsgroup@deploy2002: Finished scap: Backport for Add pc4 to the list of ParserCache clusters (T350367) (duration: 09m 32s)
  • 18:48 bking@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 18:47 bking@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 18:47 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:47 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 18:43 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 18:41 ladsgroup@deploy2002: ladsgroup: Backport for Add pc4 to the list of ParserCache clusters (T350367) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:40 ladsgroup@deploy2002: Started scap: Backport for Add pc4 to the list of ParserCache clusters (T350367)
  • 18:39 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@048362b]: (no justification provided) (duration: 00m 29s)
  • 18:39 milimetric@deploy2002: Started deploy [airflow-dags/analytics@048362b]: (no justification provided)
  • 18:19 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 18:18 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 18:11 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudrabbit1003']
  • 18:10 milimetric@deploy2002: Finished deploy [analytics/refinery@0239c23] (thin): Publishing refinery-source jars at 0.2.24 (duration: 00m 07s)
  • 18:09 milimetric@deploy2002: Started deploy [analytics/refinery@0239c23] (thin): Publishing refinery-source jars at 0.2.24
  • 18:04 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:03 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:03 milimetric@deploy2002: Finished deploy [analytics/refinery@0239c23]: Publishing refinery-source jars at 0.2.24 (duration: 07m 39s)
  • 18:02 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:02 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:01 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudrabbit1003']
  • 17:56 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudrabbit1003']
  • 17:55 milimetric@deploy2002: Started deploy [analytics/refinery@0239c23]: Publishing refinery-source jars at 0.2.24
  • 17:52 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:52 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 17:52 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:50 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 17:48 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 17:48 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:48 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 17:46 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:46 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 17:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudrabbit1003']
  • 17:41 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:41 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 17:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1002.wikimedia.org with OS bookworm
  • 17:24 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2011.codfw.wmnet with OS bookworm
  • 17:19 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1002.wikimedia.org with reason: host reimage
  • 17:15 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1002.wikimedia.org with reason: host reimage
  • 17:07 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2011.codfw.wmnet with reason: host reimage
  • 17:05 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:03 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2011.codfw.wmnet with reason: host reimage
  • 17:03 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1002.wikimedia.org with OS bookworm
  • 17:03 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bookworm
  • 17:01 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 16:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 16:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 16:55 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 05m 34s)
  • 16:49 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 16:49 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 05m 53s)
  • 16:49 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 16:49 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 16:48 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:48 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 16:45 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 16:44 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 16:44 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 16:43 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 16:43 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 16:41 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 16:41 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 16:41 ottomata: beginning deployments of eventgate clusters: mesh and cert chart updates, as well as sleep timeout values for graceful envoy+eventgate container termination - T349823 T300033 T346638
  • 16:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1014.eqiad.wmnet
  • 16:29 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
  • 16:29 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 16:28 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host backup2011.codfw.wmnet with OS bookworm
  • 16:26 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs1014.eqiad.wmnet
  • 16:10 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:10 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1016.eqiad.wmnet with OS bookworm
  • 16:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2016.codfw.wmnet with OS bookworm
  • 16:02 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1001.wikimedia.org with OS bookworm
  • 15:54 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on pc1016.eqiad.wmnet with reason: host reimage
  • 15:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2016.codfw.wmnet with reason: host reimage
  • 15:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1016.eqiad.wmnet with reason: host reimage
  • 15:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2016.codfw.wmnet with reason: host reimage
  • 15:46 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1001.wikimedia.org with reason: host reimage
  • 15:45 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on pc2015.codfw.wmnet,pc1015.eqiad.wmnet with reason: Upgrade
  • 15:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on pc2015.codfw.wmnet,pc1015.eqiad.wmnet with reason: Upgrade
  • 15:44 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:43 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1001.wikimedia.org with reason: host reimage
  • 15:37 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1016.eqiad.wmnet with OS bookworm
  • 15:32 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2016.codfw.wmnet with OS bookworm
  • 15:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2015.codfw.wmnet with OS bookworm
  • 15:30 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudrabbit1001.wikimedia.org with OS bookworm
  • 15:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1015.eqiad.wmnet with OS bookworm
  • 15:29 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2195.codfw.wmnet with OS bookworm
  • 15:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-eqsin and A:cp
  • 15:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:21 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:19 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:18 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2015.codfw.wmnet with reason: host reimage
  • 15:17 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:17 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2195.codfw.wmnet with reason: host reimage
  • 15:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2015.codfw.wmnet with reason: host reimage
  • 15:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1015.eqiad.wmnet with reason: host reimage
  • 15:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2195.codfw.wmnet with reason: host reimage
  • 15:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:10 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:09 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/termbox: apply
  • 15:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1015.eqiad.wmnet with reason: host reimage
  • 15:09 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/termbox: apply
  • 15:08 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 15:08 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 15:05 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: rpkivalidator
  • 15:04 sukhe: finished upgrading all doh* hosts to dnsdist 1.8.2-1+wmf12u2 12
  • 15:01 urbanecm@deploy2002: Finished scap: Backport for Add autopatrol to Wikifunctions Staff group (T350028) (duration: 08m 41s)
  • 15:01 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:00 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1005.eqiad.wmnet with OS bookworm
  • 14:57 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1015.eqiad.wmnet with OS bookworm
  • 14:56 urbanecm@deploy2002: urbanecm and mdsshakil: Continuing with sync
  • 14:56 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: rpkivalidator
  • 14:56 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2015.codfw.wmnet with OS bookworm
  • 14:56 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: idp
  • 14:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc[2015-2016].codfw.wmnet,pc[1015-1016].eqiad.wmnet with reason: Upgrade
  • 14:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on pc[2015-2016].codfw.wmnet,pc[1015-1016].eqiad.wmnet with reason: Upgrade
  • 14:55 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349053
  • 14:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349053
  • 14:54 urbanecm@deploy2002: urbanecm and mdsshakil: Backport for Add autopatrol to Wikifunctions Staff group (T350028) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:53 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db2195.codfw.wmnet with OS bookworm
  • 14:53 urbanecm@deploy2002: Started scap: Backport for Add autopatrol to Wikifunctions Staff group (T350028)
  • 14:49 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: idp
  • 14:49 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: etcd::v3::aux_k8s_etcd
  • 14:49 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 14:48 urbanecm@deploy2002: Finished scap: Backport for Generalize Meta/Commons exceptions for CentralAuth cookie handling (T257852), Restore OOUI dialog styles for compatibility (T350544) (duration: 13m 13s)
  • 14:47 urbanecm: mwmaint2002: kill persistRevisionThreadItems.php maintenance script for s7 (T315510)
  • 14:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 14:42 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 14:42 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: etcd::v3::aux_k8s_etcd
  • 14:40 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: aux_k8s::master
  • 14:37 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 14:36 urbanecm@deploy2002: urbanecm and tgr and matmarex: Backport for Generalize Meta/Commons exceptions for CentralAuth cookie handling (T257852), Restore OOUI dialog styles for compatibility (T350544) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:35 urbanecm@deploy2002: Started scap: Backport for Generalize Meta/Commons exceptions for CentralAuth cookie handling (T257852), Restore OOUI dialog styles for compatibility (T350544)
  • 14:34 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: aux_k8s::master
  • 14:34 urbanecm@deploy2002: Finished scap: Backport for Don't remove current wiki family from $wgCentralAuthAutoLoginWikis, Clean up $wgCentralAuthAutoLoginWikis configuration (duration: 11m 34s)
  • 14:33 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: aux_k8s::worker
  • 14:29 urbanecm@deploy2002: matmarex and urbanecm: Continuing with sync
  • 14:27 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:27 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: aux_k8s::worker
  • 14:27 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:27 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:26 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:26 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-eqsin and A:cp
  • 14:25 vgutierrez: rolling upgrade of HAProxy to version 2.6.15-1~bpo11+1 in eqsin
  • 14:25 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bookworm
  • 14:24 urbanecm@deploy2002: matmarex and urbanecm: Backport for Don't remove current wiki family from $wgCentralAuthAutoLoginWikis, Clean up $wgCentralAuthAutoLoginWikis configuration synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:22 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:22 urbanecm@deploy2002: Started scap: Backport for Don't remove current wiki family from $wgCentralAuthAutoLoginWikis, Clean up $wgCentralAuthAutoLoginWikis configuration
  • 14:22 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:22 urbanecm@deploy2002: Finished scap: Backport for CheckUser: Set 'debug' log level (T345591) (duration: 14m 20s)
  • 14:21 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2195.codfw.wmnet with OS bookworm
  • 14:20 stevemunene@cumin1001: END (ERROR) - Cookbook sre.druid.roll-restart-workers (exit_code=97) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 14:13 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2195.codfw.wmnet with reason: host reimage
  • 14:13 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 14:09 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:08 urbanecm@deploy2002: Started scap: Backport for CheckUser: Set 'debug' log level (T345591)
  • 13:57 stevemunene@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 13:57 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: cluster::management
  • 13:53 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db2195.codfw.wmnet with OS bookworm
  • 13:49 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: cluster::management
  • 13:45 XioNoX: asw2-c-eqiad> request system power-off member 8 - T349798
  • 13:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2194.codfw.wmnet with OS bookworm
  • 13:30 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 13:29 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 13:29 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 13:29 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 13:28 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:28 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:28 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 13:27 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 13:27 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 13:27 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 13:26 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 13:26 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 13:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2194.codfw.wmnet with reason: host reimage
  • 13:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2194.codfw.wmnet with reason: host reimage
  • 13:21 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 13:21 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 13:21 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 13:20 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 13:10 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1136.eqiad.wmnet onto db1236.eqiad.wmnet
  • 13:05 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db2194.codfw.wmnet with OS bookworm
  • 12:52 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: idm
  • 12:48 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:48 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:44 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: idm
  • 12:42 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: idm_test
  • 12:36 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: idm_test
  • 12:20 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2010.codfw.wmnet with OS bookworm
  • 12:14 moritzm: installing jetty9 security updates
  • 12:02 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2010.codfw.wmnet with reason: host reimage
  • 11:59 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2010.codfw.wmnet with reason: host reimage
  • 11:40 moritzm: installing openssl bugfix updates on Bullseye (update to 1.1.1w)
  • 11:38 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2193.codfw.wmnet with OS bookworm
  • 11:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2193.codfw.wmnet with reason: host reimage
  • 11:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2193.codfw.wmnet with reason: host reimage
  • 11:17 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host backup2010.codfw.wmnet with OS bookworm
  • 11:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-mariadb[1001-1002].eqiad.wmnet with reason: Commissioning new database servers
  • 11:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-mariadb[1001-1002].eqiad.wmnet with reason: Commissioning new database servers
  • 11:02 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db2193.codfw.wmnet with OS bookworm
  • 11:01 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2192.codfw.wmnet with OS bookworm
  • 10:50 hashar: Restarting Jenkins
  • 10:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2192.codfw.wmnet with reason: host reimage
  • 10:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2192.codfw.wmnet with reason: host reimage
  • 10:25 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db2192.codfw.wmnet with OS bookworm
  • 10:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2191.codfw.wmnet with OS bookworm
  • 10:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2191.codfw.wmnet with reason: host reimage
  • 10:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1136 in db1236 for T344036', diff saved to https://phabricator.wikimedia.org/P53140 and previous config saved to /var/cache/conftool/dbconfig/20231106-100625-arnaudb.json
  • 10:06 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2191.codfw.wmnet with reason: host reimage
  • 10:05 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1136.eqiad.wmnet onto db1236.eqiad.wmnet
  • 10:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1136 in db1236 for T344036', diff saved to https://phabricator.wikimedia.org/P53139 and previous config saved to /var/cache/conftool/dbconfig/20231106-100213-arnaudb.json
  • 09:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: provisioning db1236.eqiad.wmnet - T344036
  • 09:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: provisioning db1236.eqiad.wmnet - T344036
  • 09:56 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: provisioning db1236.eqiad.wmnet - T344036
  • 09:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: provisioning db1236.eqiad.wmnet - T344036
  • 09:48 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db2191.codfw.wmnet with OS bookworm
  • 09:39 zabe@deploy2002: Finished scap: update interwiki cache (duration: 06m 21s)
  • 09:38 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2190.codfw.wmnet with OS bookworm
  • 09:35 moritzm: installing Tomcat security updates
  • 09:33 zabe@deploy2002: Started scap: update interwiki cache
  • 09:31 zabe@deploy2002: Finished scap: T350320 (duration: 06m 28s)
  • 09:26 zabe@deploy2002: zabe: Continuing with sync
  • 09:25 zabe@deploy2002: zabe: T350320 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:24 zabe@deploy2002: Started scap: T350320
  • 09:24 moritzm: installing openjdk-11 security updates
  • 09:24 zabe: Toba Batak Wikipedia # T350320
  • 09:22 zabe@deploy2002: Finished scap: T350218 (duration: 07m 04s)
  • 09:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2190.codfw.wmnet with reason: host reimage
  • 09:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2190.codfw.wmnet with reason: host reimage
  • 09:17 zabe@deploy2002: zabe: Continuing with sync
  • 09:16 zabe@deploy2002: zabe: T350218 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:15 zabe@deploy2002: Started scap: T350218
  • 09:15 zabe: create Dagaare Wikipedia # T350218
  • 09:13 zabe@deploy2002: Finished scap: T350216 (duration: 07m 15s)
  • 09:10 moritzm: importing openjdk-8 8u392-ga-1~deb11u1 for bullseye-wikimedia to apt.wikimedia.org (latest Java 8 security fixes)
  • 09:08 zabe@deploy2002: zabe: Continuing with sync
  • 09:07 zabe@deploy2002: zabe: T350216 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:06 zabe@deploy2002: Started scap: T350216
  • 09:06 zabe: create Moroccan Amazigh Wikipedia # T350216
  • 09:04 zabe@deploy2002: Finished scap: T350217 (duration: 07m 47s)
  • 09:00 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db2190.codfw.wmnet with OS bookworm
  • 08:58 zabe@deploy2002: zabe: Continuing with sync
  • 08:57 zabe@deploy2002: zabe: T350217 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:56 zabe@deploy2002: Started scap: T350217
  • 08:56 zabe: create Banjar Wikiquote # T350217
  • 08:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2189.codfw.wmnet with OS bookworm
  • 08:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2189.codfw.wmnet with reason: host reimage
  • 08:33 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2189.codfw.wmnet with reason: host reimage
  • 08:31 godog: add +80G to prometheus/ops in eqiad
  • 08:25 urbanecm@deploy2002: Finished scap: Backport for Structured mentor list: Make "no mentees" a proper weight (T347157 T347024) (duration: 23m 37s)
  • 08:15 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 08:15 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db2189.codfw.wmnet with OS bookworm
  • 08:14 urbanecm@deploy2002: urbanecm: Backport for Structured mentor list: Make "no mentees" a proper weight (T347157 T347024) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:01 urbanecm@deploy2002: Started scap: Backport for Structured mentor list: Make "no mentees" a proper weight (T347157 T347024)

2023-11-03

  • 19:15 cstone: payments-wiki upgraded from cf9f8e52 to 1d66a20f
  • 18:08 jynus@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup2011.codfw.wmnet with OS bookworm
  • 17:18 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 29 days, 4:00:00 on cp4052.ulsfo.wmnet with reason: testing instance
  • 17:17 brett@cumin2002: START - Cookbook sre.hosts.downtime for 29 days, 4:00:00 on cp4052.ulsfo.wmnet with reason: testing instance
  • 16:46 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host backup2011.codfw.wmnet with OS bookworm
  • 16:17 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host backup2010.codfw.wmnet with OS bookworm
  • 16:16 jynus@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup2010.codfw.wmnet with OS bookworm
  • 15:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2188.codfw.wmnet with OS bookworm
  • 15:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2188.codfw.wmnet with reason: host reimage
  • 15:39 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2188.codfw.wmnet with reason: host reimage
  • 15:36 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host backup2010.codfw.wmnet with OS bookworm
  • 15:20 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db2188.codfw.wmnet with OS bookworm
  • 15:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db2188.codfw.wmnet with reason: reimage via T343674
  • 15:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db2188.codfw.wmnet with reason: reimage via T343674
  • 15:09 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 15:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 14:50 topranks: moving cr1-codfw <-> ssw1-a1-codfw EBGP session to private1-b-codfw IPs T347191
  • 14:40 topranks: adding irb interface in private1-a-codfw vlan to ssw1-a1-codfw T347191
  • 14:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 14:02 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudelastic1005.wikimedia.org
  • 13:50 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudelastic1005.wikimedia.org
  • 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: setup in progress
  • 13:10 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: setup in progress
  • 13:00 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1010.eqiad.wmnet with OS bookworm
  • 12:54 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1006.eqiad.wmnet
  • 12:48 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1006.eqiad.wmnet
  • 12:45 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1010.eqiad.wmnet with reason: host reimage
  • 12:42 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1010.eqiad.wmnet with reason: host reimage
  • 12:17 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bookworm
  • 12:17 jynus@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host backup1010.eqiad.wmnet with OS bookworm
  • 11:54 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1006.eqiad.wmnet with OS bookworm
  • 11:49 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bookworm
  • 11:49 jynus@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1010.eqiad.wmnet with OS bookworm
  • 11:33 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1010.eqiad.wmnet with reason: host reimage
  • 11:30 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1010.eqiad.wmnet with reason: host reimage
  • 11:24 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 11:21 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 11:13 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bookworm
  • 11:08 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bookworm
  • 11:07 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1011.eqiad.wmnet with OS bookworm
  • 10:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1011.eqiad.wmnet with reason: host reimage
  • 10:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1011.eqiad.wmnet with reason: host reimage
  • 10:34 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1011.eqiad.wmnet with OS bookworm
  • 10:08 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe
  • 09:59 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe
  • 09:59 Emperor: roll-restart swift frontends
  • 04:01 eileen: civicrm upgraded from 84ec2957 to 5be02f1b
  • 01:23 thcipriani@deploy2002: Finished scap: Backport for Disable namespaceDupes.php for now (T350443) (duration: 10m 29s)
  • 01:18 thcipriani@deploy2002: thcipriani: Continuing with sync
  • 01:14 thcipriani@deploy2002: thcipriani: Backport for Disable namespaceDupes.php for now (T350443) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 01:13 thcipriani@deploy2002: Started scap: Backport for Disable namespaceDupes.php for now (T350443)

2023-11-02

  • 22:31 Amir1: killed update collation on s5
  • 22:13 brett: import acme-chief 0.36-2 into bookworm-wikimedia repo
  • 21:22 inflatador: bking@cumin2002 enabling elastic snapshots on eqiad clusters T348686
  • 20:32 mabualruz@deploy2002: Finished scap: Backport for Enable native math rendering mode on testwiki (T311620) (duration: 14m 06s)
  • 20:27 mabualruz@deploy2002: mabualruz and physikerwelt: Continuing with sync
  • 20:20 mabualruz@deploy2002: mabualruz and physikerwelt: Backport for Enable native math rendering mode on testwiki (T311620) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:18 mabualruz@deploy2002: Started scap: Backport for Enable native math rendering mode on testwiki (T311620)
  • 20:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:04 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for codfw CR IPs moved to new interfaces. - cmooney@cumin1001"
  • 20:01 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for codfw CR IPs moved to new interfaces. - cmooney@cumin1001"
  • 19:59 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh1001.wikimedia.org with OS bookworm
  • 18:46 topranks: shutting down uplink from asw-b-codfw et-2/0/51 to cr1-codfw in advance of cable move (T347191)
  • 18:44 topranks: Making cr2-codfw VRRP Master for row B traffic over new link from ssw1-a8-codfw (T347191)
  • 18:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
  • 18:32 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
  • 18:22 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.3 refs T348356
  • 18:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host doh1001.wikimedia.org with OS bookworm
  • 18:21 topranks: Shutting asw-b-codfw uplink to cr2-codfw down in advance of cable move (T347191)
  • 18:09 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:09 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:07 topranks: Making cr1-codfw VRRP Master for row A traffic again on ssw1-a1-codfw interface (T347191)
  • 17:50 topranks: Shutting asw-a-codfw uplink to cr1-codfw down in advance of cable move (T347191)
  • 17:45 topranks: Moving row A outbound traffic from direct CR link to routing via Spine (T347191)
  • 17:45 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1005.eqiad.wmnet with OS bookworm
  • 17:42 vgutierrez: repool cp4051 and cp5030
  • 17:40 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:40 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:23 vgutierrez: depool cp5030
  • 17:19 vgutierrez: restart haproxy on cp4051
  • 17:14 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 17:14 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
  • 17:13 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 17:13 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 17:12 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 17:11 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
  • 17:11 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 17:10 bd808@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 17:06 topranks: shutting down uplink from asw-a-codfw et-7/0/52 to cr2-codfw et-1/0/0 (T347191)
  • 17:05 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 13 hosts with reason: Move row A/B CR uplinks to SPINE switches
  • 17:05 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 13 hosts with reason: Move row A/B CR uplinks to SPINE switches
  • 17:02 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:01 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:01 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:00 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:00 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 16:59 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:57 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.eqiad.wmnet with OS bookworm
  • 16:40 vgutierrez: depool cp4051
  • 16:35 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:35 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:31 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:30 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:30 ottomata: eventgate-analytics-external: setting service-runner num_workers: 0 to run with one process and reduce # of threads used by container processes. Should reduce throttling and perhaps help with latency. If works, will make this the default in the chart. - T347477
  • 16:30 ottomata: eventgate-analytics in codfw: setting service-runner num_workers: 0 to run with one process and reduce # of threads used by container processes. Should reduce throttling and perhaps help with latency. If works, will make this the default in the chart. - T347477
  • 16:29 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:29 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:26 fabfur: haproxy: this change https://gerrit.wikimedia.org/r/c/operations/puppet/+/971228 will be propagated soon to all cp-ulsfo hosts (T348851)
  • 16:07 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:06 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 15:57 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:57 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 15:51 ottomata: eventgate-analytics in eqiad: setting service-runner num_workers: 0 to run with one process and reduce # of threads used by container processes. Should reduce throttling and perhaps help with latency. If works, will make this the default in the chart. - T347477
  • 15:50 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:50 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 15:48 sukhe: sudo cumin 'O:prometheus' 'run-puppet-agent'
  • 15:45 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough and A:wikidough
  • 15:40 fabfur: cp4037 repooling with changes for dedicated healthcheck backend (haproxy): https://gerrit.wikimedia.org/r/c/operations/puppet/+/966221/ (T348851)
  • 15:34 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 15:34 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:27 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:26 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 15:17 fabfur: cp4037 depooled to be used as canary for https://gerrit.wikimedia.org/r/c/operations/puppet/+/966221/
  • 15:02 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
  • 14:56 herron: logstash1025 systemctl restart apache2.service T350402
  • 14:51 sukhe: force agent run on A:wikidough
  • 14:45 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: netbox::standalone
  • 14:35 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: netbox::standalone
  • 14:32 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: installserver
  • 14:32 hashar: Restarting CI Jenkins again for plugins removal
  • 14:15 hashar: Restarting CI Jenkins for plugins adjustments
  • 13:50 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: installserver
  • 13:43 jayme@deploy2002: Finished scap: upgrading ICU67 (duration: 15m 10s)
  • 13:42 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host install6002.wikimedia.org
  • 13:34 sukhe: restart pybal on lvs1020
  • 13:29 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host install6002.wikimedia.org
  • 13:27 jayme@deploy2002: Started scap: upgrading ICU67
  • 13:27 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: netinsights
  • 13:14 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: netinsights
  • 12:59 moritzm: upgrading deployment servers to ICU67 T345561
  • 12:46 jayme: running fleet wide php upgrades - T345561
  • 12:46 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-role (exit_code=99) for role: ganeti
  • 12:43 daniel@deploy2002: Finished scap: Backport for ParsoidHandler: emit relative URLs in redirects (T350219 T349001) (duration: 21m 37s)
  • 12:38 moritzm: upgrading snapshot* to ICU67 T345561
  • 12:37 daniel@deploy2002: daniel: Continuing with sync
  • 12:36 daniel@deploy2002: daniel: Backport for ParsoidHandler: emit relative URLs in redirects (T350219 T349001) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:31 moritzm: upgrading cloudweb to ICU67 T345561
  • 12:21 daniel@deploy2002: Started scap: Backport for ParsoidHandler: emit relative URLs in redirects (T350219 T349001)
  • 12:20 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1006.eqiad.wmnet with OS bookworm
  • 12:04 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: ganeti
  • 11:58 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host netflow6001.drmrs.wmnet
  • 11:54 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:53 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:53 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:53 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:51 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:51 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-ulsfo and not P{cp4037.ulsfo.wmnet} and A:cp
  • 11:49 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:49 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1006.eqiad.wmnet with reason: host reimage
  • 11:48 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host netflow6001.drmrs.wmnet
  • 11:46 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1006.eqiad.wmnet with reason: host reimage
  • 11:45 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:45 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:33 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.eqiad.wmnet with OS bookworm
  • 11:19 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 11:19 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 11:18 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 11:18 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 11:16 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 11:16 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 11:15 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-ulsfo and not P{cp4037.ulsfo.wmnet} and A:cp
  • 11:12 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol1006.eqiad.wmnet with OS bookworm
  • 11:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp4037.ulsfo.wmnet} and A:cp
  • 11:10 vgutierrez: rolling upgrade of HAProxy to version 2.6.15-1~bpo11+1 in ulsfo
  • 11:09 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp4037.ulsfo.wmnet} and A:cp
  • 11:00 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 34 hosts with reason: testing new bgp policy
  • 11:00 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 34 hosts with reason: testing new bgp policy
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ganeti2014.codfw.wmnet
  • 10:26 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.eqiad.wmnet with OS bookworm
  • 10:23 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host ganeti2014.codfw.wmnet
  • 09:32 moritzm: installing openssl bugfix updates from Bullseye point release (update to 1.1.1w)
  • 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: setup in progress
  • 09:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: setup in progress
  • 09:13 jayme: published image php7.4-fpm-multiversion-base:7.4.33-6 now based on icu67 php packages - T345561
  • 09:06 zabe@deploy2002: Finished scap: Backport for Update Netskope IP ranges (T350199) (duration: 07m 25s)
  • 09:05 moritzm: installing krb5 security updates on buster/bullseye/bookworm
  • 09:04 moritzm: installing krb5 security updates on bullseye
  • 09:01 zabe@deploy2002: zabe: Continuing with sync
  • 09:00 zabe@deploy2002: zabe: Backport for Update Netskope IP ranges (T350199) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:59 zabe@deploy2002: Started scap: Backport for Update Netskope IP ranges (T350199)
  • 08:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on dbproxy1017.eqiad.wmnet with reason: decomissionning via T348956
  • 08:57 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on dbproxy1017.eqiad.wmnet with reason: decomissionning via T348956
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cloudcontrol2006-dev.codfw.wmnet
  • 08:48 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cloudcontrol2006-dev.codfw.wmnet
  • 08:12 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
  • 08:12 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
  • 07:05 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1001.eqiad.wmnet
  • 07:01 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host vrts1001.eqiad.wmnet
  • 03:13 eileen: civicrm upgraded from 86b620ef to 84ec2957
  • 03:03 eileen: civicrm upgraded from bcfd8a7e to 86b620ef
  • 02:27 eileen: civicrm upgraded from 60bdd8d3 to bcfd8a7e
  • 02:18 eileen: civicrm upgraded from 770b114c to 60bdd8d3

2023-11-01

  • 22:39 urbanecm@deploy2002: Finished scap: Backport for Revert "Add খসড়া as draft namespace alias on bnwiki" and add "খসড়া" by copy-paste from wiki page (duration: 06m 44s)
  • 22:32 urbanecm@deploy2002: Started scap: Backport for Revert "Add খসড়া as draft namespace alias on bnwiki" and add "খসড়া" by copy-paste from wiki page
  • 22:18 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0)
  • 22:18 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:18 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 22:16 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:15 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:15 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:14 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:13 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 22:06 urbanecm@deploy2002: Finished scap: Backport for Add খসড়া as draft namespace alias on bnwiki (duration: 09m 34s)
  • 22:00 urbanecm@deploy2002: mdsshakil and urbanecm: Continuing with sync
  • 21:57 urbanecm@deploy2002: mdsshakil and urbanecm: Backport for Add খসড়া as draft namespace alias on bnwiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:56 urbanecm@deploy2002: Started scap: Backport for Add খসড়া as draft namespace alias on bnwiki
  • 21:14 topranks: adjust BGP policy out to L3 switches on remaining CRs T344547
  • 20:51 urbanecm: mwmaint2002: mwscript namespaceDupes.php bnwiki --fix --add-prefix BROKEN
  • 20:49 topranks: configure esams switches to load-share default across CRs T344547
  • 20:23 cjming: end of UTC late backport window
  • 20:23 topranks: adjusting routes announced to L3 switches in esams T344547
  • 20:21 cjming@deploy2002: Finished scap: Backport for Create Draft namespace on bnwiki (T350133) (duration: 13m 38s)
  • 20:15 cjming@deploy2002: mdsshakil and cjming: Continuing with sync
  • 20:08 cjming@deploy2002: mdsshakil and cjming: Backport for Create Draft namespace on bnwiki (T350133) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:07 cjming@deploy2002: Started scap: Backport for Create Draft namespace on bnwiki (T350133)
  • 19:44 topranks: adjusting routes announced to L3 switches in codfw T344547
  • 18:17 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 18:17 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS bullseye
  • 18:16 sukhe: upgrade doh4001 to dnsdist 1.8.2-1+wmf12u2
  • 18:15 dduvall@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.3 refs T348356 (duration: 05m 39s)
  • 18:14 sukhe: reprepro -C component/dnsdist include bookworm-wikimedia dnsdist_1.8.2-1+wmf12u2_amd64.changes
  • 18:10 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.3 refs T348356
  • 18:02 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 17:50 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1114.eqiad.wmnet with OS bullseye
  • 17:41 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt-wdqs1002.eqiad.wmnet
  • 17:41 taavi@cumin1001: START - Cookbook sre.hosts.remove-downtime for cloudvirt-wdqs1002.eqiad.wmnet
  • 17:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
  • 17:28 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
  • 17:27 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt-wdqs1002.eqiad.wmnet
  • 17:21 taavi@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1002.eqiad.wmnet
  • 17:21 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on cloudvirt-wdqs1002.eqiad.wmnet with reason: still setting up
  • 17:21 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on cloudvirt-wdqs1002.eqiad.wmnet with reason: still setting up
  • 17:20 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
  • 17:20 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 17:19 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 17:14 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:14 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:12 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 17:12 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1114.eqiad.wmnet with OS bullseye
  • 17:05 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:04 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:02 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:02 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:02 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 17:01 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cp1114.eqiad.wmnet with OS bullseye
  • 16:59 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1002.eqiad.wmnet with reason: host reimage
  • 16:58 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:57 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:56 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1002.eqiad.wmnet with reason: host reimage
  • 16:54 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lsw1-f1-eqiad.mgmt,ssw1-e1-eqiad.mgmt with reason: replacing optics to troubleshoot errors on core switch link
  • 16:54 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lsw1-f1-eqiad.mgmt,ssw1-e1-eqiad.mgmt with reason: replacing optics to troubleshoot errors on core switch link
  • 16:53 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudvirt-wdqs1002 - taavi@cumin1001"
  • 16:53 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:52 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudvirt-wdqs1002 - taavi@cumin1001"
  • 16:51 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 16:40 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:40 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:36 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1113.eqiad.wmnet with OS bullseye
  • 16:30 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:28 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:28 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:28 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:25 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:25 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:22 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:21 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:18 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:18 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1113.eqiad.wmnet with reason: host reimage
  • 16:18 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 16:15 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1113.eqiad.wmnet with reason: host reimage
  • 16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1008.eqiad.wmnet
  • 16:04 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: moving switch link from NIC port 2 to port 1
  • 16:04 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: moving switch link from NIC port 2 to port 1
  • 16:03 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1008.eqiad.wmnet
  • 15:59 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1113.eqiad.wmnet with OS bullseye
  • 15:59 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cp1113.eqiad.wmnet with OS bullseye
  • 15:57 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:57 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 15:56 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 15:36 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1113.eqiad.wmnet with OS bullseye
  • 15:27 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1113.eqiad.wmnet with OS bullseye
  • 15:26 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1113.eqiad.wmnet with OS bookworm
  • 15:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1113.eqiad.wmnet with OS bookworm
  • 15:23 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
  • 15:12 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
  • 15:05 urbanecm: mwmaint2002: mwscript userOptions.php --wiki=WIKI --nowarn --old='oldimpact' --new='control' 'growthexperiments-homepage-variant' # end A/B testing of new Impact (T336203; wikis=arwiki bnwiki elwiki eswiki fawiki frwiki frwiktionary idwiki plwiki rowiki trwiki viwiki)
  • 15:02 urbanecm@deploy2002: Finished scap: Backport for Growth: Disable new impact A/B testing on pilot wikis (T336203) (duration: 09m 44s)
  • 15:00 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:59 urbanecm: mwmaint2002: mwscript userOptions.php --wiki=cswiki --nowarn --old='oldimpact' --new='control' 'growthexperiments-homepage-variant' # end A/B testing of new Impact (T336203)
  • 14:57 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 14:57 urbanecm@deploy2002: urbanecm: Backport for Growth: Disable new impact A/B testing on pilot wikis (T336203) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:55 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 14:52 urbanecm@deploy2002: Started scap: Backport for Growth: Disable new impact A/B testing on pilot wikis (T336203)
  • 14:52 urbanecm@deploy2002: Finished scap: Backport for Growth: Enable new Impact module on all Wikipedias (T336203) (duration: 10m 41s)
  • 14:49 ejegg: fundraising python tools upgraded from 65f101e4 to a4cbbbe7
  • 14:46 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 14:45 urbanecm@deploy2002: urbanecm: Backport for Growth: Enable new Impact module on all Wikipedias (T336203) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:41 urbanecm@deploy2002: Started scap: Backport for Growth: Enable new Impact module on all Wikipedias (T336203)
  • 14:15 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:15 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:14 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:14 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 14:10 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1001.eqiad.wmnet
  • 14:06 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host vrts1001.eqiad.wmnet
  • 13:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cp1100.eqiad.wmnet with reason: not pooled, reimaging in progress
  • 13:55 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cp1100.eqiad.wmnet with reason: not pooled, reimaging in progress
  • 13:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 15 hosts with reason: not pooled, reimaging in progress
  • 13:52 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 15 hosts with reason: not pooled, reimaging in progress
  • 13:52 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
  • 13:33 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:32 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:29 bking@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:25 bking@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:22 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:20 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:11 cmooney@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt-wdqs1002']
  • 13:11 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt-wdqs1002']
  • 13:09 moritzm: installing libx11 security updates
  • 13:01 moritzm: installing glib2.0 security updates
  • 12:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt-wdqs1002']
  • 12:43 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt-wdqs1002']
  • 12:31 ladsgroup@deploy2002: Finished scap: Backport for Set pagelinks migration in s4 to write both (T345732) (duration: 09m 12s)
  • 12:27 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 12:23 ladsgroup@deploy2002: ladsgroup: Backport for Set pagelinks migration in s4 to write both (T345732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:22 ladsgroup@deploy2002: Started scap: Backport for Set pagelinks migration in s4 to write both (T345732)
  • 12:05 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: GitLab version upgrade
  • 10:33 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: GitLab version upgrade
  • 10:22 moritzm: installing adduser security updates
  • 10:10 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: GitLab version upgrade
  • 10:03 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: GitLab version upgrade
  • 10:02 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: GitLab version upgrade
  • 09:57 moritzm: installing yajl security updates
  • 09:46 moritzm: installing ncurses security updates
  • 09:28 moritzm: installing RT security updates
  • 09:11 moritzm: installing curl security updates
  • 08:34 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: GitLab version upgrade
  • 06:01 kart_: Updated MinT to 2023-10-31-044726-production (T333969, T349991, T349079, T340507)
  • 05:57 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 05:51 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 05:46 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 05:40 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 05:32 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 05:29 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 00:51 eileen: civicrm upgraded from 31d53b57 to 6ae3d3fc
  • 00:01 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1112.eqiad.wmnet with OS bullseye
