
Server Admin Log/Archive 70

From Wikitech

2023-08-31

  • 23:53 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
  • 23:16 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 23:15 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 23:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 23:10 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
  • 22:17 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh5002.wikimedia.org with OS bookworm
  • 21:38 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5002.wikimedia.org with reason: host reimage
  • 21:35 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5002.wikimedia.org with reason: host reimage
  • 21:25 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
  • 21:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2037.codfw.wmnet with OS bullseye
  • 21:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flink-zk2003.codfw.wmnet
  • 21:08 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:08 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 21:07 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 21:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 21:04 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 21:01 jhuneidi@deploy1002: Finished scap: Backport for Use metrics from SiteConfig to restore the Parsoid prefix (T339365) (duration: 10m 03s)
  • 21:00 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts flink-zk2003.codfw.wmnet
  • 20:57 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
  • 20:55 jhuneidi@deploy1002: arlolra and jhuneidi: Continuing with sync
  • 20:52 jhuneidi@deploy1002: arlolra and jhuneidi: Backport for Use metrics from SiteConfig to restore the Parsoid prefix (T339365) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:51 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flink-zk2001.codfw.wmnet
  • 20:51 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:51 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 20:51 jhuneidi@deploy1002: Started scap: Backport for Use metrics from SiteConfig to restore the Parsoid prefix (T339365)
  • 20:50 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 20:47 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 20:46 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh5002.wikimedia.org with OS bookworm
  • 20:45 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh6002.wikimedia.org
  • 20:45 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for doh6002.wikimedia.org
  • 20:43 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts flink-zk2001.codfw.wmnet
  • 20:43 jhuneidi@deploy1002: Finished scap: Backport for WatchlistManager: Do not require watchlist rights for clearing talk page notification (T345031) (duration: 07m 01s)
  • 20:37 jhuneidi@deploy1002: jhuneidi and matmarex: Continuing with sync
  • 20:37 jhuneidi@deploy1002: jhuneidi and matmarex: Backport for WatchlistManager: Do not require watchlist rights for clearing talk page notification (T345031) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:36 jhuneidi@deploy1002: Started scap: Backport for WatchlistManager: Do not require watchlist rights for clearing talk page notification (T345031)
  • 20:34 jhuneidi@deploy1002: Finished scap: Backport for Undeploy Research Incentive survey on enwiki (T336092), Pre-deploy Campaigns Event Discovery survey (T345158) (duration: 14m 19s)
  • 20:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh6002.wikimedia.org with OS bookworm
  • 20:29 jhuneidi@deploy1002: jhuneidi and dani: Continuing with sync
  • 20:28 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:28 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:27 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 20:27 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:27 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:26 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:26 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:25 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:21 jhuneidi@deploy1002: jhuneidi and dani: Backport for Undeploy Research Incentive survey on enwiki (T336092), Pre-deploy Campaigns Event Discovery survey (T345158) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2038.codfw.wmnet with OS bullseye
  • 20:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2038.codfw.wmnet with OS bullseye
  • 20:20 jhuneidi@deploy1002: Started scap: Backport for Undeploy Research Incentive survey on enwiki (T336092), Pre-deploy Campaigns Event Discovery survey (T345158)
  • 20:18 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:17 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:16 inflatador: 'bking@wdqs1004 depool wdqs1004 to test script changes T342361'
  • 20:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2039.codfw.wmnet with OS bullseye
  • 20:13 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2039.codfw.wmnet with OS bullseye
  • 20:11 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wdqs1005.eqiad.wmnet
  • 20:11 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:11 ryankemper@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
  • 20:11 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6002.wikimedia.org with reason: host reimage
  • 20:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2039.codfw.wmnet with OS bullseye
  • 20:11 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2039.codfw.wmnet with OS bullseye
  • 20:09 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 20:07 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6002.wikimedia.org with reason: host reimage
  • 20:07 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
  • 20:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2038.codfw.wmnet with OS bullseye
  • 20:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2038.codfw.wmnet with OS bullseye
  • 20:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2037.codfw.wmnet with OS bullseye
  • 20:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2038.codfw.wmnet with OS bullseye
  • 20:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2039.codfw.wmnet with OS bullseye
  • 20:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2038.codfw.wmnet with OS bullseye
  • 19:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2039.codfw.wmnet with OS bullseye
  • 19:51 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
  • 19:48 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 19:45 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh6002.wikimedia.org with OS bookworm
  • 19:44 ryankemper@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
  • 19:33 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
  • 19:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a3-codfw - cmooney@cumin1001"
  • 19:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 19:28 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs1005.eqiad.wmnet
  • 19:14 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:14 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a3-codfw.mgmt.codfw.wmnet
  • 19:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 19:03 ryankemper: T344198 on `ryankemper@cumin1001`: `sudo -E cumin 'A:wdqs-all' 'sudo disable-puppet "revoking old cert and generating new one with new alt_names - T344198"'`
  • 19:03 ryankemper: T344198 Temporarily disabling puppet on all `wdqs*` hosts in preparation for `wdqs.discovery.wmnet` certificate revocation
  • 18:56 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 18:46 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 18:46 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 18:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 18:44 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 18:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 18:44 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 18:12 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.24 refs T343726
  • 17:12 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a2-codfw.mgmt.codfw.wmnet
  • 17:07 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:03 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:02 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:02 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:02 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:01 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:01 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:42 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host restbase1030.eqiad.wmnet
  • 16:41 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:41 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a2-codfw - cmooney@cumin1001"
  • 16:40 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a2-codfw - cmooney@cumin1001"
  • 16:29 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1030.eqiad.wmnet
  • 16:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T343718)', diff saved to https://phabricator.wikimedia.org/P52236 and previous config saved to /var/cache/conftool/dbconfig/20230831-161736-ladsgroup.json
  • 16:04 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 16:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P52235 and previous config saved to /var/cache/conftool/dbconfig/20230831-160230-ladsgroup.json
  • 16:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
  • 15:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on cloudservices1006.eqiad.wmnet with reason: service bootstrap
  • 15:55 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on cloudservices1006.eqiad.wmnet with reason: service bootstrap
  • 15:54 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
  • 15:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T336380)
  • 15:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
  • 15:48 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T336380)
  • 15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P52234 and previous config saved to /var/cache/conftool/dbconfig/20230831-154724-ladsgroup.json
  • 15:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
  • 15:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T336380)
  • 15:44 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T336380)
  • 15:40 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1058.eqiad.wmnet
  • 15:40 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
  • 15:39 moritzm: failover ganeti master in ulsfo to ganeti4005
  • 15:36 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
  • 15:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2055.codfw.wmnet
  • 15:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4007.ulsfo.wmnet
  • 15:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
  • 15:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T343718)', diff saved to https://phabricator.wikimedia.org/P52233 and previous config saved to /var/cache/conftool/dbconfig/20230831-153217-ladsgroup.json
  • 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T343718)', diff saved to https://phabricator.wikimedia.org/P52232 and previous config saved to /var/cache/conftool/dbconfig/20230831-153005-ladsgroup.json
  • 15:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T343718)', diff saved to https://phabricator.wikimedia.org/P52231 and previous config saved to /var/cache/conftool/dbconfig/20230831-152943-ladsgroup.json
  • 15:29 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
  • 15:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2055.codfw.wmnet
  • 15:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
  • 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52230 and previous config saved to /var/cache/conftool/dbconfig/20230831-152710-root.json
  • 15:24 jynus: extend backup1009 lv by additional 10TiB
  • 15:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4007.ulsfo.wmnet
  • 15:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
  • 15:21 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
  • 15:15 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:14 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P52229 and previous config saved to /var/cache/conftool/dbconfig/20230831-151437-ladsgroup.json
  • 15:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1010.eqiad.wmnet with OS bullseye
  • 15:12 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin1001"
  • 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52228 and previous config saved to /var/cache/conftool/dbconfig/20230831-151205-root.json
  • 15:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
  • 15:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
  • 15:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
  • 15:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
  • 15:00 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
  • 14:59 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P52227 and previous config saved to /var/cache/conftool/dbconfig/20230831-145931-ladsgroup.json
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52226 and previous config saved to /var/cache/conftool/dbconfig/20230831-145700-root.json
  • 14:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
  • 14:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
  • 14:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
  • 14:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4005.ulsfo.wmnet
  • 14:46 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T343718)', diff saved to https://phabricator.wikimedia.org/P52225 and previous config saved to /var/cache/conftool/dbconfig/20230831-144425-ladsgroup.json
  • 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52224 and previous config saved to /var/cache/conftool/dbconfig/20230831-144155-root.json
  • 14:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4005.ulsfo.wmnet
  • 14:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testreduce1002.eqiad.wmnet
  • 14:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testreduce1002.eqiad.wmnet with OS bookworm
  • 14:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 14:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 14:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
  • 14:27 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52223 and previous config saved to /var/cache/conftool/dbconfig/20230831-142651-root.json
  • 14:25 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:25 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a2-codfw.mgmt.codfw.wmnet
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T343718)', diff saved to https://phabricator.wikimedia.org/P52222 and previous config saved to /var/cache/conftool/dbconfig/20230831-142445-ladsgroup.json
  • 14:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 14:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T343718)', diff saved to https://phabricator.wikimedia.org/P52221 and previous config saved to /var/cache/conftool/dbconfig/20230831-142424-ladsgroup.json
  • 14:22 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
  • 14:22 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
  • 14:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh2002.wikimedia.org with OS bookworm
  • 14:16 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on testreduce1002.eqiad.wmnet with reason: host reimage
  • 14:16 sgimeno@deploy1002: Finished scap: Backport for GrowthExperiments: enable AddLink backend for swwiki (T308138 T308139) (duration: 07m 34s)
  • 14:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 14:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343718)', diff saved to https://phabricator.wikimedia.org/P52220 and previous config saved to /var/cache/conftool/dbconfig/20230831-141547-ladsgroup.json
  • 14:15 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
  • 14:15 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
  • 14:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
  • 14:13 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
  • 14:11 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testreduce1002.eqiad.wmnet with reason: host reimage
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52219 and previous config saved to /var/cache/conftool/dbconfig/20230831-141146-root.json
  • 14:10 sgimeno@deploy1002: sgimeno: Continuing with sync
  • 14:10 sgimeno@deploy1002: sgimeno: Backport for GrowthExperiments: enable AddLink backend for swwiki (T308138 T308139) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P52217 and previous config saved to /var/cache/conftool/dbconfig/20230831-140917-ladsgroup.json
  • 14:09 sgimeno@deploy1002: Started scap: Backport for GrowthExperiments: enable AddLink backend for swwiki (T308138 T308139)
  • 14:07 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
  • 14:07 sgimeno@deploy1002: Finished scap: Backport for Allow loading Edit-in-Sequence as a beta feature on Wikisources (T308098) (duration: 07m 36s)
  • 14:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
  • 14:06 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
  • 14:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
  • 14:05 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin1001"
  • 14:01 sgimeno@deploy1002: sgimeno and soda: Continuing with sync
  • 14:01 sgimeno@deploy1002: sgimeno and soda: Backport for Allow loading Edit-in-Sequence as a beta feature on Wikisources (T308098) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:01 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
  • 14:01 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P52216 and previous config saved to /var/cache/conftool/dbconfig/20230831-140041-ladsgroup.json
  • 14:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
  • 14:00 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host testreduce1002.eqiad.wmnet with OS bookworm
  • 13:59 sgimeno@deploy1002: Started scap: Backport for Allow loading Edit-in-Sequence as a beta feature on Wikisources (T308098)
  • 13:59 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:58 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
  • 13:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1050.eqiad.wmnet
  • 13:58 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testreduce1002.eqiad.wmnet - jmm@cumin2002"
  • 13:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testreduce1002.eqiad.wmnet - jmm@cumin2002"
  • 13:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testreduce1002.eqiad.wmnet on all recursors
  • 13:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache testreduce1002.eqiad.wmnet on all recursors
  • 13:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testreduce1002.eqiad.wmnet - jmm@cumin2002"
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 3%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52215 and previous config saved to /var/cache/conftool/dbconfig/20230831-135641-root.json
  • 13:56 sgimeno@deploy1002: Finished scap: Backport for Allow loading Edit-in-Sequence as a beta feature on Wikisources (T308098) (duration: 09m 33s)
  • 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testreduce1002.eqiad.wmnet - jmm@cumin2002"
  • 13:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
  • 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P52214 and previous config saved to /var/cache/conftool/dbconfig/20230831-135411-ladsgroup.json
  • 13:54 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
  • 13:53 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:53 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testreduce1002.eqiad.wmnet
  • 13:52 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2002.wikimedia.org with reason: host reimage
  • 13:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 173
  • 13:49 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 173
  • 13:49 sgimeno@deploy1002: soda and sgimeno: Continuing with sync
  • 13:48 sgimeno@deploy1002: soda and sgimeno: Backport for Allow loading Edit-in-Sequence as a beta feature on Wikisources (T308098) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 13:48 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 13:47 sgimeno@deploy1002: Started scap: Backport for Allow loading Edit-in-Sequence as a beta feature on Wikisources (T308098)
  • 13:47 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2002.wikimedia.org with reason: host reimage
  • 13:46 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1050.eqiad.wmnet
  • 13:46 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
  • 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P52213 and previous config saved to /var/cache/conftool/dbconfig/20230831-134535-ladsgroup.json
  • 13:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1049.eqiad.wmnet
  • 13:44 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2048.codfw.wmnet
  • 13:42 sgimeno@deploy1002: Finished scap: Backport for Allow loading Edit-in-Sequence as a beta feature on Wikisources (T308098) (duration: 15m 00s)
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52212 and previous config saved to /var/cache/conftool/dbconfig/20230831-134136-root.json
  • 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T343718)', diff saved to https://phabricator.wikimedia.org/P52211 and previous config saved to /var/cache/conftool/dbconfig/20230831-133905-ladsgroup.json
  • 13:38 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1049.eqiad.wmnet
  • 13:38 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
  • 13:36 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1048.eqiad.wmnet
  • 13:36 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wdqs1010.eqiad.wmnet with reason: host reimage
  • 13:36 jbond: swap puppetdb-api and puppetdb-api-next gerrit:940384
  • 13:35 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device ssw1-f1-eqiad
  • 13:35 sgimeno@deploy1002: sgimeno and soda: Continuing with sync
  • 13:35 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 13:35 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad
  • 13:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 13:33 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad
  • 13:32 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:32 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host doh2002.wikimedia.org with OS bookworm
  • 13:31 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1010.eqiad.wmnet with reason: host reimage
  • 13:31 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad
  • 13:30 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343718)', diff saved to https://phabricator.wikimedia.org/P52210 and previous config saved to /var/cache/conftool/dbconfig/20230831-133029-ladsgroup.json
  • 13:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
  • 13:29 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:29 sgimeno@deploy1002: sgimeno and soda: Backport for Allow loading Edit-in-Sequence as a beta feature on Wikisources (T308098) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T343718)', diff saved to https://phabricator.wikimedia.org/P52209 and previous config saved to /var/cache/conftool/dbconfig/20230831-132820-ladsgroup.json
  • 13:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 13:28 marostegui@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1132.eqiad.wmnet onto db1119.eqiad.wmnet
  • 13:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343718)', diff saved to https://phabricator.wikimedia.org/P52208 and previous config saved to /var/cache/conftool/dbconfig/20230831-132759-ladsgroup.json
  • 13:27 sgimeno@deploy1002: Started scap: Backport for Allow loading Edit-in-Sequence as a beta feature on Wikisources (T308098)
  • 13:25 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a1-codfw.mgmt.codfw.wmnet
  • 13:25 sgimeno@deploy1002: Finished scap: Backport for Remove rc1.mediawiki.page_content_change stream (T307959) (duration: 10m 33s)
  • 13:24 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1048.eqiad.wmnet
  • 13:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
  • 13:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
  • 13:23 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1047.eqiad.wmnet
  • 13:20 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1010.eqiad.wmnet with OS bullseye
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T343718)', diff saved to https://phabricator.wikimedia.org/P52207 and previous config saved to /var/cache/conftool/dbconfig/20230831-132009-ladsgroup.json
  • 13:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T343718)', diff saved to https://phabricator.wikimedia.org/P52206 and previous config saved to /var/cache/conftool/dbconfig/20230831-131947-ladsgroup.json
  • 13:17 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1047.eqiad.wmnet
  • 13:17 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
  • 13:17 sgimeno@deploy1002: gmodena and sgimeno: Continuing with sync
  • 13:16 sgimeno@deploy1002: gmodena and sgimeno: Backport for Remove rc1.mediawiki.page_content_change stream (T307959) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:14 sgimeno@deploy1002: Started scap: Backport for Remove rc1.mediawiki.page_content_change stream (T307959)
  • 13:14 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:13 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P52205 and previous config saved to /var/cache/conftool/dbconfig/20230831-131252-ladsgroup.json
  • 13:12 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:09 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:08 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
  • 13:06 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:05 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P52204 and previous config saved to /var/cache/conftool/dbconfig/20230831-130441-ladsgroup.json
  • 13:03 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2045.codfw.wmnet
  • 13:02 aqu: Deployed refinery using scap, then deployed onto hdfs
  • 13:00 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:59 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P52203 and previous config saved to /var/cache/conftool/dbconfig/20230831-125746-ladsgroup.json
  • 12:57 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:55 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 12:55 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
  • 12:54 lucaswerkmeister-wmde: Deployed security patch for T345064
  • 12:54 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:54 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a1-codfw - cmooney@cumin1001"
  • 12:54 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
  • 12:53 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a1-codfw - cmooney@cumin1001"
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P52202 and previous config saved to /var/cache/conftool/dbconfig/20230831-124934-ladsgroup.json
  • 12:49 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:49 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a1-codfw.mgmt.codfw.wmnet
  • 12:49 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw
  • 12:49 cmooney@cumin1001: START - Cookbook sre.network.tls for network device ssw1-a1-codfw
  • 12:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
  • 12:47 lucaswerkmeister-wmde: Deployed security patch for T345064
  • 12:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
  • 12:47 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
  • 12:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1044.eqiad.wmnet
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343718)', diff saved to https://phabricator.wikimedia.org/P52201 and previous config saved to /var/cache/conftool/dbconfig/20230831-124240-ladsgroup.json
  • 12:42 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 12:39 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
  • 12:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
  • 12:36 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw
  • 12:35 cmooney@cumin1001: START - Cookbook sre.network.tls for network device ssw1-a8-codfw
  • 12:35 aqu@deploy1002: Finished deploy [analytics/refinery@06203c0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@06203c0] (duration: 03m 07s)
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T343718)', diff saved to https://phabricator.wikimedia.org/P52200 and previous config saved to /var/cache/conftool/dbconfig/20230831-123428-ladsgroup.json
  • 12:32 aqu@deploy1002: Started deploy [analytics/refinery@06203c0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@06203c0]
  • 12:32 aqu@deploy1002: Finished deploy [analytics/refinery@06203c0] (thin): Regular analytics weekly train THIN [analytics/refinery@06203c0] (duration: 00m 04s)
  • 12:32 aqu@deploy1002: Started deploy [analytics/refinery@06203c0] (thin): Regular analytics weekly train THIN [analytics/refinery@06203c0]
  • 12:31 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: enable lift wing for fiwiki and itwiki (T343308) (duration: 27m 05s)
  • 12:28 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:27 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 12:27 jayme: restarting pybal on lvs1019 - T325178
  • 12:27 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 12:26 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a8-codfw.mgmt.codfw.wmnet
  • 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T343718)', diff saved to https://phabricator.wikimedia.org/P52199 and previous config saved to /var/cache/conftool/dbconfig/20230831-122502-ladsgroup.json
  • 12:25 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:25 jayme: restarting pybal on lvs1020 - T325178
  • 12:25 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 12:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 12:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343718)', diff saved to https://phabricator.wikimedia.org/P52198 and previous config saved to /var/cache/conftool/dbconfig/20230831-122441-ladsgroup.json
  • 12:23 ladsgroup@deploy1002: isaranto and ladsgroup: Continuing with sync
  • 12:23 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 12:23 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:21 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:21 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 12:20 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:20 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 12:20 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:19 cmooney@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 12:18 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device lsw1-b6-codfw.mgmt.codfw.wmnet
  • 12:18 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-b3-codfw.mgmt.codfw.wmnet
  • 12:17 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T343718)', diff saved to https://phabricator.wikimedia.org/P52197 and previous config saved to /var/cache/conftool/dbconfig/20230831-121721-ladsgroup.json
  • 12:17 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b3-codfw.mgmt.codfw.wmnet
  • 12:17 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:17 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:17 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b4-codfw.mgmt.codfw.wmnet
  • 12:17 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:17 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b6-codfw.mgmt.codfw.wmnet
  • 12:17 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b5-codfw.mgmt.codfw.wmnet
  • 12:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:17 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-a6-codfw.mgmt.codfw.wmnet
  • 12:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 12:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 12:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T343718)', diff saved to https://phabricator.wikimedia.org/P52196 and previous config saved to /var/cache/conftool/dbconfig/20230831-121654-ladsgroup.json
  • 12:16 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a6-codfw.mgmt.codfw.wmnet
  • 12:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:16 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a5-codfw.mgmt.codfw.wmnet
  • 12:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:16 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a4-codfw.mgmt.codfw.wmnet
  • 12:16 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-a1-codfw.mgmt.codfw.wmnet
  • 12:16 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a1-codfw.mgmt.codfw.wmnet
  • 12:15 aqu@deploy1002: Finished deploy [analytics/refinery@06203c0]: Regular analytics weekly train [analytics/refinery@06203c0] (duration: 12m 15s)
  • 12:15 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:15 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 12:11 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-b2-codfw.mgmt.codfw.wmnet
  • 12:11 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:11 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b2-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a8-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-a5-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-a6-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-a7-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a6-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a7-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-a4-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-a3-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a5-codfw.mgmt.codfw.wmnet
  • 12:10 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-a2-codfw.mgmt.codfw.wmnet
  • 12:09 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a4-codfw.mgmt.codfw.wmnet
  • 12:09 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a3-codfw.mgmt.codfw.wmnet
  • 12:09 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a2-codfw.mgmt.codfw.wmnet
  • 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P52195 and previous config saved to /var/cache/conftool/dbconfig/20230831-120935-ladsgroup.json
  • 12:09 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-a1-codfw.mgmt.codfw.wmnet
  • 12:09 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a1-codfw.mgmt.codfw.wmnet
  • 12:05 ladsgroup@deploy1002: isaranto and ladsgroup: Backport for ores-extension: enable lift wing for fiwiki and itwiki (T343308) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 12:03 ladsgroup@deploy1002: Started scap: Backport for ores-extension: enable lift wing for fiwiki and itwiki (T343308)
  • 12:03 aqu@deploy1002: Started deploy [analytics/refinery@06203c0]: Regular analytics weekly train [analytics/refinery@06203c0]
  • 12:02 aqu: About to deploy analytics refinery (weekly train)
  • 12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P52194 and previous config saved to /var/cache/conftool/dbconfig/20230831-120148-ladsgroup.json
  • 11:59 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kubemaster1002.eqiad.wmnet
  • 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P52193 and previous config saved to /var/cache/conftool/dbconfig/20230831-115429-ladsgroup.json
  • 11:48 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:47 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1002.eqiad.wmnet
  • 11:46 marostegui@cumin1001: START - Cookbook sre.mysql.clone of db1132.eqiad.wmnet onto db1119.eqiad.wmnet
  • 11:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P52192 and previous config saved to /var/cache/conftool/dbconfig/20230831-114642-ladsgroup.json
  • 11:46 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 11:46 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet
  • 11:44 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kubemaster1001.eqiad.wmnet
  • 11:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343718)', diff saved to https://phabricator.wikimedia.org/P52191 and previous config saved to /var/cache/conftool/dbconfig/20230831-113922-ladsgroup.json
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T343718)', diff saved to https://phabricator.wikimedia.org/P52190 and previous config saved to /var/cache/conftool/dbconfig/20230831-113613-ladsgroup.json
  • 11:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 11:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T343718)', diff saved to https://phabricator.wikimedia.org/P52189 and previous config saved to /var/cache/conftool/dbconfig/20230831-113603-ladsgroup.json
  • 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P52187 and previous config saved to /var/cache/conftool/dbconfig/20230831-113324-root.json
  • 11:32 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1001.eqiad.wmnet
  • 11:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T343718)', diff saved to https://phabricator.wikimedia.org/P52186 and previous config saved to /var/cache/conftool/dbconfig/20230831-113136-ladsgroup.json
  • 11:31 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P52185 and previous config saved to /var/cache/conftool/dbconfig/20230831-112057-ladsgroup.json
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T343718)', diff saved to https://phabricator.wikimedia.org/P52184 and previous config saved to /var/cache/conftool/dbconfig/20230831-111353-ladsgroup.json
  • 11:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 11:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T343718)', diff saved to https://phabricator.wikimedia.org/P52183 and previous config saved to /var/cache/conftool/dbconfig/20230831-111332-ladsgroup.json
  • 11:08 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 11:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes1025.eqiad.wmnet
  • 11:06 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes1025.eqiad.wmnet
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P52182 and previous config saved to /var/cache/conftool/dbconfig/20230831-110551-ladsgroup.json
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P52181 and previous config saved to /var/cache/conftool/dbconfig/20230831-105826-ladsgroup.json
  • 10:54 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1006.eqiad.wmnet
  • 10:54 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 10:54 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 10:53 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:53 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 10:52 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 10:51 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 10:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 10:50 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 10:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T343718)', diff saved to https://phabricator.wikimedia.org/P52180 and previous config saved to /var/cache/conftool/dbconfig/20230831-105044-ladsgroup.json
  • 10:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 10:50 moritzm: installing flask security updates on buster
  • 10:49 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 10:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T343718)', diff saved to https://phabricator.wikimedia.org/P52179 and previous config saved to /var/cache/conftool/dbconfig/20230831-104836-ladsgroup.json
  • 10:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 10:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T343718)', diff saved to https://phabricator.wikimedia.org/P52178 and previous config saved to /var/cache/conftool/dbconfig/20230831-104815-ladsgroup.json
  • 10:48 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 10:47 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@90f280e]: (no justification provided) (duration: 00m 09s)
  • 10:47 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@90f280e]: (no justification provided)
  • 10:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 10:46 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1006.eqiad.wmnet
  • 10:46 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 10:43 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P52177 and previous config saved to /var/cache/conftool/dbconfig/20230831-104319-ladsgroup.json
  • 10:43 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 10:42 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 10:41 moritzm: installing cjose security updates
  • 10:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1017.eqiad.wmnet with reason: Maintenance
  • 10:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1017.eqiad.wmnet with reason: Maintenance
  • 10:38 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 10:37 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 10:34 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P52176 and previous config saved to /var/cache/conftool/dbconfig/20230831-103308-ladsgroup.json
  • 10:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host htmldumper1001.eqiad.wmnet
  • 10:25 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-eqiad cluster: Reboot kafka nodes
  • 10:24 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 10:23 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 10:23 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:23 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 10:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 10:22 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 10:21 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host htmldumper1001.eqiad.wmnet
  • 10:20 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 10:20 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P52174 and previous config saved to /var/cache/conftool/dbconfig/20230831-101802-ladsgroup.json
  • 10:17 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 10:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 10:16 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr1-drmrs
  • 10:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 10:15 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1005.eqiad.wmnet
  • 10:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 10:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 10:08 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1005.eqiad.wmnet
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T343718)', diff saved to https://phabricator.wikimedia.org/P52173 and previous config saved to /var/cache/conftool/dbconfig/20230831-100811-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 10:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 10:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T343718)', diff saved to https://phabricator.wikimedia.org/P52172 and previous config saved to /var/cache/conftool/dbconfig/20230831-100750-ladsgroup.json
  • 10:07 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1004.eqiad.wmnet
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T343718)', diff saved to https://phabricator.wikimedia.org/P52171 and previous config saved to /var/cache/conftool/dbconfig/20230831-100256-ladsgroup.json
  • 10:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr1-codfw
  • 10:00 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1004.eqiad.wmnet
  • 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T343718)', diff saved to https://phabricator.wikimedia.org/P52170 and previous config saved to /var/cache/conftool/dbconfig/20230831-100047-ladsgroup.json
  • 10:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T343718)', diff saved to https://phabricator.wikimedia.org/P52169 and previous config saved to /var/cache/conftool/dbconfig/20230831-100026-ladsgroup.json
  • 09:59 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1002.eqiad.wmnet
  • 09:57 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr1-codfw
  • 09:52 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: sync
  • 09:52 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: sync
  • 09:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P52168 and previous config saved to /var/cache/conftool/dbconfig/20230831-095244-ladsgroup.json
  • 09:51 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 09:51 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 09:51 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet
  • 09:51 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1001.eqiad.wmnet
  • 09:50 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: sync
  • 09:50 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: sync
  • 09:50 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 09:50 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 09:49 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 09:49 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P52167 and previous config saved to /var/cache/conftool/dbconfig/20230831-094520-ladsgroup.json
  • 09:45 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host snapshot1017.eqiad.wmnet
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P52166 and previous config saved to /var/cache/conftool/dbconfig/20230831-093738-ladsgroup.json
  • 09:37 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: fix arwiki likelybad threshold (T345305) (duration: 68m 57s)
  • 09:35 moritzm: imported cas 6.6.11+wmf11u1 to apt.wikimedia.org
  • 09:33 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1017.eqiad.wmnet
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P52165 and previous config saved to /var/cache/conftool/dbconfig/20230831-093013-ladsgroup.json
  • 09:24 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host snapshot1016.eqiad.wmnet
  • 09:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T343718)', diff saved to https://phabricator.wikimedia.org/P52164 and previous config saved to /var/cache/conftool/dbconfig/20230831-092231-ladsgroup.json
  • 09:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr3-eqsin
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T343718)', diff saved to https://phabricator.wikimedia.org/P52163 and previous config saved to /var/cache/conftool/dbconfig/20230831-091507-ladsgroup.json
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T343718)', diff saved to https://phabricator.wikimedia.org/P52162 and previous config saved to /var/cache/conftool/dbconfig/20230831-091258-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 09:12 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1016.eqiad.wmnet
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T343718)', diff saved to https://phabricator.wikimedia.org/P52161 and previous config saved to /var/cache/conftool/dbconfig/20230831-091237-ladsgroup.json
  • 09:11 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr3-eqsin
  • 09:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-esams
  • 09:06 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr2-esams
  • 09:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin
  • 09:03 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host snapshot1015.eqiad.wmnet
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T343718)', diff saved to https://phabricator.wikimedia.org/P52160 and previous config saved to /var/cache/conftool/dbconfig/20230831-090244-ladsgroup.json
  • 09:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T343718)', diff saved to https://phabricator.wikimedia.org/P52159 and previous config saved to /var/cache/conftool/dbconfig/20230831-090223-ladsgroup.json
  • 09:01 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr2-eqsin
  • 09:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqord
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P52158 and previous config saved to /var/cache/conftool/dbconfig/20230831-085731-ladsgroup.json
  • 08:56 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr2-eqord
  • 08:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqiad
  • 08:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:56 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:52 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1015.eqiad.wmnet
  • 08:52 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr2-eqiad
  • 08:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqdfw
  • 08:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:50 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host snapshot1014.eqiad.wmnet
  • 08:47 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr2-eqdfw
  • 08:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-drmrs
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P52157 and previous config saved to /var/cache/conftool/dbconfig/20230831-084717-ladsgroup.json
  • 08:43 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr2-drmrs
  • 08:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-codfw
  • 08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P52156 and previous config saved to /var/cache/conftool/dbconfig/20230831-084224-ladsgroup.json
  • 08:41 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:41 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 08:40 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 08:39 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1014.eqiad.wmnet
  • 08:38 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 08:38 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 08:38 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr2-codfw
  • 08:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr1-esams
  • 08:36 ladsgroup@deploy1002: ladsgroup and isaranto: Continuing with sync
  • 08:36 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1013.eqiad.wmnet
  • 08:36 ladsgroup@deploy1002: ladsgroup and isaranto: Backport for ores-extension: fix arwiki likelybad threshold (T345305) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:33 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr1-esams
  • 08:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr1-eqiad
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P52155 and previous config saved to /var/cache/conftool/dbconfig/20230831-083211-ladsgroup.json
  • 08:30 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1013.eqiad.wmnet
  • 08:30 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1012.eqiad.wmnet
  • 08:28 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr1-eqiad
  • 08:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr4-ulsfo
  • 08:28 ladsgroup@deploy1002: Started scap: Backport for ores-extension: fix arwiki likelybad threshold (T345305)
  • 08:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T343718)', diff saved to https://phabricator.wikimedia.org/P52154 and previous config saved to /var/cache/conftool/dbconfig/20230831-082717-ladsgroup.json
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T343718)', diff saved to https://phabricator.wikimedia.org/P52153 and previous config saved to /var/cache/conftool/dbconfig/20230831-082508-ladsgroup.json
  • 08:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T343718)', diff saved to https://phabricator.wikimedia.org/P52152 and previous config saved to /var/cache/conftool/dbconfig/20230831-082440-ladsgroup.json
  • 08:24 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr4-ulsfo
  • 08:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr3-ulsfo
  • 08:23 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1012.eqiad.wmnet
  • 08:23 elukey@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-eqiad cluster: Reboot kafka nodes
  • 08:21 vgutierrez: set send_timeout to 3620s in the upload cluster via cumin to avoid a varnish restart https://gerrit.wikimedia.org/r/c/operations/puppet/+/953678 - T341755
  • 08:19 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1011.eqiad.wmnet
  • 08:19 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr3-ulsfo
  • 08:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T343718)', diff saved to https://phabricator.wikimedia.org/P52151 and previous config saved to /var/cache/conftool/dbconfig/20230831-081705-ladsgroup.json
  • 08:15 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1011.eqiad.wmnet
  • 08:15 ladsgroup@deploy1002: Finished scap: Backport for Disable user creation on wikitech (T345226) (duration: 10m 06s)
  • 08:14 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1010.eqiad.wmnet
  • 08:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P52150 and previous config saved to /var/cache/conftool/dbconfig/20230831-080934-ladsgroup.json
  • 08:07 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1010.eqiad.wmnet
  • 08:06 ladsgroup@deploy1002: ladsgroup and andrew: Continuing with sync
  • 08:06 ladsgroup@deploy1002: ladsgroup and andrew: Backport for Disable user creation on wikitech (T345226) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:04 ladsgroup@deploy1002: Started scap: Backport for Disable user creation on wikitech (T345226)
  • 08:03 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:03 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host snapshot1009.eqiad.wmnet
  • 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T343718)', diff saved to https://phabricator.wikimedia.org/P52149 and previous config saved to /var/cache/conftool/dbconfig/20230831-075709-ladsgroup.json
  • 07:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 07:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P52148 and previous config saved to /var/cache/conftool/dbconfig/20230831-075428-ladsgroup.json
  • 07:52 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1009.eqiad.wmnet
  • 07:51 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 07:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T343718)', diff saved to https://phabricator.wikimedia.org/P52147 and previous config saved to /var/cache/conftool/dbconfig/20230831-073921-ladsgroup.json
  • 07:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 07:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 07:37 apergos: UTC morning backport and config window done
  • 07:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T343718)', diff saved to https://phabricator.wikimedia.org/P52146 and previous config saved to /var/cache/conftool/dbconfig/20230831-073713-ladsgroup.json
  • 07:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 07:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52145 and previous config saved to /var/cache/conftool/dbconfig/20230831-073115-root.json
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52144 and previous config saved to /var/cache/conftool/dbconfig/20230831-072848-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52143 and previous config saved to /var/cache/conftool/dbconfig/20230831-071610-root.json
  • 07:15 kartik@deploy1002: Finished scap: Backport for Enable MinT translation service for testwiki (duration: 10m 18s)
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52142 and previous config saved to /var/cache/conftool/dbconfig/20230831-071343-root.json
  • 07:09 kartik@deploy1002: abi and kartik: Continuing with sync
  • 07:07 kartik@deploy1002: abi and kartik: Backport for Enable MinT translation service for testwiki synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:05 kartik@deploy1002: Started scap: Backport for Enable MinT translation service for testwiki
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52141 and previous config saved to /var/cache/conftool/dbconfig/20230831-070105-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52140 and previous config saved to /var/cache/conftool/dbconfig/20230831-065838-root.json
  • 06:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp1002.wikimedia.org
  • 06:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp1002.wikimedia.org
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52139 and previous config saved to /var/cache/conftool/dbconfig/20230831-064601-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52138 and previous config saved to /var/cache/conftool/dbconfig/20230831-064333-root.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52137 and previous config saved to /var/cache/conftool/dbconfig/20230831-063056-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52136 and previous config saved to /var/cache/conftool/dbconfig/20230831-062829-root.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52135 and previous config saved to /var/cache/conftool/dbconfig/20230831-061551-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52134 and previous config saved to /var/cache/conftool/dbconfig/20230831-061324-root.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52133 and previous config saved to /var/cache/conftool/dbconfig/20230831-060047-root.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52132 and previous config saved to /var/cache/conftool/dbconfig/20230831-055819-root.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: Repooling after maintenance ', diff saved to https://phabricator.wikimedia.org/P52131 and previous config saved to /var/cache/conftool/dbconfig/20230831-054805-root.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52130 and previous config saved to /var/cache/conftool/dbconfig/20230831-054542-root.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52129 and previous config saved to /var/cache/conftool/dbconfig/20230831-054314-root.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182 T344309', diff saved to https://phabricator.wikimedia.org/P52128 and previous config saved to /var/cache/conftool/dbconfig/20230831-054305-root.json
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: Repooling after maintenance ', diff saved to https://phabricator.wikimedia.org/P52127 and previous config saved to /var/cache/conftool/dbconfig/20230831-053300-root.json
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 T345223', diff saved to https://phabricator.wikimedia.org/P52126 and previous config saved to /var/cache/conftool/dbconfig/20230831-053035-root.json
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1173 to s6 primary and set section read-write T345223', diff saved to https://phabricator.wikimedia.org/P52125 and previous config saved to /var/cache/conftool/dbconfig/20230831-052852-marostegui.json
  • 05:28 marostegui: Starting s6 eqiad failover from db1131 to db1173 - T345223
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 50%: Repooling after maintenance ', diff saved to https://phabricator.wikimedia.org/P52123 and previous config saved to /var/cache/conftool/dbconfig/20230831-051755-root.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 25%: Repooling after maintenance ', diff saved to https://phabricator.wikimedia.org/P52122 and previous config saved to /var/cache/conftool/dbconfig/20230831-050250-root.json
  • 04:57 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1173 with weight 0 T345223', diff saved to https://phabricator.wikimedia.org/P52121 and previous config saved to /var/cache/conftool/dbconfig/20230831-045719-marostegui.json
  • 04:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s6 T345223
  • 04:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s6 T345223
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: Repooling after maintenance ', diff saved to https://phabricator.wikimedia.org/P52120 and previous config saved to /var/cache/conftool/dbconfig/20230831-044746-root.json
  • 02:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2037.codfw.wmnet with reason: host reimage
  • 02:26 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2037.codfw.wmnet with reason: host reimage
  • 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2037.codfw.wmnet with OS bullseye
  • 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2038.codfw.wmnet with OS bullseye
  • 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2039.codfw.wmnet with OS bullseye
  • 01:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bullseye
  • 01:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 01:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bullseye
  • 01:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 01:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['moss-be2003']
  • 01:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be2003']
  • 00:54 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 00:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['moss-be2003']
  • 00:43 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be2003']

2023-08-30

  • 22:28 krinkle@deploy1002: Synchronized php-1.41.0-wmf.24/extensions/WikimediaEvents/: 697ab03 (duration: 06m 26s)
  • 22:09 krinkle@deploy1002: Finished scap: Backport for mediawiki.util: Investigate when mw.util is compromised by third-party script (T343944) (duration: 35m 08s)
  • 21:57 krinkle@deploy1002: krinkle: Continuing with sync
  • 21:55 krinkle@deploy1002: krinkle: Backport for mediawiki.util: Investigate when mw.util is compromised by third-party script (T343944) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 21:34 krinkle@deploy1002: Started scap: Backport for mediawiki.util: Investigate when mw.util is compromised by third-party script (T343944)
  • 21:31 Krinkle: krinkle@deploy1002: running `sudo /usr/local/sbin/fix-staging-perms` to fix permissions under /srv/patches/1.41.0-wmf.24 where 2 of the 3 patch files are read-only by jnuche:deployment
  • 20:44 hmonroy@deploy1002: Finished scap: Backport for Add comment about mirroring of wgMobileUrlTemplate (T344185) (duration: 07m 11s)
  • 20:37 hmonroy@deploy1002: Started scap: Backport for Add comment about mirroring of wgMobileUrlTemplate (T344185)
  • 20:33 hmonroy@deploy1002: Finished scap: Backport for Omit 'target' in the body of review REST API requests (duration: 08m 18s)
  • 20:27 hmonroy@deploy1002: matmarex and hmonroy: Continuing with sync
  • 20:26 hmonroy@deploy1002: matmarex and hmonroy: Backport for Omit 'target' in the body of review REST API requests synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:24 hmonroy@deploy1002: Started scap: Backport for Omit 'target' in the body of review REST API requests
  • 20:14 hmonroy@deploy1002: Finished scap: Backport for wikidiff2: set maxSplitSize = 10 by default (T341754) (duration: 09m 13s)
  • 20:08 hmonroy@deploy1002: hmonroy: Continuing with sync
  • 20:06 hmonroy@deploy1002: hmonroy: Backport for wikidiff2: set maxSplitSize = 10 by default (T341754) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:05 hmonroy@deploy1002: Started scap: Backport for wikidiff2: set maxSplitSize = 10 by default (T341754)
  • 19:19 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply security updates - bking@cumin1001 - T344587
  • 19:17 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts wdqs1005.eqiad.wmnet
  • 18:41 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 18:41 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:41 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 18:39 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 18:37 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:28 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs1005.eqiad.wmnet
  • 18:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh2001.wikimedia.org with OS bookworm
  • 18:15 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.24 refs T343726 (duration: 06m 13s)
  • 18:08 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.24 refs T343726
  • 18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2001.wikimedia.org with reason: host reimage
  • 18:03 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply security updates - bking@cumin1001 - T344587
  • 18:03 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_eqiad: apply security updates - bking@cumin1001 - T344587
  • 18:03 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_eqiad: apply security updates - bking@cumin1001 - T344587
  • 18:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2001.wikimedia.org with reason: host reimage
  • 17:46 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host doh2001.wikimedia.org with OS bookworm
  • 16:36 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: replace thresholds with numeric values (T343308) (duration: 10m 09s)
  • 16:30 ladsgroup@deploy1002: ladsgroup and isaranto: Continuing with sync
  • 16:28 ladsgroup@deploy1002: ladsgroup and isaranto: Backport for ores-extension: replace thresholds with numeric values (T343308) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 16:26 ladsgroup@deploy1002: Started scap: Backport for ores-extension: replace thresholds with numeric values (T343308)
  • 16:19 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply security updates - bking@cumin1001 - T344587
  • 16:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:56 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:56 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:56 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:56 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2001.codfw.wmnet
  • 15:50 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2001.codfw.wmnet
  • 15:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh4001.wikimedia.org with OS bookworm
  • 15:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 15:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T343718)', diff saved to https://phabricator.wikimedia.org/P52113 and previous config saved to /var/cache/conftool/dbconfig/20230830-154437-ladsgroup.json
  • 15:44 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kubemaster2002.codfw.wmnet
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1201', diff saved to https://phabricator.wikimedia.org/P52112 and previous config saved to /var/cache/conftool/dbconfig/20230830-153915-root.json
  • 15:32 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2002.codfw.wmnet
  • 15:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4001.wikimedia.org with reason: host reimage
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P52111 and previous config saved to /var/cache/conftool/dbconfig/20230830-152931-ladsgroup.json
  • 15:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4001.wikimedia.org with reason: host reimage
  • 15:24 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-codfw cluster: Reboot kafka nodes
  • 15:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b13-drmrs
  • 15:24 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw1-b13-drmrs
  • 15:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b12-drmrs
  • 15:23 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw1-b12-drmrs
  • 15:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-bw27-esams
  • 15:23 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw1-bw27-esams
  • 15:19 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:17 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:17 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P52110 and previous config saved to /var/cache/conftool/dbconfig/20230830-151424-ladsgroup.json
  • 15:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-by27-esams
  • 15:11 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw1-by27-esams
  • 15:08 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host doh4001.wikimedia.org with OS bookworm
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P52109 and previous config saved to /var/cache/conftool/dbconfig/20230830-150709-ladsgroup.json
  • 15:00 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T343718)', diff saved to https://phabricator.wikimedia.org/P52108 and previous config saved to /var/cache/conftool/dbconfig/20230830-145918-ladsgroup.json
  • 14:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "POP switches - ayounsi@cumin1001"
  • 14:55 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "POP switches - ayounsi@cumin1001"
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T343718)', diff saved to https://phabricator.wikimedia.org/P52107 and previous config saved to /var/cache/conftool/dbconfig/20230830-145457-ladsgroup.json
  • 14:52 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P52106 and previous config saved to /var/cache/conftool/dbconfig/20230830-145205-ladsgroup.json
  • 14:51 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:51 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 14:49 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply security updates - bking@cumin1001 - T344587
  • 14:49 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_codfw: apply security updates - bking@cumin1001 - T344587
  • 14:48 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_codfw: apply security updates - bking@cumin1001 - T344587
  • 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1024.eqiad.wmnet
  • 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
  • 14:41 fab@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 14:40 jynus: disable bacula backup1002, backup2002 jobs
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P52105 and previous config saved to /var/cache/conftool/dbconfig/20230830-143950-ladsgroup.json
  • 14:39 fab@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 14:39 fab@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 14:37 Amir1: dbmaint on s4@codfw (T207253)
  • 14:37 fab@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P52104 and previous config saved to /var/cache/conftool/dbconfig/20230830-143700-ladsgroup.json
  • 14:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
  • 14:34 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 14:33 fab@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 14:30 fab@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T343718)', diff saved to https://phabricator.wikimedia.org/P52103 and previous config saved to /var/cache/conftool/dbconfig/20230830-142737-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T343718)', diff saved to https://phabricator.wikimedia.org/P52102 and previous config saved to /var/cache/conftool/dbconfig/20230830-142716-ladsgroup.json
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P52101 and previous config saved to /var/cache/conftool/dbconfig/20230830-142444-ladsgroup.json
  • 14:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh6001.wikimedia.org with OS bookworm
  • 14:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P52100 and previous config saved to /var/cache/conftool/dbconfig/20230830-142155-ladsgroup.json
  • 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P52099 and previous config saved to /var/cache/conftool/dbconfig/20230830-141210-ladsgroup.json
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T343718)', diff saved to https://phabricator.wikimedia.org/P52098 and previous config saved to /var/cache/conftool/dbconfig/20230830-140938-ladsgroup.json
  • 14:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6001.wikimedia.org with reason: host reimage
  • 13:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6001.wikimedia.org with reason: host reimage
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P52097 and previous config saved to /var/cache/conftool/dbconfig/20230830-135704-ladsgroup.json
  • 13:49 topranks: disabling DHCP snooping on mr1-codfw to test ztp operation
  • 13:47 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply security updates - bking@cumin1001 - T344587
  • 13:46 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 13:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2039']
  • 13:45 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2039']
  • 13:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2039']
  • 13:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2038']
  • 13:44 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T343718)', diff saved to https://phabricator.wikimedia.org/P52096 and previous config saved to /var/cache/conftool/dbconfig/20230830-134232-ladsgroup.json
  • 13:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 13:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52095 and previous config saved to /var/cache/conftool/dbconfig/20230830-134209-ladsgroup.json
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T343718)', diff saved to https://phabricator.wikimedia.org/P52094 and previous config saved to /var/cache/conftool/dbconfig/20230830-134157-ladsgroup.json
  • 13:41 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 13:41 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 13:40 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 13:39 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T343718)', diff saved to https://phabricator.wikimedia.org/P52093 and previous config saved to /var/cache/conftool/dbconfig/20230830-133745-ladsgroup.json
  • 13:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 13:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2027']
  • 13:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T343718)', diff saved to https://phabricator.wikimedia.org/P52092 and previous config saved to /var/cache/conftool/dbconfig/20230830-133724-ladsgroup.json
  • 13:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2027']
  • 13:36 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2026']
  • 13:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2027']
  • 13:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2027']
  • 13:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2026']
  • 13:36 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host doh6001.wikimedia.org with OS bookworm
  • 13:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2039']
  • 13:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2038']
  • 13:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2037']
  • 13:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2035']
  • 13:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2034']
  • 13:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2033']
  • 13:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2036']
  • 13:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2032']
  • 13:33 moritzm: failover ganeti master in eqiad to ganeti1027
  • 13:32 taavi@deploy1002: Finished scap: Backport for Disable NearbyPages on lockeddown, Disable Collection on lockeddown, Disable FileExporter on lockeddown (duration: 08m 10s)
  • 13:28 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 13:27 taavi@deploy1002: taavi: Continuing with sync
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P52091 and previous config saved to /var/cache/conftool/dbconfig/20230830-132703-ladsgroup.json
  • 13:25 taavi@deploy1002: taavi: Backport for Disable NearbyPages on lockeddown, Disable Collection on lockeddown, Disable FileExporter on lockeddown synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:24 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2037']
  • 13:24 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2036']
  • 13:24 taavi@deploy1002: Started scap: Backport for Disable NearbyPages on lockeddown, Disable Collection on lockeddown, Disable FileExporter on lockeddown
  • 13:24 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2035']
  • 13:24 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2034']
  • 13:24 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2033']
  • 13:24 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2032']
  • 13:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2031']
  • 13:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2030']
  • 13:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2029']
  • 13:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2028']
  • 13:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2027']
  • 13:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2026']
  • 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P52090 and previous config saved to /var/cache/conftool/dbconfig/20230830-132218-ladsgroup.json
  • 13:21 elukey@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-codfw cluster: Reboot kafka nodes
  • 13:21 jiji@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-codfw
  • 13:20 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply security updates - bking@cumin1001 - T344587
  • 13:18 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 13:15 samtar@deploy1002: Finished scap: Backport for IS: Enable Phonos on all projects (T336763) (duration: 09m 29s)
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P52089 and previous config saved to /var/cache/conftool/dbconfig/20230830-131157-ladsgroup.json
  • 13:10 samtar@deploy1002: samtar: Continuing with sync
  • 13:09 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2031']
  • 13:09 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2030']
  • 13:09 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2029']
  • 13:09 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2028']
  • 13:09 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2027']
  • 13:09 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2026']
  • 13:08 samtar@deploy1002: samtar: Backport for IS: Enable Phonos on all projects (T336763) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P52088 and previous config saved to /var/cache/conftool/dbconfig/20230830-130712-ladsgroup.json
  • 13:06 samtar@deploy1002: Started scap: Backport for IS: Enable Phonos on all projects (T336763)
  • 13:04 samtar@deploy1002: backport Cancelled
  • 13:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:02 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:02 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 13:02 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw2-22-ulsfo
  • 13:02 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw2-22-ulsfo
  • 13:01 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 13:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 12:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1034.eqiad.wmnet
  • 12:58 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:58 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 12:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
  • 12:57 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52087 and previous config saved to /var/cache/conftool/dbconfig/20230830-125650-ladsgroup.json
  • 12:56 elukey: restart kubelet on ml-serve1001 to clear prometheus metrics
  • 12:55 taavi@deploy1002: Finished scap: Backport for wmf-config: remove public subnets from reverse-proxy.php (T344704 T329219) (duration: 11m 28s)
  • 12:54 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 12:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
  • 12:53 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:53 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:53 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:52 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T343718)', diff saved to https://phabricator.wikimedia.org/P52086 and previous config saved to /var/cache/conftool/dbconfig/20230830-125206-ladsgroup.json
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T343718)', diff saved to https://phabricator.wikimedia.org/P52085 and previous config saved to /var/cache/conftool/dbconfig/20230830-124954-ladsgroup.json
  • 12:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 12:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T343718)', diff saved to https://phabricator.wikimedia.org/P52084 and previous config saved to /var/cache/conftool/dbconfig/20230830-124933-ladsgroup.json
  • 12:47 taavi@deploy1002: sukhe and taavi: Continuing with sync
  • 12:46 taavi@deploy1002: sukhe and taavi: Backport for wmf-config: remove public subnets from reverse-proxy.php (T344704 T329219) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 12:46 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 12:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1009.eqiad.wmnet
  • 12:43 taavi@deploy1002: Started scap: Backport for wmf-config: remove public subnets from reverse-proxy.php (T344704 T329219)
  • 12:43 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: fix thresholds (T343308) (duration: 25m 53s)
  • 12:38 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1009.eqiad.wmnet
  • 12:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1008.eqiad.wmnet
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P52083 and previous config saved to /var/cache/conftool/dbconfig/20230830-123427-ladsgroup.json
  • 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52082 and previous config saved to /var/cache/conftool/dbconfig/20230830-123001-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 12:29 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1008.eqiad.wmnet
  • 12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T343718)', diff saved to https://phabricator.wikimedia.org/P52081 and previous config saved to /var/cache/conftool/dbconfig/20230830-122940-ladsgroup.json
  • 12:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 12:28 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:27 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:27 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:26 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 12:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1007.eqiad.wmnet
  • 12:19 ladsgroup@deploy1002: isaranto and ladsgroup: Continuing with sync
  • 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P52080 and previous config saved to /var/cache/conftool/dbconfig/20230830-121921-ladsgroup.json
  • 12:19 ladsgroup@deploy1002: isaranto and ladsgroup: Backport for ores-extension: fix thresholds (T343308) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 12:19 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1007.eqiad.wmnet
  • 12:17 ladsgroup@deploy1002: Started scap: Backport for ores-extension: fix thresholds (T343308)
  • 12:16 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1006.eqiad.wmnet with reason: host reimage
  • 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
  • 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P52079 and previous config saved to /var/cache/conftool/dbconfig/20230830-121433-ladsgroup.json
  • 12:13 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1006.eqiad.wmnet with reason: host reimage
  • 12:12 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 12:10 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 12:10 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
  • 12:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220 (T344589)', diff saved to https://phabricator.wikimedia.org/P52078 and previous config saved to /var/cache/conftool/dbconfig/20230830-120511-ladsgroup.json
  • 12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T343718)', diff saved to https://phabricator.wikimedia.org/P52077 and previous config saved to /var/cache/conftool/dbconfig/20230830-120415-ladsgroup.json
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P52076 and previous config saved to /var/cache/conftool/dbconfig/20230830-115927-ladsgroup.json
  • 11:59 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 11:57 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
  • 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
  • 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
  • 11:52 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 11:52 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:52 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 11:51 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 11:51 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P52074 and previous config saved to /var/cache/conftool/dbconfig/20230830-115005-ladsgroup.json
  • 11:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
  • 11:48 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 11:47 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T343718)', diff saved to https://phabricator.wikimedia.org/P52073 and previous config saved to /var/cache/conftool/dbconfig/20230830-114421-ladsgroup.json
  • 11:40 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T343718)', diff saved to https://phabricator.wikimedia.org/P52072 and previous config saved to /var/cache/conftool/dbconfig/20230830-113728-ladsgroup.json
  • 11:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52071 and previous config saved to /var/cache/conftool/dbconfig/20230830-113656-ladsgroup.json
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
  • 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P52070 and previous config saved to /var/cache/conftool/dbconfig/20230830-113459-ladsgroup.json
  • 11:34 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 11:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P52069 and previous config saved to /var/cache/conftool/dbconfig/20230830-112150-ladsgroup.json
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220 (T344589)', diff saved to https://phabricator.wikimedia.org/P52068 and previous config saved to /var/cache/conftool/dbconfig/20230830-111952-ladsgroup.json
  • 11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T343718)', diff saved to https://phabricator.wikimedia.org/P52067 and previous config saved to /var/cache/conftool/dbconfig/20230830-111720-ladsgroup.json
  • 11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52066 and previous config saved to /var/cache/conftool/dbconfig/20230830-111659-ladsgroup.json
  • 11:16 jbond: switch cumin to the puppetdb api micro service Gerrit:953203
  • 11:13 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
  • 11:12 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 11:12 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1220 (T344589)', diff saved to https://phabricator.wikimedia.org/P52065 and previous config saved to /var/cache/conftool/dbconfig/20230830-111143-ladsgroup.json
  • 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1220.eqiad.wmnet with reason: Maintenance
  • 11:11 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1220.eqiad.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T344589)', diff saved to https://phabricator.wikimedia.org/P52064 and previous config saved to /var/cache/conftool/dbconfig/20230830-111118-ladsgroup.json
  • 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T344589)', diff saved to https://phabricator.wikimedia.org/P52063 and previous config saved to /var/cache/conftool/dbconfig/20230830-110800-ladsgroup.json
  • 11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P52062 and previous config saved to /var/cache/conftool/dbconfig/20230830-110644-ladsgroup.json
  • 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P52061 and previous config saved to /var/cache/conftool/dbconfig/20230830-110152-ladsgroup.json
  • 11:01 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 11:00 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 10:57 XioNoX: enable mgmt_junos on fasw-c-codfw - T327862
  • 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1030.eqiad.wmnet
  • 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1030.eqiad.wmnet
  • 10:56 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:56 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 10:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P52060 and previous config saved to /var/cache/conftool/dbconfig/20230830-105612-ladsgroup.json
  • 10:55 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 10:54 moritzm: installing grub2 updates from bullseye point release
  • 10:53 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 10:52 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P52059 and previous config saved to /var/cache/conftool/dbconfig/20230830-105254-ladsgroup.json
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52058 and previous config saved to /var/cache/conftool/dbconfig/20230830-105138-ladsgroup.json
  • 10:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 10:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
  • 10:48 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 10:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 10:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P52057 and previous config saved to /var/cache/conftool/dbconfig/20230830-104646-ladsgroup.json
  • 10:42 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudservices1006
  • 10:41 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudservices1006
  • 10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P52056 and previous config saved to /var/cache/conftool/dbconfig/20230830-104105-ladsgroup.json
  • 10:38 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P52055 and previous config saved to /var/cache/conftool/dbconfig/20230830-103747-ladsgroup.json
  • 10:35 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:35 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1006 - aborrero@cumin1001"
  • 10:35 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1006 - aborrero@cumin1001"
  • 10:32 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52054 and previous config saved to /var/cache/conftool/dbconfig/20230830-103140-ladsgroup.json
  • 10:28 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T344589)', diff saved to https://phabricator.wikimedia.org/P52053 and previous config saved to /var/cache/conftool/dbconfig/20230830-102559-ladsgroup.json
  • 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52052 and previous config saved to /var/cache/conftool/dbconfig/20230830-102452-ladsgroup.json
  • 10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T343718)', diff saved to https://phabricator.wikimedia.org/P52051 and previous config saved to /var/cache/conftool/dbconfig/20230830-102432-ladsgroup.json
  • 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T344589)', diff saved to https://phabricator.wikimedia.org/P52050 and previous config saved to /var/cache/conftool/dbconfig/20230830-102241-ladsgroup.json
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
  • 10:20 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:20 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:18 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:16 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:16 godog: +50g to prometheus eqiad 'services' instance
  • 10:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
  • 10:14 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2131 (T344589)', diff saved to https://phabricator.wikimedia.org/P52049 and previous config saved to /var/cache/conftool/dbconfig/20230830-101437-ladsgroup.json
  • 10:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 10:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096 (T344589)', diff saved to https://phabricator.wikimedia.org/P52048 and previous config saved to /var/cache/conftool/dbconfig/20230830-101410-ladsgroup.json
  • 10:13 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:12 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P52047 and previous config saved to /var/cache/conftool/dbconfig/20230830-100926-ladsgroup.json
  • 10:07 jiji@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
  • 10:06 effie: Rolling reboot codfw wikikube k8s nodes
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52046 and previous config saved to /var/cache/conftool/dbconfig/20230830-100413-ladsgroup.json
  • 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T343718)', diff saved to https://phabricator.wikimedia.org/P52045 and previous config saved to /var/cache/conftool/dbconfig/20230830-100351-ladsgroup.json
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096', diff saved to https://phabricator.wikimedia.org/P52044 and previous config saved to /var/cache/conftool/dbconfig/20230830-095903-ladsgroup.json
  • 09:58 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
  • 09:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1006.eqiad.wmnet
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P52043 and previous config saved to /var/cache/conftool/dbconfig/20230830-095419-ladsgroup.json
  • 09:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1006.eqiad.wmnet
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P52042 and previous config saved to /var/cache/conftool/dbconfig/20230830-094845-ladsgroup.json
  • 09:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1005.eqiad.wmnet
  • 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096', diff saved to https://phabricator.wikimedia.org/P52041 and previous config saved to /var/cache/conftool/dbconfig/20230830-094357-ladsgroup.json
  • 09:40 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1005.eqiad.wmnet
  • 09:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1004.eqiad.wmnet
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T343718)', diff saved to https://phabricator.wikimedia.org/P52040 and previous config saved to /var/cache/conftool/dbconfig/20230830-093913-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P52039 and previous config saved to /var/cache/conftool/dbconfig/20230830-093339-ladsgroup.json
  • 09:33 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1004.eqiad.wmnet
  • 09:32 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1027.eqiad.wmnet
  • 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
  • 09:31 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 09:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1003.eqiad.wmnet
  • 09:30 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 09:28 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096 (T344589)', diff saved to https://phabricator.wikimedia.org/P52038 and previous config saved to /var/cache/conftool/dbconfig/20230830-092851-ladsgroup.json
  • 09:28 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 09:27 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 09:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
  • 09:25 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1003.eqiad.wmnet
  • 09:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1002.eqiad.wmnet
  • 09:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2030 (T344589)', diff saved to https://phabricator.wikimedia.org/P52037 and previous config saved to /var/cache/conftool/dbconfig/20230830-092255-ladsgroup.json
  • 09:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1137 (T344589)', diff saved to https://phabricator.wikimedia.org/P52036 and previous config saved to /var/cache/conftool/dbconfig/20230830-092228-ladsgroup.json
  • 09:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: Maintenance
  • 09:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: Maintenance
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2096 (T344589)', diff saved to https://phabricator.wikimedia.org/P52035 and previous config saved to /var/cache/conftool/dbconfig/20230830-092147-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2096.codfw.wmnet with reason: Maintenance
  • 09:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2096.codfw.wmnet with reason: Maintenance
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: Repooling after upgrade 10.4.31 T344309', diff saved to https://phabricator.wikimedia.org/P52034 and previous config saved to /var/cache/conftool/dbconfig/20230830-091922-root.json
  • 09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T343718)', diff saved to https://phabricator.wikimedia.org/P52033 and previous config saved to /var/cache/conftool/dbconfig/20230830-091833-ladsgroup.json
  • 09:17 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1002.eqiad.wmnet
  • 09:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T343718)', diff saved to https://phabricator.wikimedia.org/P52032 and previous config saved to /var/cache/conftool/dbconfig/20230830-091610-ladsgroup.json
  • 09:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1001.eqiad.wmnet
  • 09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 09:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 09:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T343718)', diff saved to https://phabricator.wikimedia.org/P52031 and previous config saved to /var/cache/conftool/dbconfig/20230830-091544-ladsgroup.json
  • 09:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw-c1a-eqiad
  • 09:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1027.eqiad.wmnet
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T343718)', diff saved to https://phabricator.wikimedia.org/P52030 and previous config saved to /var/cache/conftool/dbconfig/20230830-091242-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 09:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52029 and previous config saved to /var/cache/conftool/dbconfig/20230830-091203-ladsgroup.json
  • 09:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
  • 09:10 ladsgroup@deploy1002: Finished scap: Backport for Allow setting configurations through rtl dblist (duration: 08m 52s)
  • 09:09 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device fasw-c1a-eqiad
  • 09:09 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw-c8a-codfw
  • 09:09 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1001.eqiad.wmnet
  • 09:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2009.codfw.wmnet
  • 09:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2030', diff saved to https://phabricator.wikimedia.org/P52028 and previous config saved to /var/cache/conftool/dbconfig/20230830-090749-ladsgroup.json
  • 09:06 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device fasw-c8a-codfw
  • 09:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw-d2-codfw
  • 09:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
  • 09:04 ladsgroup@deploy1002: ladsgroup and zabe: Continuing with sync
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: Repooling after upgrade 10.4.31 T344309', diff saved to https://phabricator.wikimedia.org/P52027 and previous config saved to /var/cache/conftool/dbconfig/20230830-090415-root.json
  • 09:03 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw-d2-codfw
  • 09:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw-c2-codfw
  • 09:02 ladsgroup@deploy1002: ladsgroup and zabe: Backport for Allow setting configurations through rtl dblist synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 09:02 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2009.codfw.wmnet
  • 09:01 ladsgroup@deploy1002: Started scap: Backport for Allow setting configurations through rtl dblist
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P52026 and previous config saved to /var/cache/conftool/dbconfig/20230830-090038-ladsgroup.json
  • 09:00 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw-c2-codfw
  • 08:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw-b2-codfw
  • 08:59 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ores2008.codfw.wmnet
  • 08:57 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw-b2-codfw
  • 08:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw-0604-eqsin
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P52025 and previous config saved to /var/cache/conftool/dbconfig/20230830-085657-ladsgroup.json
  • 08:53 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw-0604-eqsin
  • 08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw2-d7-eqiad
  • 08:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2030', diff saved to https://phabricator.wikimedia.org/P52024 and previous config saved to /var/cache/conftool/dbconfig/20230830-085243-ladsgroup.json
  • 08:51 Emperor: stopping puppet to fix broken drive labelling after disk swap thanos-be1003 T345079
  • 08:50 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw2-d7-eqiad
  • 08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw2-c2-eqiad
  • 08:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
  • 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 50%: Repooling after upgrade 10.4.31 T344309', diff saved to https://phabricator.wikimedia.org/P52023 and previous config saved to /var/cache/conftool/dbconfig/20230830-084911-root.json
  • 08:47 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw2-c2-eqiad
  • 08:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw-a2-codfw
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P52021 and previous config saved to /var/cache/conftool/dbconfig/20230830-084532-ladsgroup.json
  • 08:44 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw-a2-codfw
  • 08:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P52020 and previous config saved to /var/cache/conftool/dbconfig/20230830-084151-ladsgroup.json
  • 08:38 XioNoX: set bgp-error-tolerance on all sessions - T340111
  • 08:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2030 (T344589)', diff saved to https://phabricator.wikimedia.org/P52019 and previous config saved to /var/cache/conftool/dbconfig/20230830-083737-ladsgroup.json
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: Repooling after upgrade 10.4.31 T344309', diff saved to https://phabricator.wikimedia.org/P52018 and previous config saved to /var/cache/conftool/dbconfig/20230830-083406-root.json
  • 08:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2030 (T344589)', diff saved to https://phabricator.wikimedia.org/P52017 and previous config saved to /var/cache/conftool/dbconfig/20230830-083246-ladsgroup.json
  • 08:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2030.codfw.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2030.codfw.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028 (T344589)', diff saved to https://phabricator.wikimedia.org/P52016 and previous config saved to /var/cache/conftool/dbconfig/20230830-083220-ladsgroup.json
  • 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1023.eqiad.wmnet
  • 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
  • 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T343718)', diff saved to https://phabricator.wikimedia.org/P52015 and previous config saved to /var/cache/conftool/dbconfig/20230830-083025-ladsgroup.json
  • 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52014 and previous config saved to /var/cache/conftool/dbconfig/20230830-082645-ladsgroup.json
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 10%: Repooling after upgrade 10.4.31 T344309', diff saved to https://phabricator.wikimedia.org/P52013 and previous config saved to /var/cache/conftool/dbconfig/20230830-081901-root.json
  • 08:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P52012 and previous config saved to /var/cache/conftool/dbconfig/20230830-081714-ladsgroup.json
  • 08:06 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2008.codfw.wmnet
  • 08:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1023.eqiad.wmnet
  • 08:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2007.codfw.wmnet
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 5%: Repooling after upgrade 10.4.31 T344309', diff saved to https://phabricator.wikimedia.org/P52011 and previous config saved to /var/cache/conftool/dbconfig/20230830-080356-root.json
  • 08:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P52010 and previous config saved to /var/cache/conftool/dbconfig/20230830-080208-ladsgroup.json
  • 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet
  • 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1022.eqiad.wmnet
  • 07:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T343718)', diff saved to https://phabricator.wikimedia.org/P52009 and previous config saved to /var/cache/conftool/dbconfig/20230830-075956-ladsgroup.json
  • 07:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 07:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 07:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T343718)', diff saved to https://phabricator.wikimedia.org/P52008 and previous config saved to /var/cache/conftool/dbconfig/20230830-075934-ladsgroup.json
  • 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T343718)', diff saved to https://phabricator.wikimedia.org/P52007 and previous config saved to /var/cache/conftool/dbconfig/20230830-075736-ladsgroup.json
  • 07:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 07:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 07:57 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2007.codfw.wmnet
  • 07:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1022.eqiad.wmnet
  • 07:51 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1128.eqiad.wmnet with OS bullseye
  • 07:50 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1129.eqiad.wmnet with OS bullseye
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 3%: Repooling after upgrade 10.4.31 T344309', diff saved to https://phabricator.wikimedia.org/P52006 and previous config saved to /var/cache/conftool/dbconfig/20230830-074852-root.json
  • 07:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028 (T344589)', diff saved to https://phabricator.wikimedia.org/P52005 and previous config saved to /var/cache/conftool/dbconfig/20230830-074702-ladsgroup.json
  • 07:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P52004 and previous config saved to /var/cache/conftool/dbconfig/20230830-074428-ladsgroup.json
  • 07:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P52003 and previous config saved to /var/cache/conftool/dbconfig/20230830-074238-root.json
  • 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2028 (T344589)', diff saved to https://phabricator.wikimedia.org/P52002 and previous config saved to /var/cache/conftool/dbconfig/20230830-074202-ladsgroup.json
  • 07:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repooling after onsite upgrade', diff saved to https://phabricator.wikimedia.org/P52001 and previous config saved to /var/cache/conftool/dbconfig/20230830-073514-root.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 1%: Repooling after upgrade 10.4.31 T344309', diff saved to https://phabricator.wikimedia.org/P52000 and previous config saved to /var/cache/conftool/dbconfig/20230830-073347-root.json
  • 07:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2006.codfw.wmnet
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1128 upgrade to mariadb 10.4.31', diff saved to https://phabricator.wikimedia.org/P51999 and previous config saved to /var/cache/conftool/dbconfig/20230830-073144-root.json
  • 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P51998 and previous config saved to /var/cache/conftool/dbconfig/20230830-072922-ladsgroup.json
  • 07:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T343718)', diff saved to https://phabricator.wikimedia.org/P51997 and previous config saved to /var/cache/conftool/dbconfig/20230830-072902-ladsgroup.json
  • 07:28 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1128.eqiad.wmnet with reason: host reimage
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P51996 and previous config saved to /var/cache/conftool/dbconfig/20230830-072733-root.json
  • 07:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
  • 07:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1021.eqiad.wmnet
  • 07:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1129.eqiad.wmnet with reason: host reimage
  • 07:25 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2006.codfw.wmnet
  • 07:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2005.codfw.wmnet
  • 07:22 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1128.eqiad.wmnet with reason: host reimage
  • 07:22 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1129.eqiad.wmnet with reason: host reimage
  • 07:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1021.eqiad.wmnet
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repooling after onsite upgrade', diff saved to https://phabricator.wikimedia.org/P51995 and previous config saved to /var/cache/conftool/dbconfig/20230830-072009-root.json
  • 07:19 ladsgroup@deploy1002: Finished scap: Backport for Disable search result deduplication. (T341227) (duration: 15m 53s)
  • 07:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
  • 07:17 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1021.eqiad.wmnet
  • 07:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2005.codfw.wmnet
  • 07:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2004.codfw.wmnet
  • 07:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T343718)', diff saved to https://phabricator.wikimedia.org/P51994 and previous config saved to /var/cache/conftool/dbconfig/20230830-071416-ladsgroup.json
  • 07:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P51993 and previous config saved to /var/cache/conftool/dbconfig/20230830-071356-ladsgroup.json
  • 07:13 ladsgroup@deploy1002: ladsgroup and pfischer: Continuing with sync
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P51992 and previous config saved to /var/cache/conftool/dbconfig/20230830-071228-root.json
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T343718)', diff saved to https://phabricator.wikimedia.org/P51991 and previous config saved to /var/cache/conftool/dbconfig/20230830-071152-ladsgroup.json
  • 07:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 07:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 07:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2004.codfw.wmnet
  • 07:09 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1129.eqiad.wmnet with OS bullseye
  • 07:09 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1128.eqiad.wmnet with OS bullseye
  • 07:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2003.codfw.wmnet
  • 07:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repooling after onsite upgrade', diff saved to https://phabricator.wikimedia.org/P51990 and previous config saved to /var/cache/conftool/dbconfig/20230830-070504-root.json
  • 07:04 ladsgroup@deploy1002: ladsgroup and pfischer: Backport for Disable search result deduplication. (T341227) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
  • 07:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
  • 07:03 ladsgroup@deploy1002: Started scap: Backport for Disable search result deduplication. (T341227)
  • 07:01 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2003.codfw.wmnet
  • 07:01 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1127.eqiad.wmnet with OS bullseye
  • 06:58 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1126.eqiad.wmnet with OS bullseye
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P51989 and previous config saved to /var/cache/conftool/dbconfig/20230830-065849-ladsgroup.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P51988 and previous config saved to /var/cache/conftool/dbconfig/20230830-065723-root.json
  • 06:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
  • 06:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repooling after onsite upgrade', diff saved to https://phabricator.wikimedia.org/P51987 and previous config saved to /var/cache/conftool/dbconfig/20230830-064959-root.json
  • 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T343718)', diff saved to https://phabricator.wikimedia.org/P51986 and previous config saved to /var/cache/conftool/dbconfig/20230830-064343-ladsgroup.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P51985 and previous config saved to /var/cache/conftool/dbconfig/20230830-064219-root.json
  • 06:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 06:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T343718)', diff saved to https://phabricator.wikimedia.org/P51984 and previous config saved to /var/cache/conftool/dbconfig/20230830-064131-ladsgroup.json
  • 06:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 06:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 06:37 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1127.eqiad.wmnet with reason: host reimage
  • 06:35 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1126.eqiad.wmnet with reason: host reimage
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 10%: Repooling after onsite upgrade', diff saved to https://phabricator.wikimedia.org/P51983 and previous config saved to /var/cache/conftool/dbconfig/20230830-063455-root.json
  • 06:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33
  • 06:33 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 33
  • 06:33 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1127.eqiad.wmnet with reason: host reimage
  • 06:32 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1126.eqiad.wmnet with reason: host reimage
  • 06:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33
  • 06:31 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 33
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P51982 and previous config saved to /var/cache/conftool/dbconfig/20230830-062714-root.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 5%: Repooling after onsite upgrade', diff saved to https://phabricator.wikimedia.org/P51981 and previous config saved to /var/cache/conftool/dbconfig/20230830-061950-root.json
  • 06:19 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1127.eqiad.wmnet with OS bullseye
  • 06:18 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1126.eqiad.wmnet with OS bullseye
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P51980 and previous config saved to /var/cache/conftool/dbconfig/20230830-061209-root.json
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 3%: Repooling after onsite upgrade', diff saved to https://phabricator.wikimedia.org/P51979 and previous config saved to /var/cache/conftool/dbconfig/20230830-060445-root.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P51978 and previous config saved to /var/cache/conftool/dbconfig/20230830-055704-root.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1173 upgrade to mariadb 10.6', diff saved to https://phabricator.wikimedia.org/P51977 and previous config saved to /var/cache/conftool/dbconfig/20230830-055034-root.json
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 1%: Repooling after onsite upgrade', diff saved to https://phabricator.wikimedia.org/P51976 and previous config saved to /var/cache/conftool/dbconfig/20230830-054940-root.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 upgrade to mariadb 10.6', diff saved to https://phabricator.wikimedia.org/P51975 and previous config saved to /var/cache/conftool/dbconfig/20230830-054248-root.json
  • 05:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 05:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T343718)', diff saved to https://phabricator.wikimedia.org/P51974 and previous config saved to /var/cache/conftool/dbconfig/20230830-051543-ladsgroup.json
  • 05:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P51973 and previous config saved to /var/cache/conftool/dbconfig/20230830-050036-ladsgroup.json
  • 04:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P51972 and previous config saved to /var/cache/conftool/dbconfig/20230830-044530-ladsgroup.json
  • 04:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T343718)', diff saved to https://phabricator.wikimedia.org/P51971 and previous config saved to /var/cache/conftool/dbconfig/20230830-043024-ladsgroup.json
  • 01:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1221 (T343718)', diff saved to https://phabricator.wikimedia.org/P51970 and previous config saved to /var/cache/conftool/dbconfig/20230830-014730-ladsgroup.json
  • 01:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 01:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 01:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T343718)', diff saved to https://phabricator.wikimedia.org/P51969 and previous config saved to /var/cache/conftool/dbconfig/20230830-014702-ladsgroup.json
  • 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P51968 and previous config saved to /var/cache/conftool/dbconfig/20230830-013156-ladsgroup.json
  • 01:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P51967 and previous config saved to /var/cache/conftool/dbconfig/20230830-011650-ladsgroup.json
  • 01:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T343718)', diff saved to https://phabricator.wikimedia.org/P51966 and previous config saved to /var/cache/conftool/dbconfig/20230830-010144-ladsgroup.json
  • 01:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2035.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2039.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2038.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2036.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2039.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2038.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2037.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2036.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2035.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2034.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2032.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T343718)', diff saved to https://phabricator.wikimedia.org/P51965 and previous config saved to /var/cache/conftool/dbconfig/20230830-002108-ladsgroup.json
  • 00:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2030.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2029.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2033.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2028.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:12 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2032.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2031.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2030.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P51964 and previous config saved to /var/cache/conftool/dbconfig/20230830-000602-ladsgroup.json
  • 00:05 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2029.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2028.mgmt.codfw.wmnet with reboot policy FORCED

2023-08-29

  • 23:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P51963 and previous config saved to /var/cache/conftool/dbconfig/20230829-235055-ladsgroup.json
  • 23:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T343718)', diff saved to https://phabricator.wikimedia.org/P51962 and previous config saved to /var/cache/conftool/dbconfig/20230829-233549-ladsgroup.json
  • 23:09 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2034 hosts in codfw - jhancock@cumin2002"
  • 23:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2034 hosts in codfw - jhancock@cumin2002"
  • 23:06 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2027.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:01 eileen: civicrm upgraded from fc5c73db to 29ce9ac0
  • 22:52 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2027.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2026.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T343718)', diff saved to https://phabricator.wikimedia.org/P51961 and previous config saved to /var/cache/conftool/dbconfig/20230829-225031-ladsgroup.json
  • 22:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 22:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T343718)', diff saved to https://phabricator.wikimedia.org/P51960 and previous config saved to /var/cache/conftool/dbconfig/20230829-225010-ladsgroup.json
  • 22:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2026.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P51959 and previous config saved to /var/cache/conftool/dbconfig/20230829-223504-ladsgroup.json
  • 22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P51958 and previous config saved to /var/cache/conftool/dbconfig/20230829-221958-ladsgroup.json
  • 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T343718)', diff saved to https://phabricator.wikimedia.org/P51957 and previous config saved to /var/cache/conftool/dbconfig/20230829-220451-ladsgroup.json
  • 21:53 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2026.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2026.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2026.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2027.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2027.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2026.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:36 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host restbase1030.eqiad.wmnet
  • 21:35 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:33 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:31 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:30 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:29 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T343718)', diff saved to https://phabricator.wikimedia.org/P51956 and previous config saved to /var/cache/conftool/dbconfig/20230829-212619-ladsgroup.json
  • 21:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 21:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T343718)', diff saved to https://phabricator.wikimedia.org/P51955 and previous config saved to /var/cache/conftool/dbconfig/20230829-212558-ladsgroup.json
  • 21:25 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:23 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:23 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1030.eqiad.wmnet
  • 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:21 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:20 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:20 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2026 hosts in codfw - jhancock@cumin2002"
  • 21:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2026 hosts in codfw - jhancock@cumin2002"
  • 21:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:14 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P51954 and previous config saved to /var/cache/conftool/dbconfig/20230829-211052-ladsgroup.json
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P51953 and previous config saved to /var/cache/conftool/dbconfig/20230829-205546-ladsgroup.json
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T343718)', diff saved to https://phabricator.wikimedia.org/P51952 and previous config saved to /var/cache/conftool/dbconfig/20230829-204039-ladsgroup.json
  • 20:16 urbanecm@deploy1002: Finished scap: Backport for clienthints: Raise maxlag for API back to default for group0 and 1 (T344797) (duration: 07m 13s)
  • 20:10 urbanecm@deploy1002: urbanecm and dreamyjazz: Continuing with sync
  • 20:10 urbanecm@deploy1002: urbanecm and dreamyjazz: Backport for clienthints: Raise maxlag for API back to default for group0 and 1 (T344797) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:09 urbanecm@deploy1002: Started scap: Backport for clienthints: Raise maxlag for API back to default for group0 and 1 (T344797)
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T343718)', diff saved to https://phabricator.wikimedia.org/P51951 and previous config saved to /var/cache/conftool/dbconfig/20230829-195215-ladsgroup.json
  • 19:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 19:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T343718)', diff saved to https://phabricator.wikimedia.org/P51950 and previous config saved to /var/cache/conftool/dbconfig/20230829-195154-ladsgroup.json
  • 19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P51949 and previous config saved to /var/cache/conftool/dbconfig/20230829-193648-ladsgroup.json
  • 19:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1033.eqiad.wmnet
  • 19:32 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device asw2-c2-eqiad
  • 19:32 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw2-c2-eqiad
  • 19:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad
  • 19:30 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad
  • 19:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad
  • 19:27 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad
  • 19:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f3-eqiad
  • 19:26 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1033.eqiad.wmnet
  • 19:25 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f3-eqiad
  • 19:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f2-eqiad
  • 19:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 19:24 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:23 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f2-eqiad
  • 19:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f1-eqiad
  • 19:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P51948 and previous config saved to /var/cache/conftool/dbconfig/20230829-192141-ladsgroup.json
  • 19:20 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f1-eqiad
  • 19:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e3-eqiad
  • 19:18 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e3-eqiad
  • 19:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e2-eqiad
  • 19:18 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1026.eqiad.wmnet
  • 19:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:16 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e2-eqiad
  • 19:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e1-eqiad
  • 19:13 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e1-eqiad
  • 19:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 173
  • 19:11 eileen: civicrm upgraded from d13e6e0c to fc5c73db
  • 19:10 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 173
  • 19:10 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1026.eqiad.wmnet
  • 19:09 eileen: civicrm upgraded from d13e6e0c to fc5c73db
  • 19:07 zabe@deploy1002: Finished scap: update interwiki cache (duration: 07m 08s)
  • 19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T343718)', diff saved to https://phabricator.wikimedia.org/P51947 and previous config saved to /var/cache/conftool/dbconfig/20230829-190635-ladsgroup.json
  • 19:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1006.eqiad.wmnet with reason: host reimage
  • 19:01 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1025.eqiad.wmnet
  • 19:00 zabe@deploy1002: Started scap: update interwiki cache
  • 18:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1006.eqiad.wmnet with reason: host reimage
  • 18:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 18:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 18:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 18:52 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1025.eqiad.wmnet
  • 18:37 aokoth@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host vrts1002.eqiad.wmnet
  • 18:37 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host vrts1002.eqiad.wmnet with OS bullseye
  • 18:32 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1032.eqiad.wmnet
  • 18:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1032.eqiad.wmnet
  • 18:24 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts1002.eqiad.wmnet with reason: host reimage
  • 18:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T343718)', diff saved to https://phabricator.wikimedia.org/P51946 and previous config saved to /var/cache/conftool/dbconfig/20230829-182251-ladsgroup.json
  • 18:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 18:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 18:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 18:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 18:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T343718)', diff saved to https://phabricator.wikimedia.org/P51945 and previous config saved to /var/cache/conftool/dbconfig/20230829-182225-ladsgroup.json
  • 18:21 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts1002.eqiad.wmnet with reason: host reimage
  • 18:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1029.eqiad.wmnet
  • 18:14 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.24 refs T343726
  • 18:12 aokoth@cumin1001: START - Cookbook sre.hosts.reimage for host vrts1002.eqiad.wmnet with OS bullseye
  • 18:10 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM vrts1002.eqiad.wmnet - aokoth@cumin1001"
  • 18:09 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM vrts1002.eqiad.wmnet - aokoth@cumin1001"
  • 18:08 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) vrts1002.eqiad.wmnet on all recursors
  • 18:08 aokoth@cumin1001: START - Cookbook sre.dns.wipe-cache vrts1002.eqiad.wmnet on all recursors
  • 18:08 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:08 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM vrts1002.eqiad.wmnet - aokoth@cumin1001"
  • 18:08 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM vrts1002.eqiad.wmnet - aokoth@cumin1001"
  • 18:07 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1029.eqiad.wmnet
  • 18:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P51944 and previous config saved to /var/cache/conftool/dbconfig/20230829-180719-ladsgroup.json
  • 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T343718)', diff saved to https://phabricator.wikimedia.org/P51943 and previous config saved to /var/cache/conftool/dbconfig/20230829-180613-ladsgroup.json
  • 18:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 18:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 18:04 aokoth@cumin1001: START - Cookbook sre.dns.netbox
  • 18:04 aokoth@cumin1001: START - Cookbook sre.ganeti.makevm for new host vrts1002.eqiad.wmnet
  • 18:00 urbanecm@deploy1002: Finished scap: Backport for Growth: Disable Add an image on all wikis (T345188) (duration: 06m 47s)
  • 17:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1024.eqiad.wmnet
  • 17:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 17:53 urbanecm@deploy1002: Started scap: Backport for Growth: Disable Add an image on all wikis (T345188)
  • 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P51942 and previous config saved to /var/cache/conftool/dbconfig/20230829-175213-ladsgroup.json
  • 17:51 jhuneidi@deploy1002: Pruned MediaWiki: 1.41.0-wmf.22 (duration: 02m 11s)
  • 17:49 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1024.eqiad.wmnet
  • 17:48 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.24 refs T343726 (duration: 43m 27s)
  • 17:41 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1023.eqiad.wmnet
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T343718)', diff saved to https://phabricator.wikimedia.org/P51941 and previous config saved to /var/cache/conftool/dbconfig/20230829-173707-ladsgroup.json
  • 17:33 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1023.eqiad.wmnet
  • 17:05 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.24 refs T343726
  • 16:31 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 16:30 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 16:30 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 16:30 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 16:30 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 16:30 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 16:26 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:25 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:25 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:24 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:23 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:23 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:20 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:19 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:19 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:18 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:18 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1022.eqiad.wmnet
  • 16:17 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:17 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 16:17 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 16:17 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:16 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 16:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 16:14 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 16:14 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 16:13 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: replace first batch of wikis model thresholds with numeric values (T343308) (duration: 09m 31s)
  • 16:12 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:11 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 16:11 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:10 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 16:10 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:09 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:09 akosiaris: deploy cxserver mariadb egress functionality. T341117
  • 16:09 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1022.eqiad.wmnet
  • 16:09 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 16:09 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:09 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 16:08 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:08 ladsgroup@deploy1002: ladsgroup and isaranto: Continuing with sync
  • 16:07 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:07 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:05 ladsgroup@deploy1002: ladsgroup and isaranto: Backport for ores-extension: replace first batch of wikis model thresholds with numeric values (T343308) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 16:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2002.codfw.wmnet
  • 16:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:04 ladsgroup@deploy1002: Started scap: Backport for ores-extension: replace first batch of wikis model thresholds with numeric values (T343308)
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T343718)', diff saved to https://phabricator.wikimedia.org/P51939 and previous config saved to /var/cache/conftool/dbconfig/20230829-160415-ladsgroup.json
  • 16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T343718)', diff saved to https://phabricator.wikimedia.org/P51938 and previous config saved to /var/cache/conftool/dbconfig/20230829-160144-ladsgroup.json
  • 15:58 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2002.codfw.wmnet
  • 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2001.codfw.wmnet
  • 15:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2025.codfw.wmnet with OS bullseye
  • 15:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 15:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 15:51 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on A:swift-fe
  • 15:50 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2001.codfw.wmnet
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P51937 and previous config saved to /var/cache/conftool/dbconfig/20230829-154909-ladsgroup.json
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P51936 and previous config saved to /var/cache/conftool/dbconfig/20230829-154638-ladsgroup.json
  • 15:46 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 15:45 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 15:38 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T343718)', diff saved to https://phabricator.wikimedia.org/P51935 and previous config saved to /var/cache/conftool/dbconfig/20230829-153801-ladsgroup.json
  • 15:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudservices1006']
  • 15:37 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 15:35 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 15:34 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 15:34 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 15:34 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P51934 and previous config saved to /var/cache/conftool/dbconfig/20230829-153403-ladsgroup.json
  • 15:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudservices1006']
  • 15:31 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1125.eqiad.wmnet with OS bullseye
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P51933 and previous config saved to /var/cache/conftool/dbconfig/20230829-153132-ladsgroup.json
  • 15:30 aokoth@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=93) for new host vrts1002.eqiad.wmnet
  • 15:30 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) vrts1002.eqiad.wmnet on all recursors
  • 15:30 aokoth@cumin1001: START - Cookbook sre.dns.wipe-cache vrts1002.eqiad.wmnet on all recursors
  • 15:30 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:30 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM vrts1002.eqiad.wmnet - aokoth@cumin1001"
  • 15:30 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: sync
  • 15:29 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: sync
  • 15:29 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM vrts1002.eqiad.wmnet - aokoth@cumin1001"
  • 15:28 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device asw2-c2-eqiad
  • 15:28 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 15:27 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1124.eqiad.wmnet with OS bullseye
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1019.eqiad.wmnet
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1019.eqiad.wmnet
  • 15:27 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 15:27 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 15:27 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw2-c2-eqiad
  • 15:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw2-b2-eqiad
  • 15:27 ladsgroup@deploy1002: Finished scap: Creating tlywiki (T345166) (duration: 07m 03s)
  • 15:25 aokoth@cumin1001: START - Cookbook sre.dns.netbox
  • 15:25 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 15:25 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) vrts1002.eqiad.wmnet on all recursors
  • 15:25 aokoth@cumin1001: START - Cookbook sre.dns.wipe-cache vrts1002.eqiad.wmnet on all recursors
  • 15:25 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:25 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM vrts1002.eqiad.wmnet - aokoth@cumin1001"
  • 15:24 jiji@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 15:24 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 15:24 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM vrts1002.eqiad.wmnet - aokoth@cumin1001"
  • 15:24 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw2-b2-eqiad
  • 15:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw2-a7-eqiad
  • 15:23 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
  • 15:23 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
  • 15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P51932 and previous config saved to /var/cache/conftool/dbconfig/20230829-152255-ladsgroup.json
  • 15:22 jiji@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 15:22 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 15:22 aokoth@cumin1001: START - Cookbook sre.dns.netbox
  • 15:22 aokoth@cumin1001: START - Cookbook sre.ganeti.makevm for new host vrts1002.eqiad.wmnet
  • 15:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1019.eqiad.wmnet
  • 15:21 jiji@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 15:21 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 15:21 jiji@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 15:21 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw2-a7-eqiad
  • 15:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw2-22-ulsfo
  • 15:20 ladsgroup@deploy1002: Started scap: Creating tlywiki (T345166)
  • 15:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1019.eqiad.wmnet
  • 15:19 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1019.eqiad.wmnet
  • 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T343718)', diff saved to https://phabricator.wikimedia.org/P51931 and previous config saved to /var/cache/conftool/dbconfig/20230829-151857-ladsgroup.json
  • 15:18 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw2-22-ulsfo
  • 15:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-by27-esams
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T343718)', diff saved to https://phabricator.wikimedia.org/P51930 and previous config saved to /var/cache/conftool/dbconfig/20230829-151625-ladsgroup.json
  • 15:16 ladsgroup@deploy1002: Finished scap: Backport for Enable url shortener in sidebar in RTL and some non-latin wikis (T267921) (duration: 11m 46s)
  • 15:16 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: sync
  • 15:16 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: sync
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T343718)', diff saved to https://phabricator.wikimedia.org/P51929 and previous config saved to /var/cache/conftool/dbconfig/20230829-151423-ladsgroup.json
  • 15:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T343718)', diff saved to https://phabricator.wikimedia.org/P51928 and previous config saved to /var/cache/conftool/dbconfig/20230829-151402-ladsgroup.json
  • 15:13 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw1-by27-esams
  • 15:13 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-bw27-esams
  • 15:10 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 15:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1019.eqiad.wmnet
  • 15:09 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw1-bw27-esams
  • 15:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b13-drmrs
  • 15:08 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1125.eqiad.wmnet with reason: host reimage
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P51927 and previous config saved to /var/cache/conftool/dbconfig/20230829-150749-ladsgroup.json
  • 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1018.eqiad.wmnet
  • 15:06 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1124.eqiad.wmnet with reason: host reimage
  • 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1018.eqiad.wmnet
  • 15:06 ladsgroup@deploy1002: ladsgroup: Backport for Enable url shortener in sidebar in RTL and some non-latin wikis (T267921) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 15:05 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1031.eqiad.wmnet
  • 15:04 ladsgroup@deploy1002: Started scap: Backport for Enable url shortener in sidebar in RTL and some non-latin wikis (T267921)
  • 15:04 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1125.eqiad.wmnet with reason: host reimage
  • 15:04 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw1-b13-drmrs
  • 15:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b12-drmrs
  • 15:03 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1124.eqiad.wmnet with reason: host reimage
  • 15:02 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 15:01 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
  • 15:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1018.eqiad.wmnet
  • 14:59 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw1-b12-drmrs
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P51926 and previous config saved to /var/cache/conftool/dbconfig/20230829-145856-ladsgroup.json
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T343718)', diff saved to https://phabricator.wikimedia.org/P51925 and previous config saved to /var/cache/conftool/dbconfig/20230829-145727-ladsgroup.json
  • 14:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 14:57 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1031.eqiad.wmnet
  • 14:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T343718)', diff saved to https://phabricator.wikimedia.org/P51924 and previous config saved to /var/cache/conftool/dbconfig/20230829-145705-ladsgroup.json
  • 14:57 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1028.eqiad.wmnet
  • 14:56 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 14:56 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 14:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad
  • 14:56 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 14:55 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: sync
  • 14:55 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 14:55 jiji@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 14:54 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad
  • 14:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-f4-eqiad
  • 14:54 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 14:54 jiji@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T343718)', diff saved to https://phabricator.wikimedia.org/P51923 and previous config saved to /var/cache/conftool/dbconfig/20230829-145242-ladsgroup.json
  • 14:51 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cloudsw1-f4-eqiad
  • 14:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-e4-eqiad
  • 14:51 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir5002.eqsin.wmnet} and A:ncredir
  • 14:51 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir6002.drmrs.wmnet} and A:ncredir
  • 14:51 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1125.eqiad.wmnet with OS bullseye
  • 14:50 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1124.eqiad.wmnet with OS bullseye
  • 14:49 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cloudsw1-e4-eqiad
  • 14:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-d5-eqiad
  • 14:48 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1028.eqiad.wmnet
  • 14:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1021.eqiad.wmnet
  • 14:47 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
  • 14:47 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir6002.drmrs.wmnet} and A:ncredir
  • 14:47 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cloudsw1-d5-eqiad
  • 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-c8-eqiad
  • 14:46 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir4002.ulsfo.wmnet} and A:ncredir
  • 14:46 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir5002.eqsin.wmnet} and A:ncredir
  • 14:46 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
  • 14:46 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir2002.codfw.wmnet} and A:ncredir
  • 14:46 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir1002.eqiad.wmnet} and A:ncredir
  • 14:44 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cloudsw1-c8-eqiad
  • 14:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-b1-codfw
  • 14:44 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cloudsw1-b1-codfw
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P51922 and previous config saved to /var/cache/conftool/dbconfig/20230829-144349-ladsgroup.json
  • 14:42 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir4002.ulsfo.wmnet} and A:ncredir
  • 14:42 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir2002.codfw.wmnet} and A:ncredir
  • 14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P51921 and previous config saved to /var/cache/conftool/dbconfig/20230829-144159-ladsgroup.json
  • 14:41 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir1002.eqiad.wmnet} and A:ncredir
  • 14:41 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir2001.codfw.wmnet} and A:ncredir
  • 14:41 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir6001.drmrs.wmnet} and A:ncredir
  • 14:39 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
  • 14:38 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1021.eqiad.wmnet
  • 14:38 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
  • 14:37 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir2001.codfw.wmnet} and A:ncredir
  • 14:37 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir6001.drmrs.wmnet} and A:ncredir
  • 14:36 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
  • 14:35 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir5001.eqsin.wmnet} and A:ncredir
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1214 (T343718)', diff saved to https://phabricator.wikimedia.org/P51920 and previous config saved to /var/cache/conftool/dbconfig/20230829-143434-ladsgroup.json
  • 14:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 14:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T343718)', diff saved to https://phabricator.wikimedia.org/P51919 and previous config saved to /var/cache/conftool/dbconfig/20230829-143413-ladsgroup.json
  • 14:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw2-22-ulsfo
  • 14:31 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir4001.ulsfo.wmnet} and A:ncredir
  • 14:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1018.eqiad.wmnet
  • 14:30 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device asw2-22-ulsfo
  • 14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr4-ulsfo
  • 14:28 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir5001.eqsin.wmnet} and A:ncredir
  • 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T343718)', diff saved to https://phabricator.wikimedia.org/P51918 and previous config saved to /var/cache/conftool/dbconfig/20230829-142843-ladsgroup.json
  • 14:28 fabfur@cumin1001: END (ERROR) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=97) rolling reboot on P{ncredir5001.*} and A:ncredir
  • 14:28 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir5001.*} and A:ncredir
  • 14:28 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1018.eqiad.wmnet
  • 14:27 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir4001.ulsfo.wmnet} and A:ncredir
  • 14:27 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on P{ncredir1001.eqiad.wmnet} and A:ncredir
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P51917 and previous config saved to /var/cache/conftool/dbconfig/20230829-142653-ladsgroup.json
  • 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1017.eqiad.wmnet
  • 14:25 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr4-ulsfo
  • 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1017.eqiad.wmnet
  • 14:22 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on P{ncredir1001.eqiad.wmnet} and A:ncredir
  • 14:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1017.eqiad.wmnet
  • 14:19 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1018.eqiad.wmnet
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P51916 and previous config saved to /var/cache/conftool/dbconfig/20230829-141907-ladsgroup.json
  • 14:14 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1017.eqiad.wmnet
  • 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T343718)', diff saved to https://phabricator.wikimedia.org/P51915 and previous config saved to /var/cache/conftool/dbconfig/20230829-141147-ladsgroup.json
  • 14:08 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 14:08 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 14:08 fabfur: start rebooting ncredir hosts for T344587
  • 14:07 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 14:07 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: sync
  • 14:06 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1017.eqiad.wmnet
  • 14:06 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 14:06 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1017.eqiad.wmnet
  • 14:06 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1017.eqiad.wmnet
  • 14:06 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 14:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1017.eqiad.wmnet
  • 14:05 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync
  • 14:05 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 14:05 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1017.eqiad.wmnet
  • 14:05 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1017.eqiad.wmnet
  • 14:05 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 14:05 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: sync
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P51914 and previous config saved to /var/cache/conftool/dbconfig/20230829-140400-ladsgroup.json
  • 13:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2025.codfw.wmnet with OS bullseye
  • 13:56 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: sync
  • 13:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
  • 13:55 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: sync
  • 13:53 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: sync
  • 13:53 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: sync
  • 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T343718)', diff saved to https://phabricator.wikimedia.org/P51913 and previous config saved to /var/cache/conftool/dbconfig/20230829-135236-ladsgroup.json
  • 13:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 13:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T343718)', diff saved to https://phabricator.wikimedia.org/P51912 and previous config saved to /var/cache/conftool/dbconfig/20230829-135214-ladsgroup.json
  • 13:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2025']
  • 13:49 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 13:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['kubernetes2025']
  • 13:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T343718)', diff saved to https://phabricator.wikimedia.org/P51911 and previous config saved to /var/cache/conftool/dbconfig/20230829-134854-ladsgroup.json
  • 13:48 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 13:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1016.eqiad.wmnet
  • 13:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
  • 13:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2025']
  • 13:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 13:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2025']
  • 13:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 13:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
  • 13:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2025']
  • 13:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 13:38 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on A:swift-fe
  • 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P51910 and previous config saved to /var/cache/conftool/dbconfig/20230829-133708-ladsgroup.json
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1211 (T343718)', diff saved to https://phabricator.wikimedia.org/P51909 and previous config saved to /var/cache/conftool/dbconfig/20230829-133018-ladsgroup.json
  • 13:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 13:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T343718)', diff saved to https://phabricator.wikimedia.org/P51908 and previous config saved to /var/cache/conftool/dbconfig/20230829-132957-ladsgroup.json
  • 13:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2025']
  • 13:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 13:28 moritzm: installing openssl security updates on buster
  • 13:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2025']
  • 13:26 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 13:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2025']
  • 13:25 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 13:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P51907 and previous config saved to /var/cache/conftool/dbconfig/20230829-132202-ladsgroup.json
  • 13:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1016.eqiad.wmnet
  • 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1015.eqiad.wmnet
  • 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
  • 13:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
  • 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P51906 and previous config saved to /var/cache/conftool/dbconfig/20230829-131451-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T343718)', diff saved to https://phabricator.wikimedia.org/P51905 and previous config saved to /var/cache/conftool/dbconfig/20230829-130656-ladsgroup.json
  • 13:02 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1015.eqiad.wmnet
  • 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P51904 and previous config saved to /var/cache/conftool/dbconfig/20230829-125944-ladsgroup.json
  • 12:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
  • 12:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T343718)', diff saved to https://phabricator.wikimedia.org/P51903 and previous config saved to /var/cache/conftool/dbconfig/20230829-125844-ladsgroup.json
  • 12:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 12:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T343718)', diff saved to https://phabricator.wikimedia.org/P51902 and previous config saved to /var/cache/conftool/dbconfig/20230829-125823-ladsgroup.json
  • 12:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T343718)', diff saved to https://phabricator.wikimedia.org/P51901 and previous config saved to /var/cache/conftool/dbconfig/20230829-124750-ladsgroup.json
  • 12:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T343718)', diff saved to https://phabricator.wikimedia.org/P51900 and previous config saved to /var/cache/conftool/dbconfig/20230829-124739-ladsgroup.json
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T343718)', diff saved to https://phabricator.wikimedia.org/P51899 and previous config saved to /var/cache/conftool/dbconfig/20230829-124438-ladsgroup.json
  • 12:44 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
  • 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet
  • 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P51898 and previous config saved to /var/cache/conftool/dbconfig/20230829-124317-ladsgroup.json
  • 12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P51897 and previous config saved to /var/cache/conftool/dbconfig/20230829-123233-ladsgroup.json
  • 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P51896 and previous config saved to /var/cache/conftool/dbconfig/20230829-122811-ladsgroup.json
  • 12:27 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 12:24 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1209 (T343718)', diff saved to https://phabricator.wikimedia.org/P51894 and previous config saved to /var/cache/conftool/dbconfig/20230829-122403-ladsgroup.json
  • 12:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 12:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T343718)', diff saved to https://phabricator.wikimedia.org/P51893 and previous config saved to /var/cache/conftool/dbconfig/20230829-122342-ladsgroup.json
  • 12:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet
  • 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet
  • 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P51892 and previous config saved to /var/cache/conftool/dbconfig/20230829-121727-ladsgroup.json
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T343718)', diff saved to https://phabricator.wikimedia.org/P51891 and previous config saved to /var/cache/conftool/dbconfig/20230829-121305-ladsgroup.json
  • 12:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
  • 12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P51890 and previous config saved to /var/cache/conftool/dbconfig/20230829-120835-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2140 (T343718)', diff saved to https://phabricator.wikimedia.org/P51889 and previous config saved to /var/cache/conftool/dbconfig/20230829-120603-ladsgroup.json
  • 12:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 12:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T343718)', diff saved to https://phabricator.wikimedia.org/P51888 and previous config saved to /var/cache/conftool/dbconfig/20230829-120221-ladsgroup.json
  • 12:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet
  • 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P51887 and previous config saved to /var/cache/conftool/dbconfig/20230829-115329-ladsgroup.json
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1011.eqiad.wmnet
  • 11:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T343718)', diff saved to https://phabricator.wikimedia.org/P51886 and previous config saved to /var/cache/conftool/dbconfig/20230829-114326-ladsgroup.json
  • 11:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T343718)', diff saved to https://phabricator.wikimedia.org/P51885 and previous config saved to /var/cache/conftool/dbconfig/20230829-114304-ladsgroup.json
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T343718)', diff saved to https://phabricator.wikimedia.org/P51884 and previous config saved to /var/cache/conftool/dbconfig/20230829-113823-ladsgroup.json
  • 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P51883 and previous config saved to /var/cache/conftool/dbconfig/20230829-112758-ladsgroup.json
  • 11:24 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
  • 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet
  • 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
  • 11:20 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver1002.eqiad.wmnet with OS bookworm
  • 11:20 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin1001"
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T343718)', diff saved to https://phabricator.wikimedia.org/P51882 and previous config saved to /var/cache/conftool/dbconfig/20230829-111949-ladsgroup.json
  • 11:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 11:19 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin1001"
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T343718)', diff saved to https://phabricator.wikimedia.org/P51881 and previous config saved to /var/cache/conftool/dbconfig/20230829-111927-ladsgroup.json
  • 11:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
  • 11:13 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
  • 11:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1010.eqiad.wmnet
  • 11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P51880 and previous config saved to /var/cache/conftool/dbconfig/20230829-111252-ladsgroup.json
  • 11:08 moritzm: installing nftables bugfix updates from Bullseye point release
  • 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P51879 and previous config saved to /var/cache/conftool/dbconfig/20230829-110421-ladsgroup.json
  • 11:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver1002.eqiad.wmnet with reason: host reimage
  • 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
  • 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet
  • 10:59 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver1002.eqiad.wmnet with reason: host reimage
  • 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
  • 10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T343718)', diff saved to https://phabricator.wikimedia.org/P51878 and previous config saved to /var/cache/conftool/dbconfig/20230829-105746-ladsgroup.json
  • 10:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr3-ulsfo
  • 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
  • 10:51 joal@deploy1002: Finished deploy [airflow-dags/analytics@90f280e]: Regular deploy of Analytics airflow dags [airflow-dags/analytics@90f280ec] (duration: 00m 14s)
  • 10:51 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cr3-ulsfo
  • 10:51 joal@deploy1002: Started deploy [airflow-dags/analytics@90f280e]: Regular deploy of Analytics airflow dags [airflow-dags/analytics@90f280ec]
  • 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P51877 and previous config saved to /var/cache/conftool/dbconfig/20230829-104915-ladsgroup.json
  • 10:47 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet
  • 10:42 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1002.eqiad.wmnet with OS bookworm
  • 10:42 cgoubert@deploy1002: Finished scap: Removing mw-on-k8s tls-proxy CPU limits - T344814 (duration: 02m 27s)
  • 10:39 cgoubert@deploy1002: Started scap: Removing mw-on-k8s tls-proxy CPU limits - T344814
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2020.codfw.wmnet
  • 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2165 (T343718)', diff saved to https://phabricator.wikimedia.org/P51876 and previous config saved to /var/cache/conftool/dbconfig/20230829-103901-ladsgroup.json
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
  • 10:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T343718)', diff saved to https://phabricator.wikimedia.org/P51875 and previous config saved to /var/cache/conftool/dbconfig/20230829-103840-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T343718)', diff saved to https://phabricator.wikimedia.org/P51874 and previous config saved to /var/cache/conftool/dbconfig/20230829-103409-ladsgroup.json
  • 10:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
  • 10:30 claime: Running puppet on deploy servers to bump envoy image version - T344814
  • 10:27 jynus: reboot db1204
  • 10:27 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1122.eqiad.wmnet with OS bullseye
  • 10:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1123.eqiad.wmnet with OS bullseye
  • 10:24 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P51873 and previous config saved to /var/cache/conftool/dbconfig/20230829-102333-ladsgroup.json
  • 10:22 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 10:22 jayme: Successfully published image docker-registry.discovery.wmnet/envoy:1.23.10-2-s2
  • 10:21 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 10:19 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 10:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2020.codfw.wmnet
  • 10:17 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 10:16 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 10:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T343718)', diff saved to https://phabricator.wikimedia.org/P51872 and previous config saved to /var/cache/conftool/dbconfig/20230829-101536-ladsgroup.json
  • 10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 10:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 10:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T343718)', diff saved to https://phabricator.wikimedia.org/P51871 and previous config saved to /var/cache/conftool/dbconfig/20230829-101515-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P51870 and previous config saved to /var/cache/conftool/dbconfig/20230829-100827-ladsgroup.json
  • 10:04 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1122.eqiad.wmnet with reason: host reimage
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51869 and previous config saved to /var/cache/conftool/dbconfig/20230829-100315-ladsgroup.json
  • 10:02 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1123.eqiad.wmnet with reason: host reimage
  • 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P51868 and previous config saved to /var/cache/conftool/dbconfig/20230829-100009-ladsgroup.json
  • 09:58 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1123.eqiad.wmnet with reason: host reimage
  • 09:57 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1122.eqiad.wmnet with reason: host reimage
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T343718)', diff saved to https://phabricator.wikimedia.org/P51867 and previous config saved to /var/cache/conftool/dbconfig/20230829-095638-ladsgroup.json
  • 09:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T343718)', diff saved to https://phabricator.wikimedia.org/P51866 and previous config saved to /var/cache/conftool/dbconfig/20230829-095617-ladsgroup.json
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T343718)', diff saved to https://phabricator.wikimedia.org/P51865 and previous config saved to /var/cache/conftool/dbconfig/20230829-095321-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P51864 and previous config saved to /var/cache/conftool/dbconfig/20230829-094809-ladsgroup.json
  • 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P51862 and previous config saved to /var/cache/conftool/dbconfig/20230829-094503-ladsgroup.json
  • 09:44 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1123.eqiad.wmnet with OS bullseye
  • 09:44 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1122.eqiad.wmnet with OS bullseye
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P51861 and previous config saved to /var/cache/conftool/dbconfig/20230829-094111-ladsgroup.json
  • 09:39 cgoubert@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr1-esams:xe-0/0/7
  • 09:39 cgoubert@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr1-esams:xe-0/0/7
  • 09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T343718)', diff saved to https://phabricator.wikimedia.org/P51860 and previous config saved to /var/cache/conftool/dbconfig/20230829-093539-ladsgroup.json
  • 09:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T343718)', diff saved to https://phabricator.wikimedia.org/P51859 and previous config saved to /var/cache/conftool/dbconfig/20230829-093513-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P51858 and previous config saved to /var/cache/conftool/dbconfig/20230829-093303-ladsgroup.json
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T343718)', diff saved to https://phabricator.wikimedia.org/P51857 and previous config saved to /var/cache/conftool/dbconfig/20230829-092957-ladsgroup.json
  • 09:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33
  • 09:28 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 33
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P51856 and previous config saved to /var/cache/conftool/dbconfig/20230829-092605-ladsgroup.json
  • 09:22 moritzm: failover the ganeti master in codfw to ganeti2022
  • 09:22 jynus: restart db1205
  • 09:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P51855 and previous config saved to /var/cache/conftool/dbconfig/20230829-092007-ladsgroup.json
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51854 and previous config saved to /var/cache/conftool/dbconfig/20230829-091756-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T343718)', diff saved to https://phabricator.wikimedia.org/P51853 and previous config saved to /var/cache/conftool/dbconfig/20230829-091223-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T343718)', diff saved to https://phabricator.wikimedia.org/P51852 and previous config saved to /var/cache/conftool/dbconfig/20230829-091202-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T343718)', diff saved to https://phabricator.wikimedia.org/P51851 and previous config saved to /var/cache/conftool/dbconfig/20230829-091059-ladsgroup.json
  • 09:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P51850 and previous config saved to /var/cache/conftool/dbconfig/20230829-090501-ladsgroup.json
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P51849 and previous config saved to /var/cache/conftool/dbconfig/20230829-085656-ladsgroup.json
  • 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T343718)', diff saved to https://phabricator.wikimedia.org/P51848 and previous config saved to /var/cache/conftool/dbconfig/20230829-084955-ladsgroup.json
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2032.codfw.wmnet
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P51847 and previous config saved to /var/cache/conftool/dbconfig/20230829-084150-ladsgroup.json
  • 08:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T343718)', diff saved to https://phabricator.wikimedia.org/P51846 and previous config saved to /var/cache/conftool/dbconfig/20230829-083223-ladsgroup.json
  • 08:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T343718)', diff saved to https://phabricator.wikimedia.org/P51845 and previous config saved to /var/cache/conftool/dbconfig/20230829-083202-ladsgroup.json
  • 08:30 claime: Restarted grafana-ldap-users-sync.service on grafana1002
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T343718)', diff saved to https://phabricator.wikimedia.org/P51844 and previous config saved to /var/cache/conftool/dbconfig/20230829-082644-ladsgroup.json
  • 08:26 claime: downtiming cassandra-a alerts on restbase1030.eqiad.wmnet for 14 days T344210 T344259
  • 08:20 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
  • 08:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P51843 and previous config saved to /var/cache/conftool/dbconfig/20230829-081655-ladsgroup.json
  • 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2031.codfw.wmnet
  • 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2031.codfw.wmnet
  • 08:09 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2002.wikimedia.org
  • 08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T343718)', diff saved to https://phabricator.wikimedia.org/P51842 and previous config saved to /var/cache/conftool/dbconfig/20230829-080828-ladsgroup.json
  • 08:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 08:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T343718)', diff saved to https://phabricator.wikimedia.org/P51841 and previous config saved to /var/cache/conftool/dbconfig/20230829-080807-ladsgroup.json
  • 08:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2031.codfw.wmnet
  • 08:03 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host contint2002.wikimedia.org
  • 08:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P51840 and previous config saved to /var/cache/conftool/dbconfig/20230829-080149-ladsgroup.json
  • 07:54 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1121.eqiad.wmnet with OS bullseye
  • 07:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P51839 and previous config saved to /var/cache/conftool/dbconfig/20230829-075301-ladsgroup.json
  • 07:51 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1120.eqiad.wmnet with OS bullseye
  • 07:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2031.codfw.wmnet
  • 07:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2030.codfw.wmnet
  • 07:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
  • 07:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
  • 07:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T343718)', diff saved to https://phabricator.wikimedia.org/P51838 and previous config saved to /var/cache/conftool/dbconfig/20230829-074643-ladsgroup.json
  • 07:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
  • 07:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
  • 07:39 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2030.codfw.wmnet
  • 07:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P51837 and previous config saved to /var/cache/conftool/dbconfig/20230829-073755-ladsgroup.json
  • 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2029.codfw.wmnet
  • 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2029.codfw.wmnet
  • 07:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2029.codfw.wmnet
  • 07:31 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1121.eqiad.wmnet with reason: host reimage
  • 07:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T343718)', diff saved to https://phabricator.wikimedia.org/P51836 and previous config saved to /var/cache/conftool/dbconfig/20230829-072853-ladsgroup.json
  • 07:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2029.codfw.wmnet
  • 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 07:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T343718)', diff saved to https://phabricator.wikimedia.org/P51835 and previous config saved to /var/cache/conftool/dbconfig/20230829-072832-ladsgroup.json
  • 07:28 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1120.eqiad.wmnet with reason: host reimage
  • 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
  • 07:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
  • 07:26 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1121.eqiad.wmnet with reason: host reimage
  • 07:25 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1120.eqiad.wmnet with reason: host reimage
  • 07:24 kartik@deploy1002: Finished scap: Backport for Enable Content and Section translation in Ligurian Wikipedia (T337669) (duration: 21m 02s)
  • 07:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2028.codfw.wmnet
  • 07:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
  • 07:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T343718)', diff saved to https://phabricator.wikimedia.org/P51834 and previous config saved to /var/cache/conftool/dbconfig/20230829-072249-ladsgroup.json
  • 07:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
  • 07:14 kartik@deploy1002: kartik: Continuing with sync
  • 07:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P51833 and previous config saved to /var/cache/conftool/dbconfig/20230829-071326-ladsgroup.json
  • 07:12 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1121.eqiad.wmnet with OS bullseye
  • 07:12 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1120.eqiad.wmnet with OS bullseye
  • 07:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2028.codfw.wmnet
  • 07:07 kartik@deploy1002: kartik: Backport for Enable Content and Section translation in Ligurian Wikipedia (T337669) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T343718)', diff saved to https://phabricator.wikimedia.org/P51832 and previous config saved to /var/cache/conftool/dbconfig/20230829-070443-ladsgroup.json
  • 07:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2027.codfw.wmnet
  • 07:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 07:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T343718)', diff saved to https://phabricator.wikimedia.org/P51831 and previous config saved to /var/cache/conftool/dbconfig/20230829-070422-ladsgroup.json
  • 07:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2027.codfw.wmnet
  • 07:03 kartik@deploy1002: Started scap: Backport for Enable Content and Section translation in Ligurian Wikipedia (T337669)
  • 06:58 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1119.eqiad.wmnet with OS bullseye
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P51830 and previous config saved to /var/cache/conftool/dbconfig/20230829-065819-ladsgroup.json
  • 06:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2027.codfw.wmnet
  • 06:57 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1118.eqiad.wmnet with OS bullseye
  • 06:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T343718)', diff saved to https://phabricator.wikimedia.org/P51829 and previous config saved to /var/cache/conftool/dbconfig/20230829-065525-ladsgroup.json
  • 06:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 06:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 06:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51828 and previous config saved to /var/cache/conftool/dbconfig/20230829-065515-ladsgroup.json
  • 06:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2027.codfw.wmnet
  • 06:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51827 and previous config saved to /var/cache/conftool/dbconfig/20230829-065059-ladsgroup.json
  • 06:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 06:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51826 and previous config saved to /var/cache/conftool/dbconfig/20230829-065038-ladsgroup.json
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P51825 and previous config saved to /var/cache/conftool/dbconfig/20230829-064916-ladsgroup.json
  • 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T343718)', diff saved to https://phabricator.wikimedia.org/P51824 and previous config saved to /var/cache/conftool/dbconfig/20230829-064313-ladsgroup.json
  • 06:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P51823 and previous config saved to /var/cache/conftool/dbconfig/20230829-064009-ladsgroup.json
  • 06:36 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1119.eqiad.wmnet with reason: host reimage
  • 06:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P51822 and previous config saved to /var/cache/conftool/dbconfig/20230829-063531-ladsgroup.json
  • 06:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P51821 and previous config saved to /var/cache/conftool/dbconfig/20230829-063410-ladsgroup.json
  • 06:33 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1118.eqiad.wmnet with reason: host reimage
  • 06:31 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1119.eqiad.wmnet with reason: host reimage
  • 06:30 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1118.eqiad.wmnet with reason: host reimage
  • 06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T343718)', diff saved to https://phabricator.wikimedia.org/P51820 and previous config saved to /var/cache/conftool/dbconfig/20230829-062532-ladsgroup.json
  • 06:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 06:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T343718)', diff saved to https://phabricator.wikimedia.org/P51819 and previous config saved to /var/cache/conftool/dbconfig/20230829-062511-ladsgroup.json
  • 06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P51818 and previous config saved to /var/cache/conftool/dbconfig/20230829-062502-ladsgroup.json
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P51817 and previous config saved to /var/cache/conftool/dbconfig/20230829-062025-ladsgroup.json
  • 06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T343718)', diff saved to https://phabricator.wikimedia.org/P51816 and previous config saved to /var/cache/conftool/dbconfig/20230829-061904-ladsgroup.json
  • 06:17 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1119.eqiad.wmnet with OS bullseye
  • 06:17 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1118.eqiad.wmnet with OS bullseye
  • 06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P51815 and previous config saved to /var/cache/conftool/dbconfig/20230829-061005-ladsgroup.json
  • 06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51814 and previous config saved to /var/cache/conftool/dbconfig/20230829-060956-ladsgroup.json
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51813 and previous config saved to /var/cache/conftool/dbconfig/20230829-060519-ladsgroup.json
  • 06:04 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1117.eqiad.wmnet with OS bullseye
  • 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T343718)', diff saved to https://phabricator.wikimedia.org/P51812 and previous config saved to /var/cache/conftool/dbconfig/20230829-060047-ladsgroup.json
  • 06:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 06:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P51811 and previous config saved to /var/cache/conftool/dbconfig/20230829-055459-ladsgroup.json
  • 05:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 05:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 05:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T343718)', diff saved to https://phabricator.wikimedia.org/P51810 and previous config saved to /var/cache/conftool/dbconfig/20230829-054405-ladsgroup.json
  • 05:42 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1117.eqiad.wmnet with reason: host reimage
  • 05:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T343718)', diff saved to https://phabricator.wikimedia.org/P51809 and previous config saved to /var/cache/conftool/dbconfig/20230829-053953-ladsgroup.json
  • 05:39 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1117.eqiad.wmnet with reason: host reimage
  • 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P51808 and previous config saved to /var/cache/conftool/dbconfig/20230829-052859-ladsgroup.json
  • 05:26 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1117.eqiad.wmnet with OS bullseye
  • 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T343718)', diff saved to https://phabricator.wikimedia.org/P51807 and previous config saved to /var/cache/conftool/dbconfig/20230829-052222-ladsgroup.json
  • 05:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 05:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 05:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P51806 and previous config saved to /var/cache/conftool/dbconfig/20230829-051353-ladsgroup.json
  • 05:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 05:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 04:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T343718)', diff saved to https://phabricator.wikimedia.org/P51805 and previous config saved to /var/cache/conftool/dbconfig/20230829-045847-ladsgroup.json
  • 04:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T343718)', diff saved to https://phabricator.wikimedia.org/P51804 and previous config saved to /var/cache/conftool/dbconfig/20230829-044049-ladsgroup.json
  • 04:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 03:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2025']
  • 03:46 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51803 and previous config saved to /var/cache/conftool/dbconfig/20230829-034540-ladsgroup.json
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51802 and previous config saved to /var/cache/conftool/dbconfig/20230829-034530-ladsgroup.json
  • 03:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T343718)', diff saved to https://phabricator.wikimedia.org/P51801 and previous config saved to /var/cache/conftool/dbconfig/20230829-034509-ladsgroup.json
  • 03:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P51800 and previous config saved to /var/cache/conftool/dbconfig/20230829-033002-ladsgroup.json
  • 03:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P51799 and previous config saved to /var/cache/conftool/dbconfig/20230829-031456-ladsgroup.json
  • 03:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2003.codfw.wmnet with OS bullseye
  • 03:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 03:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T343718)', diff saved to https://phabricator.wikimedia.org/P51798 and previous config saved to /var/cache/conftool/dbconfig/20230829-025950-ladsgroup.json
  • 02:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be2003.codfw.wmnet with reason: host reimage
  • 02:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2003.codfw.wmnet with reason: host reimage
  • 02:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 01:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51797 and previous config saved to /var/cache/conftool/dbconfig/20230829-014452-ladsgroup.json
  • 01:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P51796 and previous config saved to /var/cache/conftool/dbconfig/20230829-012946-ladsgroup.json
  • 01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P51795 and previous config saved to /var/cache/conftool/dbconfig/20230829-011440-ladsgroup.json
  • 00:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51794 and previous config saved to /var/cache/conftool/dbconfig/20230829-005933-ladsgroup.json
  • 00:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 00:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 00:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T343718)', diff saved to https://phabricator.wikimedia.org/P51793 and previous config saved to /var/cache/conftool/dbconfig/20230829-004925-ladsgroup.json
  • 00:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T343718)', diff saved to https://phabricator.wikimedia.org/P51792 and previous config saved to /var/cache/conftool/dbconfig/20230829-004217-ladsgroup.json
  • 00:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 00:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 00:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T343718)', diff saved to https://phabricator.wikimedia.org/P51791 and previous config saved to /var/cache/conftool/dbconfig/20230829-004207-ladsgroup.json
  • 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P51790 and previous config saved to /var/cache/conftool/dbconfig/20230829-003418-ladsgroup.json
  • 00:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P51789 and previous config saved to /var/cache/conftool/dbconfig/20230829-002700-ladsgroup.json
  • 00:22 eileen: civicrm upgraded from 6a2cdf10 to d13e6e0c
  • 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P51788 and previous config saved to /var/cache/conftool/dbconfig/20230829-001912-ladsgroup.json
  • 00:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 00:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P51787 and previous config saved to /var/cache/conftool/dbconfig/20230829-001154-ladsgroup.json
  • 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T343718)', diff saved to https://phabricator.wikimedia.org/P51786 and previous config saved to /var/cache/conftool/dbconfig/20230829-000406-ladsgroup.json

2023-08-28

  • 23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T343718)', diff saved to https://phabricator.wikimedia.org/P51785 and previous config saved to /var/cache/conftool/dbconfig/20230828-235648-ladsgroup.json
  • 23:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1219 (T343718)', diff saved to https://phabricator.wikimedia.org/P51784 and previous config saved to /var/cache/conftool/dbconfig/20230828-232344-ladsgroup.json
  • 23:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 23:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 23:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 23:11 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 23:11 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 23:11 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 22:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 22:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 22:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T343718)', diff saved to https://phabricator.wikimedia.org/P51783 and previous config saved to /var/cache/conftool/dbconfig/20230828-225306-ladsgroup.json
  • 22:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P51782 and previous config saved to /var/cache/conftool/dbconfig/20230828-223800-ladsgroup.json
  • 22:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T343718)', diff saved to https://phabricator.wikimedia.org/P51781 and previous config saved to /var/cache/conftool/dbconfig/20230828-223740-ladsgroup.json
  • 22:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 22:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 22:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T343718)', diff saved to https://phabricator.wikimedia.org/P51780 and previous config saved to /var/cache/conftool/dbconfig/20230828-223719-ladsgroup.json
  • 22:33 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 22:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 22:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P51779 and previous config saved to /var/cache/conftool/dbconfig/20230828-222254-ladsgroup.json
  • 22:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P51778 and previous config saved to /var/cache/conftool/dbconfig/20230828-222212-ladsgroup.json
  • 22:14 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 22:14 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 22:14 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T343718)', diff saved to https://phabricator.wikimedia.org/P51777 and previous config saved to /var/cache/conftool/dbconfig/20230828-220747-ladsgroup.json
  • 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P51776 and previous config saved to /var/cache/conftool/dbconfig/20230828-220706-ladsgroup.json
  • 21:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: T337296 restart services for new federation endpoint (duration: 01m 12s)
  • 21:56 ryankemper@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: T337296 restart services for new federation endpoint
  • 21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T343718)', diff saved to https://phabricator.wikimedia.org/P51775 and previous config saved to /var/cache/conftool/dbconfig/20230828-215200-ladsgroup.json
  • 21:45 bking@cumin1001: conftool action : set/pooled=yes; selector: name=wdqs1004.eqiad.wmnet
  • 21:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T343718)', diff saved to https://phabricator.wikimedia.org/P51774 and previous config saved to /var/cache/conftool/dbconfig/20230828-214409-ladsgroup.json
  • 21:32 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P51773 and previous config saved to /var/cache/conftool/dbconfig/20230828-212903-ladsgroup.json
  • 21:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T343718)', diff saved to https://phabricator.wikimedia.org/P51772 and previous config saved to /var/cache/conftool/dbconfig/20230828-212733-ladsgroup.json
  • 21:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 21:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 21:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T343718)', diff saved to https://phabricator.wikimedia.org/P51771 and previous config saved to /var/cache/conftool/dbconfig/20230828-212712-ladsgroup.json
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T343718)', diff saved to https://phabricator.wikimedia.org/P51770 and previous config saved to /var/cache/conftool/dbconfig/20230828-212647-ladsgroup.json
  • 21:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 21:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T343718)', diff saved to https://phabricator.wikimedia.org/P51769 and previous config saved to /var/cache/conftool/dbconfig/20230828-212637-ladsgroup.json
  • 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P51768 and previous config saved to /var/cache/conftool/dbconfig/20230828-211357-ladsgroup.json
  • 21:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P51767 and previous config saved to /var/cache/conftool/dbconfig/20230828-211206-ladsgroup.json
  • 21:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P51766 and previous config saved to /var/cache/conftool/dbconfig/20230828-211131-ladsgroup.json
  • 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T343718)', diff saved to https://phabricator.wikimedia.org/P51765 and previous config saved to /var/cache/conftool/dbconfig/20230828-205851-ladsgroup.json
  • 20:58 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 20:58 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P51764 and previous config saved to /var/cache/conftool/dbconfig/20230828-205700-ladsgroup.json
  • 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P51763 and previous config saved to /var/cache/conftool/dbconfig/20230828-205625-ladsgroup.json
  • 20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T343718)', diff saved to https://phabricator.wikimedia.org/P51762 and previous config saved to /var/cache/conftool/dbconfig/20230828-204153-ladsgroup.json
  • 20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T343718)', diff saved to https://phabricator.wikimedia.org/P51761 and previous config saved to /var/cache/conftool/dbconfig/20230828-204119-ladsgroup.json
  • 20:27 urandom: clear pre-upgrade aqs snapshots — T339299
  • 20:25 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 20:25 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T343718)', diff saved to https://phabricator.wikimedia.org/P51760 and previous config saved to /var/cache/conftool/dbconfig/20230828-202206-ladsgroup.json
  • 20:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 20:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 20:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T343718)', diff saved to https://phabricator.wikimedia.org/P51759 and previous config saved to /var/cache/conftool/dbconfig/20230828-202145-ladsgroup.json
  • 20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P51758 and previous config saved to /var/cache/conftool/dbconfig/20230828-200639-ladsgroup.json
  • 20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T343718)', diff saved to https://phabricator.wikimedia.org/P51757 and previous config saved to /var/cache/conftool/dbconfig/20230828-200415-ladsgroup.json
  • 20:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 20:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T343718)', diff saved to https://phabricator.wikimedia.org/P51756 and previous config saved to /var/cache/conftool/dbconfig/20230828-200354-ladsgroup.json
  • 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P51755 and previous config saved to /var/cache/conftool/dbconfig/20230828-195132-ladsgroup.json
  • 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P51754 and previous config saved to /var/cache/conftool/dbconfig/20230828-194848-ladsgroup.json
  • 19:38 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:38 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host flink-zk2001.codfw.wmnet with OS bookworm
  • 19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T343718)', diff saved to https://phabricator.wikimedia.org/P51753 and previous config saved to /var/cache/conftool/dbconfig/20230828-193626-ladsgroup.json
  • 19:36 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1004.eqiad.wmnet with OS bullseye
  • 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T343718)', diff saved to https://phabricator.wikimedia.org/P51752 and previous config saved to /var/cache/conftool/dbconfig/20230828-193511-ladsgroup.json
  • 19:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 19:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T343718)', diff saved to https://phabricator.wikimedia.org/P51751 and previous config saved to /var/cache/conftool/dbconfig/20230828-193501-ladsgroup.json
  • 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P51750 and previous config saved to /var/cache/conftool/dbconfig/20230828-193342-ladsgroup.json
  • 19:33 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:25 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:25 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P51749 and previous config saved to /var/cache/conftool/dbconfig/20230828-191955-ladsgroup.json
  • 19:19 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:19 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T343718)', diff saved to https://phabricator.wikimedia.org/P51748 and previous config saved to /var/cache/conftool/dbconfig/20230828-191836-ladsgroup.json
  • 19:18 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:18 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1004.eqiad.wmnet with reason: host reimage
  • 19:13 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1004.eqiad.wmnet with reason: host reimage
  • 19:09 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2001.codfw.wmnet with OS bookworm
  • 19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P51747 and previous config saved to /var/cache/conftool/dbconfig/20230828-190449-ladsgroup.json
  • 19:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1004.eqiad.wmnet with OS bullseye
  • 18:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1004.eqiad.wmnet
  • 18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T343718)', diff saved to https://phabricator.wikimedia.org/P51746 and previous config saved to /var/cache/conftool/dbconfig/20230828-185924-ladsgroup.json
  • 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 18:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T343718)', diff saved to https://phabricator.wikimedia.org/P51745 and previous config saved to /var/cache/conftool/dbconfig/20230828-185903-ladsgroup.json
  • 18:57 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 18:57 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 18:55 inflatador: bking@cumin1001 depool wdqs1004 for firmware update
  • 18:53 bking@cumin1001: conftool action : set/pooled=no; selector: name=wdqs1004.eqiad.wmnet
  • 18:52 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs1004.eqiad.wmnet
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T343718)', diff saved to https://phabricator.wikimedia.org/P51744 and previous config saved to /var/cache/conftool/dbconfig/20230828-184943-ladsgroup.json
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P51743 and previous config saved to /var/cache/conftool/dbconfig/20230828-184357-ladsgroup.json
  • 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T344589)', diff saved to https://phabricator.wikimedia.org/P51742 and previous config saved to /var/cache/conftool/dbconfig/20230828-184149-ladsgroup.json
  • 18:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bullseye
  • 18:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P51741 and previous config saved to /var/cache/conftool/dbconfig/20230828-182851-ladsgroup.json
  • 18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P51740 and previous config saved to /var/cache/conftool/dbconfig/20230828-182642-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T343718)', diff saved to https://phabricator.wikimedia.org/P51739 and previous config saved to /var/cache/conftool/dbconfig/20230828-182104-ladsgroup.json
  • 18:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T343718)', diff saved to https://phabricator.wikimedia.org/P51738 and previous config saved to /var/cache/conftool/dbconfig/20230828-182025-ladsgroup.json
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T343718)', diff saved to https://phabricator.wikimedia.org/P51737 and previous config saved to /var/cache/conftool/dbconfig/20230828-181427-ladsgroup.json
  • 18:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T343718)', diff saved to https://phabricator.wikimedia.org/P51736 and previous config saved to /var/cache/conftool/dbconfig/20230828-181417-ladsgroup.json
  • 18:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T343718)', diff saved to https://phabricator.wikimedia.org/P51735 and previous config saved to /var/cache/conftool/dbconfig/20230828-181344-ladsgroup.json
  • 18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P51734 and previous config saved to /var/cache/conftool/dbconfig/20230828-181136-ladsgroup.json
  • 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P51733 and previous config saved to /var/cache/conftool/dbconfig/20230828-180519-ladsgroup.json
  • 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P51732 and previous config saved to /var/cache/conftool/dbconfig/20230828-175911-ladsgroup.json
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T344589)', diff saved to https://phabricator.wikimedia.org/P51731 and previous config saved to /var/cache/conftool/dbconfig/20230828-175630-ladsgroup.json
  • 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T344589)', diff saved to https://phabricator.wikimedia.org/P51730 and previous config saved to /var/cache/conftool/dbconfig/20230828-175151-ladsgroup.json
  • 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T344589)', diff saved to https://phabricator.wikimedia.org/P51729 and previous config saved to /var/cache/conftool/dbconfig/20230828-175019-ladsgroup.json
  • 17:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P51728 and previous config saved to /var/cache/conftool/dbconfig/20230828-175013-ladsgroup.json
  • 17:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T344589)', diff saved to https://phabricator.wikimedia.org/P51727 and previous config saved to /var/cache/conftool/dbconfig/20230828-174954-ladsgroup.json
  • 17:46 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 17:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P51726 and previous config saved to /var/cache/conftool/dbconfig/20230828-174404-ladsgroup.json
  • 17:39 taavi@deploy1002: Finished scap: Backport for Set OATHAuth multiple devices WRITE_BOTH for all privates (T242031), Set OATHAuth multiple devices READ_NEW for checkuser, techconduct (T242031) (duration: 07m 41s)
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T343718)', diff saved to https://phabricator.wikimedia.org/P51725 and previous config saved to /var/cache/conftool/dbconfig/20230828-173726-ladsgroup.json
  • 17:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 17:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 17:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T343718)', diff saved to https://phabricator.wikimedia.org/P51724 and previous config saved to /var/cache/conftool/dbconfig/20230828-173650-ladsgroup.json
  • 17:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P51723 and previous config saved to /var/cache/conftool/dbconfig/20230828-173645-ladsgroup.json
  • 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T343718)', diff saved to https://phabricator.wikimedia.org/P51722 and previous config saved to /var/cache/conftool/dbconfig/20230828-173506-ladsgroup.json
  • 17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P51721 and previous config saved to /var/cache/conftool/dbconfig/20230828-173448-ladsgroup.json
  • 17:34 taavi@deploy1002: taavi: Continuing with sync
  • 17:33 taavi@deploy1002: taavi: Backport for Set OATHAuth multiple devices WRITE_BOTH for all privates (T242031), Set OATHAuth multiple devices READ_NEW for checkuser, techconduct (T242031) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD opti
  • 17:32 taavi@deploy1002: Started scap: Backport for Set OATHAuth multiple devices WRITE_BOTH for all privates (T242031), Set OATHAuth multiple devices READ_NEW for checkuser, techconduct (T242031)
  • 17:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T343718)', diff saved to https://phabricator.wikimedia.org/P51720 and previous config saved to /var/cache/conftool/dbconfig/20230828-172858-ladsgroup.json
  • 17:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P51719 and previous config saved to /var/cache/conftool/dbconfig/20230828-172143-ladsgroup.json
  • 17:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P51718 and previous config saved to /var/cache/conftool/dbconfig/20230828-172138-ladsgroup.json
  • 17:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P51717 and previous config saved to /var/cache/conftool/dbconfig/20230828-171942-ladsgroup.json
  • 17:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bullseye
  • 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P51716 and previous config saved to /var/cache/conftool/dbconfig/20230828-170637-ladsgroup.json
  • 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T344589)', diff saved to https://phabricator.wikimedia.org/P51715 and previous config saved to /var/cache/conftool/dbconfig/20230828-170632-ladsgroup.json
  • 17:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T344589)', diff saved to https://phabricator.wikimedia.org/P51714 and previous config saved to /var/cache/conftool/dbconfig/20230828-170435-ladsgroup.json
  • 17:00 inflatador: bking@cumin1001 depool wdqs1005 for decom T344198
  • 17:00 bking@cumin1001: conftool action : set/pooled=no; selector: name=wdqs1005.eqiad.wmnet
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T344589)', diff saved to https://phabricator.wikimedia.org/P51713 and previous config saved to /var/cache/conftool/dbconfig/20230828-165906-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T343718)', diff saved to https://phabricator.wikimedia.org/P51712 and previous config saved to /var/cache/conftool/dbconfig/20230828-165846-ladsgroup.json
  • 16:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T344589)', diff saved to https://phabricator.wikimedia.org/P51711 and previous config saved to /var/cache/conftool/dbconfig/20230828-165839-ladsgroup.json
  • 16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T343718)', diff saved to https://phabricator.wikimedia.org/P51710 and previous config saved to /var/cache/conftool/dbconfig/20230828-165824-ladsgroup.json
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T344589)', diff saved to https://phabricator.wikimedia.org/P51709 and previous config saved to /var/cache/conftool/dbconfig/20230828-165730-ladsgroup.json
  • 16:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T344589)', diff saved to https://phabricator.wikimedia.org/P51708 and previous config saved to /var/cache/conftool/dbconfig/20230828-165706-ladsgroup.json
  • 16:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T343718)', diff saved to https://phabricator.wikimedia.org/P51707 and previous config saved to /var/cache/conftool/dbconfig/20230828-165131-ladsgroup.json
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T343718)', diff saved to https://phabricator.wikimedia.org/P51706 and previous config saved to /var/cache/conftool/dbconfig/20230828-164359-ladsgroup.json
  • 16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T343718)', diff saved to https://phabricator.wikimedia.org/P51705 and previous config saved to /var/cache/conftool/dbconfig/20230828-164349-ladsgroup.json
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P51704 and previous config saved to /var/cache/conftool/dbconfig/20230828-164332-ladsgroup.json
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P51703 and previous config saved to /var/cache/conftool/dbconfig/20230828-164318-ladsgroup.json
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P51702 and previous config saved to /var/cache/conftool/dbconfig/20230828-164200-ladsgroup.json
  • 16:32 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 16:32 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P51701 and previous config saved to /var/cache/conftool/dbconfig/20230828-162843-ladsgroup.json
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P51700 and previous config saved to /var/cache/conftool/dbconfig/20230828-162826-ladsgroup.json
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P51699 and previous config saved to /var/cache/conftool/dbconfig/20230828-162812-ladsgroup.json
  • 16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P51698 and previous config saved to /var/cache/conftool/dbconfig/20230828-162654-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1029 (T344589)', diff saved to https://phabricator.wikimedia.org/P51697 and previous config saved to /var/cache/conftool/dbconfig/20230828-162005-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T343718)', diff saved to https://phabricator.wikimedia.org/P51696 and previous config saved to /var/cache/conftool/dbconfig/20230828-161406-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T343718)', diff saved to https://phabricator.wikimedia.org/P51695 and previous config saved to /var/cache/conftool/dbconfig/20230828-161345-ladsgroup.json
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P51694 and previous config saved to /var/cache/conftool/dbconfig/20230828-161337-ladsgroup.json
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T344589)', diff saved to https://phabricator.wikimedia.org/P51693 and previous config saved to /var/cache/conftool/dbconfig/20230828-161320-ladsgroup.json
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T343718)', diff saved to https://phabricator.wikimedia.org/P51692 and previous config saved to /var/cache/conftool/dbconfig/20230828-161306-ladsgroup.json
  • 16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T344589)', diff saved to https://phabricator.wikimedia.org/P51691 and previous config saved to /var/cache/conftool/dbconfig/20230828-161147-ladsgroup.json
  • 16:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T344589)', diff saved to https://phabricator.wikimedia.org/P51690 and previous config saved to /var/cache/conftool/dbconfig/20230828-160709-ladsgroup.json
  • 16:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 16:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 16:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T344589)', diff saved to https://phabricator.wikimedia.org/P51689 and previous config saved to /var/cache/conftool/dbconfig/20230828-160639-ladsgroup.json
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T344589)', diff saved to https://phabricator.wikimedia.org/P51688 and previous config saved to /var/cache/conftool/dbconfig/20230828-160546-ladsgroup.json
  • 16:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 16:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T344589)', diff saved to https://phabricator.wikimedia.org/P51687 and previous config saved to /var/cache/conftool/dbconfig/20230828-160522-ladsgroup.json
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P51686 and previous config saved to /var/cache/conftool/dbconfig/20230828-160459-ladsgroup.json
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P51685 and previous config saved to /var/cache/conftool/dbconfig/20230828-155839-ladsgroup.json
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T343718)', diff saved to https://phabricator.wikimedia.org/P51684 and previous config saved to /var/cache/conftool/dbconfig/20230828-155830-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P51683 and previous config saved to /var/cache/conftool/dbconfig/20230828-155133-ladsgroup.json
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P51682 and previous config saved to /var/cache/conftool/dbconfig/20230828-155016-ladsgroup.json
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P51681 and previous config saved to /var/cache/conftool/dbconfig/20230828-154953-ladsgroup.json
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P51680 and previous config saved to /var/cache/conftool/dbconfig/20230828-154327-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T343718)', diff saved to https://phabricator.wikimedia.org/P51679 and previous config saved to /var/cache/conftool/dbconfig/20230828-153655-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 15:36 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['moss-be2003']
  • 15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T343718)', diff saved to https://phabricator.wikimedia.org/P51678 and previous config saved to /var/cache/conftool/dbconfig/20230828-153634-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P51677 and previous config saved to /var/cache/conftool/dbconfig/20230828-153627-ladsgroup.json
  • 15:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be2003']
  • 15:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P51676 and previous config saved to /var/cache/conftool/dbconfig/20230828-153510-ladsgroup.json
  • 15:35 fabfur: enable puppet and start pybal on lvs6001 (T344587)
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1029 (T344589)', diff saved to https://phabricator.wikimedia.org/P51675 and previous config saved to /var/cache/conftool/dbconfig/20230828-153447-ladsgroup.json
  • 15:34 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6001.drmrs.wmnet
  • 15:32 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6001.drmrs.wmnet
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1029 (T344589)', diff saved to https://phabricator.wikimedia.org/P51674 and previous config saved to /var/cache/conftool/dbconfig/20230828-152948-ladsgroup.json
  • 15:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027 (T344589)', diff saved to https://phabricator.wikimedia.org/P51673 and previous config saved to /var/cache/conftool/dbconfig/20230828-152925-ladsgroup.json
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T343718)', diff saved to https://phabricator.wikimedia.org/P51672 and previous config saved to /var/cache/conftool/dbconfig/20230828-152820-ladsgroup.json
  • 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P51671 and previous config saved to /var/cache/conftool/dbconfig/20230828-152128-ladsgroup.json
  • 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T344589)', diff saved to https://phabricator.wikimedia.org/P51670 and previous config saved to /var/cache/conftool/dbconfig/20230828-152121-ladsgroup.json
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T344589)', diff saved to https://phabricator.wikimedia.org/P51669 and previous config saved to /var/cache/conftool/dbconfig/20230828-152004-ladsgroup.json
  • 15:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:16 fabfur: disable puppet and stop pybal on lvs6001 for reboot (T344587)
  • 15:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T344589)', diff saved to https://phabricator.wikimedia.org/P51668 and previous config saved to /var/cache/conftool/dbconfig/20230828-151511-ladsgroup.json
  • 15:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 15:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:14 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T344589)', diff saved to https://phabricator.wikimedia.org/P51667 and previous config saved to /var/cache/conftool/dbconfig/20230828-151446-ladsgroup.json
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027', diff saved to https://phabricator.wikimedia.org/P51666 and previous config saved to /var/cache/conftool/dbconfig/20230828-151418-ladsgroup.json
  • 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T344589)', diff saved to https://phabricator.wikimedia.org/P51665 and previous config saved to /var/cache/conftool/dbconfig/20230828-151300-ladsgroup.json
  • 15:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T344589)', diff saved to https://phabricator.wikimedia.org/P51664 and previous config saved to /var/cache/conftool/dbconfig/20230828-151236-ladsgroup.json
  • 15:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:07 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P51663 and previous config saved to /var/cache/conftool/dbconfig/20230828-150622-ladsgroup.json
  • 15:06 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:05 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P51662 and previous config saved to /var/cache/conftool/dbconfig/20230828-145940-ladsgroup.json
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T343718)', diff saved to https://phabricator.wikimedia.org/P51661 and previous config saved to /var/cache/conftool/dbconfig/20230828-145921-ladsgroup.json
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027', diff saved to https://phabricator.wikimedia.org/P51660 and previous config saved to /var/cache/conftool/dbconfig/20230828-145912-ladsgroup.json
  • 14:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 14:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P51659 and previous config saved to /var/cache/conftool/dbconfig/20230828-145730-ladsgroup.json
  • 14:55 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
  • 14:54 claime: bounced ferm.service on ml-serve1008
  • 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2026.codfw.wmnet
  • 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T343718)', diff saved to https://phabricator.wikimedia.org/P51658 and previous config saved to /var/cache/conftool/dbconfig/20230828-145116-ladsgroup.json
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T343718)', diff saved to https://phabricator.wikimedia.org/P51657 and previous config saved to /var/cache/conftool/dbconfig/20230828-144924-ladsgroup.json
  • 14:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 14:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T343718)', diff saved to https://phabricator.wikimedia.org/P51656 and previous config saved to /var/cache/conftool/dbconfig/20230828-144903-ladsgroup.json
  • 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P51655 and previous config saved to /var/cache/conftool/dbconfig/20230828-144433-ladsgroup.json
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027 (T344589)', diff saved to https://phabricator.wikimedia.org/P51654 and previous config saved to /var/cache/conftool/dbconfig/20230828-144406-ladsgroup.json
  • 14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P51653 and previous config saved to /var/cache/conftool/dbconfig/20230828-144224-ladsgroup.json
  • 14:40 fabfur: enable puppet and start pybal on lvs6002 (T344587)
  • 14:40 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6002.drmrs.wmnet
  • 14:39 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2026.codfw.wmnet
  • 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1027 (T344589)', diff saved to https://phabricator.wikimedia.org/P51652 and previous config saved to /var/cache/conftool/dbconfig/20230828-143808-ladsgroup.json
  • 14:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1027.eqiad.wmnet with reason: Maintenance
  • 14:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1027.eqiad.wmnet with reason: Maintenance
  • 14:37 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6002.drmrs.wmnet
  • 14:36 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetserver1002.eqiad.wmnet with OS bookworm
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1030 (T344589)', diff saved to https://phabricator.wikimedia.org/P51651 and previous config saved to /var/cache/conftool/dbconfig/20230828-143453-ladsgroup.json
  • 14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P51650 and previous config saved to /var/cache/conftool/dbconfig/20230828-143357-ladsgroup.json
  • 14:32 bblack: esams cp clusters: rolling restarts of varnish-frontend ~1h apart over the next ~8h, to apply memory sizing change from: https://gerrit.wikimedia.org/r/c/operations/puppet/+/952866/ (earlier run only did 1 host per cluster before we changed direction!)
  • 14:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1027.eqiad.wmnet
  • 14:31 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1027.eqiad.wmnet
  • 14:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T344589)', diff saved to https://phabricator.wikimedia.org/P51649 and previous config saved to /var/cache/conftool/dbconfig/20230828-142927-ladsgroup.json
  • 14:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T344589)', diff saved to https://phabricator.wikimedia.org/P51648 and previous config saved to /var/cache/conftool/dbconfig/20230828-142718-ladsgroup.json
  • 14:25 claime: bounced ferm.service on ml-serve1007
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T344589)', diff saved to https://phabricator.wikimedia.org/P51647 and previous config saved to /var/cache/conftool/dbconfig/20230828-142105-ladsgroup.json
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T344589)', diff saved to https://phabricator.wikimedia.org/P51646 and previous config saved to /var/cache/conftool/dbconfig/20230828-142056-ladsgroup.json
  • 14:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 14:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 14:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 14:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T344589)', diff saved to https://phabricator.wikimedia.org/P51645 and previous config saved to /var/cache/conftool/dbconfig/20230828-142034-ladsgroup.json
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T344589)', diff saved to https://phabricator.wikimedia.org/P51644 and previous config saved to /var/cache/conftool/dbconfig/20230828-142033-ladsgroup.json
  • 14:20 fabfur: disable puppet and stop pybal on lvs6002 for reboot (T344587)
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1030', diff saved to https://phabricator.wikimedia.org/P51643 and previous config saved to /var/cache/conftool/dbconfig/20230828-141946-ladsgroup.json
  • 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P51642 and previous config saved to /var/cache/conftool/dbconfig/20230828-141851-ladsgroup.json
  • 14:16 fabfur: enable puppet and start pybal on lvs6003 (T344587)
  • 14:16 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T343718)', diff saved to https://phabricator.wikimedia.org/P51641 and previous config saved to /var/cache/conftool/dbconfig/20230828-141505-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 14:14 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6003.drmrs.wmnet
  • 14:12 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:12 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6003.drmrs.wmnet
  • 14:11 fabfur: disable puppet and stop pybal on lvs6003 for reboot (T344587)
  • 14:07 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:07 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P51640 and previous config saved to /var/cache/conftool/dbconfig/20230828-140528-ladsgroup.json
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P51639 and previous config saved to /var/cache/conftool/dbconfig/20230828-140527-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1030', diff saved to https://phabricator.wikimedia.org/P51638 and previous config saved to /var/cache/conftool/dbconfig/20230828-140440-ladsgroup.json
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T343718)', diff saved to https://phabricator.wikimedia.org/P51637 and previous config saved to /var/cache/conftool/dbconfig/20230828-140345-ladsgroup.json
  • 14:02 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:51 moritzm: bounce ferm on ml-serve1006
  • 13:50 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1002.eqiad.wmnet with OS bookworm
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P51636 and previous config saved to /var/cache/conftool/dbconfig/20230828-135021-ladsgroup.json
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P51635 and previous config saved to /var/cache/conftool/dbconfig/20230828-135021-ladsgroup.json
  • 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2025.codfw.wmnet
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1030 (T344589)', diff saved to https://phabricator.wikimedia.org/P51634 and previous config saved to /var/cache/conftool/dbconfig/20230828-134934-ladsgroup.json
  • 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
  • 13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-b1-codfw
  • 13:45 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cloudsw1-b1-codfw
  • 13:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T343718)', diff saved to https://phabricator.wikimedia.org/P51633 and previous config saved to /var/cache/conftool/dbconfig/20230828-134137-ladsgroup.json
  • 13:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 13:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 13:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T344589)', diff saved to https://phabricator.wikimedia.org/P51632 and previous config saved to /var/cache/conftool/dbconfig/20230828-133756-ladsgroup.json
  • 13:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T344589)', diff saved to https://phabricator.wikimedia.org/P51631 and previous config saved to /var/cache/conftool/dbconfig/20230828-133514-ladsgroup.json
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1030 (T344589)', diff saved to https://phabricator.wikimedia.org/P51630 and previous config saved to /var/cache/conftool/dbconfig/20230828-133040-ladsgroup.json
  • 13:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1030.eqiad.wmnet with reason: Maintenance
  • 13:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1030.eqiad.wmnet with reason: Maintenance
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026 (T344589)', diff saved to https://phabricator.wikimedia.org/P51629 and previous config saved to /var/cache/conftool/dbconfig/20230828-133016-ladsgroup.json
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T343718)', diff saved to https://phabricator.wikimedia.org/P51628 and previous config saved to /var/cache/conftool/dbconfig/20230828-132724-ladsgroup.json
  • 13:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 13:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T343718)', diff saved to https://phabricator.wikimedia.org/P51627 and previous config saved to /var/cache/conftool/dbconfig/20230828-132703-ladsgroup.json
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T344589)', diff saved to https://phabricator.wikimedia.org/P51626 and previous config saved to /var/cache/conftool/dbconfig/20230828-132655-ladsgroup.json
  • 13:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T344589)', diff saved to https://phabricator.wikimedia.org/P51625 and previous config saved to /var/cache/conftool/dbconfig/20230828-132648-ladsgroup.json
  • 13:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 13:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T344589)', diff saved to https://phabricator.wikimedia.org/P51624 and previous config saved to /var/cache/conftool/dbconfig/20230828-132632-ladsgroup.json
  • 13:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T344589)', diff saved to https://phabricator.wikimedia.org/P51623 and previous config saved to /var/cache/conftool/dbconfig/20230828-132623-ladsgroup.json
  • 13:24 fabfur: enable puppet and start pybal on lvs5004 (T344587)
  • 13:23 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5004.eqsin.wmnet
  • 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P51622 and previous config saved to /var/cache/conftool/dbconfig/20230828-132250-ladsgroup.json
  • 13:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2025.codfw.wmnet
  • 13:20 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5004.eqsin.wmnet
  • 13:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2024.codfw.wmnet
  • 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P51621 and previous config saved to /var/cache/conftool/dbconfig/20230828-131510-ladsgroup.json
  • 13:14 urbanecm@deploy1002: Finished scap: Backport for Revert "ltwiki: Disable Growth features" (T344013) (duration: 09m 04s)
  • 13:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P51620 and previous config saved to /var/cache/conftool/dbconfig/20230828-131157-ladsgroup.json
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P51619 and previous config saved to /var/cache/conftool/dbconfig/20230828-131125-ladsgroup.json
  • 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2024.codfw.wmnet
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P51618 and previous config saved to /var/cache/conftool/dbconfig/20230828-131117-ladsgroup.json
  • 13:08 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 13:08 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2023.codfw.wmnet
  • 13:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P51617 and previous config saved to /var/cache/conftool/dbconfig/20230828-130744-ladsgroup.json
  • 13:06 urbanecm@deploy1002: urbanecm: Backport for Revert "ltwiki: Disable Growth features" (T344013) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:05 urbanecm@deploy1002: Started scap: Backport for Revert "ltwiki: Disable Growth features" (T344013)
  • 13:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2022.codfw.wmnet
  • 13:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
  • 13:01 bblack: esams cp clusters: rolling restarts of varnish-frontend ~1h apart over the next ~8h, to apply memory sizing change from: https://gerrit.wikimedia.org/r/c/operations/puppet/+/952555/
  • 13:01 fabfur: disable puppet and stop pybal on lvs5004 for reboot (T344587)
  • 13:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 13:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 13:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T343718)', diff saved to https://phabricator.wikimedia.org/P51616 and previous config saved to /var/cache/conftool/dbconfig/20230828-130012-ladsgroup.json
  • 13:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P51615 and previous config saved to /var/cache/conftool/dbconfig/20230828-130004-ladsgroup.json
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P51614 and previous config saved to /var/cache/conftool/dbconfig/20230828-125651-ladsgroup.json
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P51613 and previous config saved to /var/cache/conftool/dbconfig/20230828-125619-ladsgroup.json
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P51612 and previous config saved to /var/cache/conftool/dbconfig/20230828-125610-ladsgroup.json
  • 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
  • 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T344589)', diff saved to https://phabricator.wikimedia.org/P51611 and previous config saved to /var/cache/conftool/dbconfig/20230828-125237-ladsgroup.json
  • 12:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2022.codfw.wmnet
  • 12:46 jbond@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver1002
  • 12:45 jbond@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver1002
  • 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P51610 and previous config saved to /var/cache/conftool/dbconfig/20230828-124506-ladsgroup.json
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026 (T344589)', diff saved to https://phabricator.wikimedia.org/P51609 and previous config saved to /var/cache/conftool/dbconfig/20230828-124457-ladsgroup.json
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T343718)', diff saved to https://phabricator.wikimedia.org/P51608 and previous config saved to /var/cache/conftool/dbconfig/20230828-124145-ladsgroup.json
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T344589)', diff saved to https://phabricator.wikimedia.org/P51607 and previous config saved to /var/cache/conftool/dbconfig/20230828-124113-ladsgroup.json
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T344589)', diff saved to https://phabricator.wikimedia.org/P51606 and previous config saved to /var/cache/conftool/dbconfig/20230828-124104-ladsgroup.json
  • 12:40 fabfur: enable puppet and start pybal on lvs5005 (T344587)
  • 12:40 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:40 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add puppetserver1002 - jbond@cumin1001"
  • 12:39 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5005.eqsin.wmnet
  • 12:39 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add puppetserver1002 - jbond@cumin1001"
  • 12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1221 (T344589)', diff saved to https://phabricator.wikimedia.org/P51605 and previous config saved to /var/cache/conftool/dbconfig/20230828-123917-ladsgroup.json
  • 12:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 12:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 12:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T344589)', diff saved to https://phabricator.wikimedia.org/P51604 and previous config saved to /var/cache/conftool/dbconfig/20230828-123847-ladsgroup.json
  • 12:37 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5005.eqsin.wmnet
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T344589)', diff saved to https://phabricator.wikimedia.org/P51603 and previous config saved to /var/cache/conftool/dbconfig/20230828-123452-ladsgroup.json
  • 12:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T344589)', diff saved to https://phabricator.wikimedia.org/P51602 and previous config saved to /var/cache/conftool/dbconfig/20230828-123444-ladsgroup.json
  • 12:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 12:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 12:34 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device cloudsw1-b1-codfw
  • 12:34 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cloudsw1-b1-codfw
  • 12:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 12:33 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 12:33 jbond@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1026 (T344589)', diff saved to https://phabricator.wikimedia.org/P51601 and previous config saved to /var/cache/conftool/dbconfig/20230828-123004-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P51600 and previous config saved to /var/cache/conftool/dbconfig/20230828-123000-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance
  • 12:29 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 12:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P51599 and previous config saved to /var/cache/conftool/dbconfig/20230828-122341-ladsgroup.json
  • 12:20 fabfur: disable puppet and stop pybal on lvs5005 for reboot (T344587)
  • 12:18 fabfur: enable puppet and start pybal on lvs5006 (T344587)
  • 12:17 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5006.eqsin.wmnet
  • 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2021.codfw.wmnet
  • 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2021.codfw.wmnet
  • 12:15 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 12:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-b1-codfw
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T343718)', diff saved to https://phabricator.wikimedia.org/P51598 and previous config saved to /var/cache/conftool/dbconfig/20230828-121454-ladsgroup.json
  • 12:14 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 12:14 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
  • 12:14 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 12:13 fabfur: disable puppet and stop pybal on lvs5006 for reboot (T344587)
  • 12:13 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 12:12 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device cloudsw1-b1-codfw
  • 12:11 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 12:11 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
  • 12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P51597 and previous config saved to /var/cache/conftool/dbconfig/20230828-120835-ladsgroup.json
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T343718)', diff saved to https://phabricator.wikimedia.org/P51596 and previous config saved to /var/cache/conftool/dbconfig/20230828-120530-ladsgroup.json
  • 12:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 12:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T343718)', diff saved to https://phabricator.wikimedia.org/P51595 and previous config saved to /var/cache/conftool/dbconfig/20230828-120509-ladsgroup.json
  • 12:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2021.codfw.wmnet
  • 11:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2019.codfw.wmnet
  • 11:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
  • 11:55 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetmaster1006.eqiad.wmnet
  • 11:55 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:55 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster1006.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
  • 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T344589)', diff saved to https://phabricator.wikimedia.org/P51594 and previous config saved to /var/cache/conftool/dbconfig/20230828-115328-ladsgroup.json
  • 11:53 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster1006.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
  • 11:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
  • 11:50 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P51593 and previous config saved to /var/cache/conftool/dbconfig/20230828-115003-ladsgroup.json
  • 11:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2019.codfw.wmnet
  • 11:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T344589)', diff saved to https://phabricator.wikimedia.org/P51592 and previous config saved to /var/cache/conftool/dbconfig/20230828-114706-ladsgroup.json
  • 11:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 11:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 11:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T344589)', diff saved to https://phabricator.wikimedia.org/P51591 and previous config saved to /var/cache/conftool/dbconfig/20230828-114642-ladsgroup.json
  • 11:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T344589)', diff saved to https://phabricator.wikimedia.org/P51590 and previous config saved to /var/cache/conftool/dbconfig/20230828-114556-ladsgroup.json
  • 11:44 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetmaster1006.eqiad.wmnet
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2018.codfw.wmnet
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2018.codfw.wmnet
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T343718)', diff saved to https://phabricator.wikimedia.org/P51589 and previous config saved to /var/cache/conftool/dbconfig/20230828-113733-ladsgroup.json
  • 11:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 11:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T343718)', diff saved to https://phabricator.wikimedia.org/P51588 and previous config saved to /var/cache/conftool/dbconfig/20230828-113712-ladsgroup.json
  • 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P51587 and previous config saved to /var/cache/conftool/dbconfig/20230828-113455-ladsgroup.json
  • 11:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2018.codfw.wmnet
  • 11:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P51585 and previous config saved to /var/cache/conftool/dbconfig/20230828-113136-ladsgroup.json
  • 11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P51584 and previous config saved to /var/cache/conftool/dbconfig/20230828-113050-ladsgroup.json
  • 11:29 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2018.codfw.wmnet
  • 11:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2017.codfw.wmnet
  • 11:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2017.codfw.wmnet
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P51583 and previous config saved to /var/cache/conftool/dbconfig/20230828-112206-ladsgroup.json
  • 11:20 fabfur: enable puppet and start pybal on lvs4009 (T344587)
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T343718)', diff saved to https://phabricator.wikimedia.org/P51582 and previous config saved to /var/cache/conftool/dbconfig/20230828-111949-ladsgroup.json
  • 11:18 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4009.ulsfo.wmnet
  • 11:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P51581 and previous config saved to /var/cache/conftool/dbconfig/20230828-111630-ladsgroup.json
  • 11:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P51580 and previous config saved to /var/cache/conftool/dbconfig/20230828-111544-ladsgroup.json
  • 11:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2017.codfw.wmnet
  • 11:15 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4009.ulsfo.wmnet
  • 11:15 kamila@deploy1002: Finished scap: base image update due to T344991 (duration: 09m 31s)
  • 11:11 moritzm: bounce ferm on ml-serve1001
  • 11:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2017.codfw.wmnet
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P51579 and previous config saved to /var/cache/conftool/dbconfig/20230828-110700-ladsgroup.json
  • 11:05 kamila@deploy1002: Started scap: base image update due to T344991
  • 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T344589)', diff saved to https://phabricator.wikimedia.org/P51578 and previous config saved to /var/cache/conftool/dbconfig/20230828-110124-ladsgroup.json
  • 11:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T344589)', diff saved to https://phabricator.wikimedia.org/P51577 and previous config saved to /var/cache/conftool/dbconfig/20230828-110038-ladsgroup.json
  • 10:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on restbase1027.eqiad.wmnet with reason: T345058 - service probes flapping
  • 10:57 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on restbase1027.eqiad.wmnet with reason: T345058 - service probes flapping
  • 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2016.codfw.wmnet
  • 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2016.codfw.wmnet
  • 10:55 fabfur: disable puppet and stop pybal on lvs4009 for reboot (T344587)
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T344589)', diff saved to https://phabricator.wikimedia.org/P51576 and previous config saved to /var/cache/conftool/dbconfig/20230828-105503-ladsgroup.json
  • 10:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 10:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T344589)', diff saved to https://phabricator.wikimedia.org/P51575 and previous config saved to /var/cache/conftool/dbconfig/20230828-105407-ladsgroup.json
  • 10:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 10:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T344589)', diff saved to https://phabricator.wikimedia.org/P51574 and previous config saved to /var/cache/conftool/dbconfig/20230828-105342-ladsgroup.json
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T343718)', diff saved to https://phabricator.wikimedia.org/P51573 and previous config saved to /var/cache/conftool/dbconfig/20230828-105153-ladsgroup.json
  • 10:50 moritzm: installing exim4 bugfix updates from Bookworm point release
  • 10:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T344589)', diff saved to https://phabricator.wikimedia.org/P51572 and previous config saved to /var/cache/conftool/dbconfig/20230828-105002-ladsgroup.json
  • 10:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2016.codfw.wmnet
  • 10:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2016.codfw.wmnet
  • 10:46 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T343718)', diff saved to https://phabricator.wikimedia.org/P51571 and previous config saved to /var/cache/conftool/dbconfig/20230828-104407-ladsgroup.json
  • 10:44 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 10:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 10:42 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 10:42 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 10:41 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 10:41 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 10:39 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
  • 10:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P51570 and previous config saved to /var/cache/conftool/dbconfig/20230828-103836-ladsgroup.json
  • 10:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T344589)', diff saved to https://phabricator.wikimedia.org/P51569 and previous config saved to /var/cache/conftool/dbconfig/20230828-103827-ladsgroup.json
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2027 (T344589)', diff saved to https://phabricator.wikimedia.org/P51568 and previous config saved to /var/cache/conftool/dbconfig/20230828-103826-ladsgroup.json
  • 10:37 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 10:35 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P51567 and previous config saved to /var/cache/conftool/dbconfig/20230828-103456-ladsgroup.json
  • 10:31 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 10:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2015.codfw.wmnet
  • 10:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2015.codfw.wmnet
  • 10:29 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P51566 and previous config saved to /var/cache/conftool/dbconfig/20230828-102330-ladsgroup.json
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P51565 and previous config saved to /var/cache/conftool/dbconfig/20230828-102320-ladsgroup.json
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2027', diff saved to https://phabricator.wikimedia.org/P51564 and previous config saved to /var/cache/conftool/dbconfig/20230828-102320-ladsgroup.json
  • 10:23 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 10:23 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 10:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2015.codfw.wmnet
  • 10:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 10:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P51563 and previous config saved to /var/cache/conftool/dbconfig/20230828-101949-ladsgroup.json
  • 10:17 fabfur: enable puppet and start pybal on lvs4008 for reboot (T344587)
  • 10:16 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4008.ulsfo.wmnet
  • 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2015.codfw.wmnet
  • 10:15 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T343718)', diff saved to https://phabricator.wikimedia.org/P51562 and previous config saved to /var/cache/conftool/dbconfig/20230828-101426-ladsgroup.json
  • 10:14 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 10:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T343718)', diff saved to https://phabricator.wikimedia.org/P51561 and previous config saved to /var/cache/conftool/dbconfig/20230828-101405-ladsgroup.json
  • 10:13 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4008.ulsfo.wmnet
  • 10:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 10:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T343718)', diff saved to https://phabricator.wikimedia.org/P51560 and previous config saved to /var/cache/conftool/dbconfig/20230828-101238-ladsgroup.json
  • 10:11 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:11 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 10:11 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 10:10 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:10 claime: Deploying 952812 for T344814 to mw-debug and mw-api-ext
  • 10:10 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 10:09 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T344589)', diff saved to https://phabricator.wikimedia.org/P51559 and previous config saved to /var/cache/conftool/dbconfig/20230828-100823-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P51558 and previous config saved to /var/cache/conftool/dbconfig/20230828-100814-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2027', diff saved to https://phabricator.wikimedia.org/P51557 and previous config saved to /var/cache/conftool/dbconfig/20230828-100814-ladsgroup.json
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T344589)', diff saved to https://phabricator.wikimedia.org/P51556 and previous config saved to /var/cache/conftool/dbconfig/20230828-100443-ladsgroup.json
  • 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2013.codfw.wmnet
  • 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2013.codfw.wmnet
  • 10:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T344589)', diff saved to https://phabricator.wikimedia.org/P51555 and previous config saved to /var/cache/conftool/dbconfig/20230828-100045-ladsgroup.json
  • 10:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T344589)', diff saved to https://phabricator.wikimedia.org/P51554 and previous config saved to /var/cache/conftool/dbconfig/20230828-100005-ladsgroup.json
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P51553 and previous config saved to /var/cache/conftool/dbconfig/20230828-095859-ladsgroup.json
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P51552 and previous config saved to /var/cache/conftool/dbconfig/20230828-095732-ladsgroup.json
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T344589)', diff saved to https://phabricator.wikimedia.org/P51551 and previous config saved to /var/cache/conftool/dbconfig/20230828-095722-ladsgroup.json
  • 09:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 09:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T344589)', diff saved to https://phabricator.wikimedia.org/P51550 and previous config saved to /var/cache/conftool/dbconfig/20230828-095658-ladsgroup.json
  • 09:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
  • 09:54 fabfur: disable puppet and stop pybal on lvs4008 for reboot (T344587)
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T344589)', diff saved to https://phabricator.wikimedia.org/P51549 and previous config saved to /var/cache/conftool/dbconfig/20230828-095308-ladsgroup.json
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2027 (T344589)', diff saved to https://phabricator.wikimedia.org/P51548 and previous config saved to /var/cache/conftool/dbconfig/20230828-095308-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2027 (T344589)', diff saved to https://phabricator.wikimedia.org/P51547 and previous config saved to /var/cache/conftool/dbconfig/20230828-094813-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2027.codfw.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2027.codfw.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2029 (T344589)', diff saved to https://phabricator.wikimedia.org/P51546 and previous config saved to /var/cache/conftool/dbconfig/20230828-094748-ladsgroup.json
  • 09:47 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2013.codfw.wmnet
  • 09:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2012.codfw.wmnet
  • 09:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2012.codfw.wmnet
  • 09:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1214 (T344589)', diff saved to https://phabricator.wikimedia.org/P51545 and previous config saved to /var/cache/conftool/dbconfig/20230828-094650-ladsgroup.json
  • 09:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 09:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 09:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T344589)', diff saved to https://phabricator.wikimedia.org/P51544 and previous config saved to /var/cache/conftool/dbconfig/20230828-094626-ladsgroup.json
  • 09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P51543 and previous config saved to /var/cache/conftool/dbconfig/20230828-094458-ladsgroup.json
  • 09:44 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet
  • 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P51542 and previous config saved to /var/cache/conftool/dbconfig/20230828-094353-ladsgroup.json
  • 09:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P51541 and previous config saved to /var/cache/conftool/dbconfig/20230828-094220-ladsgroup.json
  • 09:42 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P51540 and previous config saved to /var/cache/conftool/dbconfig/20230828-094152-ladsgroup.json
  • 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2012.codfw.wmnet
  • 09:41 fabfur: ignore previous message: s/codfw/ulsfo/
  • 09:39 fabfur: begin rebooting lvs hosts in codfw (T344587)
  • 09:37 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint1002.wikimedia.org
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2029', diff saved to https://phabricator.wikimedia.org/P51539 and previous config saved to /var/cache/conftool/dbconfig/20230828-093242-ladsgroup.json
  • 09:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P51538 and previous config saved to /var/cache/conftool/dbconfig/20230828-093120-ladsgroup.json
  • 09:30 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host contint1002.wikimedia.org
  • 09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P51537 and previous config saved to /var/cache/conftool/dbconfig/20230828-092952-ladsgroup.json
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T343718)', diff saved to https://phabricator.wikimedia.org/P51536 and previous config saved to /var/cache/conftool/dbconfig/20230828-092847-ladsgroup.json
  • 09:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T343718)', diff saved to https://phabricator.wikimedia.org/P51535 and previous config saved to /var/cache/conftool/dbconfig/20230828-092713-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P51534 and previous config saved to /var/cache/conftool/dbconfig/20230828-092646-ladsgroup.json
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2029', diff saved to https://phabricator.wikimedia.org/P51533 and previous config saved to /var/cache/conftool/dbconfig/20230828-091735-ladsgroup.json
  • 09:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P51532 and previous config saved to /var/cache/conftool/dbconfig/20230828-091613-ladsgroup.json
  • 09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T344589)', diff saved to https://phabricator.wikimedia.org/P51531 and previous config saved to /var/cache/conftool/dbconfig/20230828-091446-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T344589)', diff saved to https://phabricator.wikimedia.org/P51530 and previous config saved to /var/cache/conftool/dbconfig/20230828-091140-ladsgroup.json
  • 09:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T344589)', diff saved to https://phabricator.wikimedia.org/P51529 and previous config saved to /var/cache/conftool/dbconfig/20230828-090819-ladsgroup.json
  • 09:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 09:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 09:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T344589)', diff saved to https://phabricator.wikimedia.org/P51528 and previous config saved to /var/cache/conftool/dbconfig/20230828-090749-ladsgroup.json
  • 09:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T344589)', diff saved to https://phabricator.wikimedia.org/P51527 and previous config saved to /var/cache/conftool/dbconfig/20230828-090318-ladsgroup.json
  • 09:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T344589)', diff saved to https://phabricator.wikimedia.org/P51526 and previous config saved to /var/cache/conftool/dbconfig/20230828-090255-ladsgroup.json
  • 09:02 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2012.codfw.wmnet
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2029 (T344589)', diff saved to https://phabricator.wikimedia.org/P51525 and previous config saved to /var/cache/conftool/dbconfig/20230828-090229-ladsgroup.json
  • 09:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T344589)', diff saved to https://phabricator.wikimedia.org/P51524 and previous config saved to /var/cache/conftool/dbconfig/20230828-090107-ladsgroup.json
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2011.codfw.wmnet
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2011.codfw.wmnet
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2029 (T344589)', diff saved to https://phabricator.wikimedia.org/P51523 and previous config saved to /var/cache/conftool/dbconfig/20230828-085737-ladsgroup.json
  • 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance
  • 08:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1211 (T344589)', diff saved to https://phabricator.wikimedia.org/P51522 and previous config saved to /var/cache/conftool/dbconfig/20230828-085456-ladsgroup.json
  • 08:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 08:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 08:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T344589)', diff saved to https://phabricator.wikimedia.org/P51521 and previous config saved to /var/cache/conftool/dbconfig/20230828-085432-ladsgroup.json
  • 08:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P51520 and previous config saved to /var/cache/conftool/dbconfig/20230828-085243-ladsgroup.json
  • 08:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2011.codfw.wmnet
  • 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T343718)', diff saved to https://phabricator.wikimedia.org/P51519 and previous config saved to /var/cache/conftool/dbconfig/20230828-084947-ladsgroup.json
  • 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T343718)', diff saved to https://phabricator.wikimedia.org/P51518 and previous config saved to /var/cache/conftool/dbconfig/20230828-084926-ladsgroup.json
  • 08:48 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2011.codfw.wmnet
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P51517 and previous config saved to /var/cache/conftool/dbconfig/20230828-084748-ladsgroup.json
  • 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2010.codfw.wmnet
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T343718)', diff saved to https://phabricator.wikimedia.org/P51516 and previous config saved to /var/cache/conftool/dbconfig/20230828-084710-ladsgroup.json
  • 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
  • 08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T343718)', diff saved to https://phabricator.wikimedia.org/P51515 and previous config saved to /var/cache/conftool/dbconfig/20230828-084650-ladsgroup.json
  • 08:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
  • 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2031 (T344589)', diff saved to https://phabricator.wikimedia.org/P51514 and previous config saved to /var/cache/conftool/dbconfig/20230828-083949-ladsgroup.json
  • 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P51513 and previous config saved to /var/cache/conftool/dbconfig/20230828-083926-ladsgroup.json
  • 08:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P51512 and previous config saved to /var/cache/conftool/dbconfig/20230828-083737-ladsgroup.json
  • 08:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2010.codfw.wmnet
  • 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2009.codfw.wmnet
  • 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2009.codfw.wmnet
  • 08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P51511 and previous config saved to /var/cache/conftool/dbconfig/20230828-083420-ladsgroup.json
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P51510 and previous config saved to /var/cache/conftool/dbconfig/20230828-083242-ladsgroup.json
  • 08:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P51509 and previous config saved to /var/cache/conftool/dbconfig/20230828-083143-ladsgroup.json
  • 08:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2009.codfw.wmnet
  • 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2031', diff saved to https://phabricator.wikimedia.org/P51508 and previous config saved to /var/cache/conftool/dbconfig/20230828-082443-ladsgroup.json
  • 08:24 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
  • 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P51507 and previous config saved to /var/cache/conftool/dbconfig/20230828-082420-ladsgroup.json
  • 08:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T344589)', diff saved to https://phabricator.wikimedia.org/P51506 and previous config saved to /var/cache/conftool/dbconfig/20230828-082231-ladsgroup.json
  • 08:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 08:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
  • 08:19 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P51505 and previous config saved to /var/cache/conftool/dbconfig/20230828-081913-ladsgroup.json
  • 08:19 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:18 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:18 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T344589)', diff saved to https://phabricator.wikimedia.org/P51504 and previous config saved to /var/cache/conftool/dbconfig/20230828-081736-ladsgroup.json
  • 08:17 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:16 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P51503 and previous config saved to /var/cache/conftool/dbconfig/20230828-081637-ladsgroup.json
  • 08:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
  • 08:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T344589)', diff saved to https://phabricator.wikimedia.org/P51502 and previous config saved to /var/cache/conftool/dbconfig/20230828-081245-ladsgroup.json
  • 08:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 08:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 08:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T344589)', diff saved to https://phabricator.wikimedia.org/P51501 and previous config saved to /var/cache/conftool/dbconfig/20230828-081220-ladsgroup.json
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T344589)', diff saved to https://phabricator.wikimedia.org/P51500 and previous config saved to /var/cache/conftool/dbconfig/20230828-081117-ladsgroup.json
  • 08:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 08:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T344589)', diff saved to https://phabricator.wikimedia.org/P51499 and previous config saved to /var/cache/conftool/dbconfig/20230828-081051-ladsgroup.json
  • 08:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 08:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2031', diff saved to https://phabricator.wikimedia.org/P51498 and previous config saved to /var/cache/conftool/dbconfig/20230828-080936-ladsgroup.json
  • 08:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T344589)', diff saved to https://phabricator.wikimedia.org/P51497 and previous config saved to /var/cache/conftool/dbconfig/20230828-080914-ladsgroup.json
  • 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T343718)', diff saved to https://phabricator.wikimedia.org/P51496 and previous config saved to /var/cache/conftool/dbconfig/20230828-080407-ladsgroup.json
  • 08:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T343718)', diff saved to https://phabricator.wikimedia.org/P51495 and previous config saved to /var/cache/conftool/dbconfig/20230828-080131-ladsgroup.json
  • 08:01 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 08:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1209 (T344589)', diff saved to https://phabricator.wikimedia.org/P51494 and previous config saved to /var/cache/conftool/dbconfig/20230828-080045-ladsgroup.json
  • 08:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 08:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 08:00 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 08:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T344589)', diff saved to https://phabricator.wikimedia.org/P51493 and previous config saved to /var/cache/conftool/dbconfig/20230828-080021-ladsgroup.json
  • 08:00 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 07:59 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 07:59 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 07:58 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P51492 and previous config saved to /var/cache/conftool/dbconfig/20230828-075714-ladsgroup.json
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P51491 and previous config saved to /var/cache/conftool/dbconfig/20230828-075544-ladsgroup.json
  • 07:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2031 (T344589)', diff saved to https://phabricator.wikimedia.org/P51490 and previous config saved to /var/cache/conftool/dbconfig/20230828-075430-ladsgroup.json
  • 07:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2031 (T344589)', diff saved to https://phabricator.wikimedia.org/P51489 and previous config saved to /var/cache/conftool/dbconfig/20230828-075036-ladsgroup.json
  • 07:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2031.codfw.wmnet with reason: Maintenance
  • 07:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2031.codfw.wmnet with reason: Maintenance
  • 07:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026 (T344589)', diff saved to https://phabricator.wikimedia.org/P51488 and previous config saved to /var/cache/conftool/dbconfig/20230828-075011-ladsgroup.json
  • 07:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P51487 and previous config saved to /var/cache/conftool/dbconfig/20230828-074515-ladsgroup.json
  • 07:45 moritzm: fail over Ganeti master in codfw-test to ganeti-test2003
  • 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P51486 and previous config saved to /var/cache/conftool/dbconfig/20230828-074208-ladsgroup.json
  • 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
  • 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
  • 07:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P51485 and previous config saved to /var/cache/conftool/dbconfig/20230828-074038-ladsgroup.json
  • 07:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
  • 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026', diff saved to https://phabricator.wikimedia.org/P51484 and previous config saved to /var/cache/conftool/dbconfig/20230828-073505-ladsgroup.json
  • 07:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6002.wikimedia.org
  • 07:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
  • 07:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P51483 and previous config saved to /var/cache/conftool/dbconfig/20230828-073009-ladsgroup.json
  • 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6002.wikimedia.org
  • 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T344589)', diff saved to https://phabricator.wikimedia.org/P51482 and previous config saved to /var/cache/conftool/dbconfig/20230828-072701-ladsgroup.json
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T343718)', diff saved to https://phabricator.wikimedia.org/P51481 and previous config saved to /var/cache/conftool/dbconfig/20230828-072644-ladsgroup.json
  • 07:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 07:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T343718)', diff saved to https://phabricator.wikimedia.org/P51480 and previous config saved to /var/cache/conftool/dbconfig/20230828-072623-ladsgroup.json
  • 07:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T344589)', diff saved to https://phabricator.wikimedia.org/P51479 and previous config saved to /var/cache/conftool/dbconfig/20230828-072532-ladsgroup.json
  • 07:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T343718)', diff saved to https://phabricator.wikimedia.org/P51478 and previous config saved to /var/cache/conftool/dbconfig/20230828-072422-ladsgroup.json
  • 07:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 07:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 07:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 07:20 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 07:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T344589)', diff saved to https://phabricator.wikimedia.org/P51477 and previous config saved to /var/cache/conftool/dbconfig/20230828-072025-ladsgroup.json
  • 07:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 07:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 07:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T344589)', diff saved to https://phabricator.wikimedia.org/P51476 and previous config saved to /var/cache/conftool/dbconfig/20230828-072000-ladsgroup.json
  • 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026', diff saved to https://phabricator.wikimedia.org/P51475 and previous config saved to /var/cache/conftool/dbconfig/20230828-071959-ladsgroup.json
  • 07:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5003.wikimedia.org
  • 07:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T344589)', diff saved to https://phabricator.wikimedia.org/P51474 and previous config saved to /var/cache/conftool/dbconfig/20230828-071824-ladsgroup.json
  • 07:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 07:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 07:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T344589)', diff saved to https://phabricator.wikimedia.org/P51473 and previous config saved to /var/cache/conftool/dbconfig/20230828-071800-ladsgroup.json
  • 07:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T344589)', diff saved to https://phabricator.wikimedia.org/P51472 and previous config saved to /var/cache/conftool/dbconfig/20230828-071503-ladsgroup.json
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P51471 and previous config saved to /var/cache/conftool/dbconfig/20230828-071117-ladsgroup.json
  • 07:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5003.wikimedia.org
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T344589)', diff saved to https://phabricator.wikimedia.org/P51470 and previous config saved to /var/cache/conftool/dbconfig/20230828-070847-ladsgroup.json
  • 07:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T344589)', diff saved to https://phabricator.wikimedia.org/P51469 and previous config saved to /var/cache/conftool/dbconfig/20230828-070823-ladsgroup.json
  • 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026 (T344589)', diff saved to https://phabricator.wikimedia.org/P51468 and previous config saved to /var/cache/conftool/dbconfig/20230828-070453-ladsgroup.json
  • 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P51467 and previous config saved to /var/cache/conftool/dbconfig/20230828-070254-ladsgroup.json
  • 06:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2026 (T344589)', diff saved to https://phabricator.wikimedia.org/P51466 and previous config saved to /var/cache/conftool/dbconfig/20230828-065958-ladsgroup.json
  • 06:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2026.codfw.wmnet with reason: Maintenance
  • 06:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2026.codfw.wmnet with reason: Maintenance
  • 06:59 moritzm: installing perf updates on bullseye hosts
  • 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P51465 and previous config saved to /var/cache/conftool/dbconfig/20230828-065611-ladsgroup.json
  • 06:55 moritzm: installing nftables bugfix update from bullseye point release
  • 06:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P51464 and previous config saved to /var/cache/conftool/dbconfig/20230828-065316-ladsgroup.json
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P51463 and previous config saved to /var/cache/conftool/dbconfig/20230828-064948-ladsgroup.json
  • 06:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P51462 and previous config saved to /var/cache/conftool/dbconfig/20230828-064748-ladsgroup.json
  • 06:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 06:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T343718)', diff saved to https://phabricator.wikimedia.org/P51461 and previous config saved to /var/cache/conftool/dbconfig/20230828-064105-ladsgroup.json
  • 06:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P51460 and previous config saved to /var/cache/conftool/dbconfig/20230828-063810-ladsgroup.json
  • 06:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T344589)', diff saved to https://phabricator.wikimedia.org/P51459 and previous config saved to /var/cache/conftool/dbconfig/20230828-063442-ladsgroup.json
  • 06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T344589)', diff saved to https://phabricator.wikimedia.org/P51458 and previous config saved to /var/cache/conftool/dbconfig/20230828-063242-ladsgroup.json
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T344589)', diff saved to https://phabricator.wikimedia.org/P51457 and previous config saved to /var/cache/conftool/dbconfig/20230828-062805-ladsgroup.json
  • 06:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 06:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 06:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T344589)', diff saved to https://phabricator.wikimedia.org/P51456 and previous config saved to /var/cache/conftool/dbconfig/20230828-062740-ladsgroup.json
  • 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T344589)', diff saved to https://phabricator.wikimedia.org/P51455 and previous config saved to /var/cache/conftool/dbconfig/20230828-062617-ladsgroup.json
  • 06:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 06:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T344589)', diff saved to https://phabricator.wikimedia.org/P51454 and previous config saved to /var/cache/conftool/dbconfig/20230828-062552-ladsgroup.json
  • 06:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T344589)', diff saved to https://phabricator.wikimedia.org/P51453 and previous config saved to /var/cache/conftool/dbconfig/20230828-062304-ladsgroup.json
  • 06:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T344589)', diff saved to https://phabricator.wikimedia.org/P51452 and previous config saved to /var/cache/conftool/dbconfig/20230828-061651-ladsgroup.json
  • 06:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 06:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 06:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T344589)', diff saved to https://phabricator.wikimedia.org/P51451 and previous config saved to /var/cache/conftool/dbconfig/20230828-061627-ladsgroup.json
  • 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P51450 and previous config saved to /var/cache/conftool/dbconfig/20230828-061233-ladsgroup.json
  • 06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P51449 and previous config saved to /var/cache/conftool/dbconfig/20230828-061046-ladsgroup.json
  • 06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T343718)', diff saved to https://phabricator.wikimedia.org/P51448 and previous config saved to /var/cache/conftool/dbconfig/20230828-060317-ladsgroup.json
  • 06:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 06:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P51447 and previous config saved to /var/cache/conftool/dbconfig/20230828-060121-ladsgroup.json
  • 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P51446 and previous config saved to /var/cache/conftool/dbconfig/20230828-055751-ladsgroup.json
  • 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P51445 and previous config saved to /var/cache/conftool/dbconfig/20230828-055727-ladsgroup.json
  • 05:56 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to old extlinks columns in s4 (T342683) (duration: 15m 36s)
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P51444 and previous config saved to /var/cache/conftool/dbconfig/20230828-055539-ladsgroup.json
  • 05:50 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 05:49 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to old extlinks columns in s4 (T342683) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P51443 and previous config saved to /var/cache/conftool/dbconfig/20230828-054615-ladsgroup.json
  • 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P51442 and previous config saved to /var/cache/conftool/dbconfig/20230828-054247-ladsgroup.json
  • 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T344589)', diff saved to https://phabricator.wikimedia.org/P51441 and previous config saved to /var/cache/conftool/dbconfig/20230828-054221-ladsgroup.json
  • 05:41 marostegui: failover m5-master to dbproxy1021
  • 05:41 ladsgroup@deploy1002: Started scap: Backport for Stop writing to old extlinks columns in s4 (T342683)
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T344589)', diff saved to https://phabricator.wikimedia.org/P51440 and previous config saved to /var/cache/conftool/dbconfig/20230828-054033-ladsgroup.json
  • 05:34 elukey: powercycle restbase1027 - stopped publishing metrics days ago, no root tty available in mgmt console
  • 05:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T344589)', diff saved to https://phabricator.wikimedia.org/P51439 and previous config saved to /var/cache/conftool/dbconfig/20230828-053108-ladsgroup.json
  • 05:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T344589)', diff saved to https://phabricator.wikimedia.org/P51438 and previous config saved to /var/cache/conftool/dbconfig/20230828-053045-ladsgroup.json
  • 05:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 05:30 elukey: depool restbase1027 - a lot of ping down events registered, a check up is needed
  • 05:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P51437 and previous config saved to /var/cache/conftool/dbconfig/20230828-052742-ladsgroup.json
  • 05:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 05:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 05:26 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 05:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T344589)', diff saved to https://phabricator.wikimedia.org/P51436 and previous config saved to /var/cache/conftool/dbconfig/20230828-052610-ladsgroup.json
  • 05:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 05:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 05:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 05:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 05:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 05:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T344589)', diff saved to https://phabricator.wikimedia.org/P51435 and previous config saved to /var/cache/conftool/dbconfig/20230828-051349-ladsgroup.json
  • 05:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 05:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P51434 and previous config saved to /var/cache/conftool/dbconfig/20230828-051237-ladsgroup.json

2023-08-27

  • 07:28 elukey: silence rdb1011:6380's Redis alert (ORES-related) for 30 days to avoid spam

2023-08-26

  • 13:07 elukey: silence rdb1011:6378's Redis alert (ORES-related) for 30 days to avoid spam

2023-08-25

  • 21:03 inflatador: bking@cumin1001 shutting off wdqs1005 in preparation for decommission T344198
  • 21:02 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wdqs1005.eqiad.wmnet with reason: to be decommissioned soon
  • 21:02 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wdqs1005.eqiad.wmnet with reason: to be decommissioned soon
  • 19:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bullseye
  • 19:39 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 18:48 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:47 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:47 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:46 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:45 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:45 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:26 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:25 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:23 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:23 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:22 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:22 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:15 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:14 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:14 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:13 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:13 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:12 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:12 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:12 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['moss-be2003']
  • 17:46 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be2003']
  • 17:46 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
  • 17:45 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:45 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
  • 17:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['moss-be2003']
  • 17:45 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
  • 17:44 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
  • 17:44 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
  • 17:44 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: sync
  • 17:43 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:43 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:41 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:41 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:39 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: push deploy after bullseye reimage T343124 (duration: 00m 19s)
  • 17:39 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: push deploy after bullseye reimage T343124
  • 17:36 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1009.eqiad.wmnet with OS bullseye
  • 17:30 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 17:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:30 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 17:29 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 17:26 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1009.eqiad.wmnet with reason: host reimage
  • 17:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1009.eqiad.wmnet with reason: host reimage
  • 17:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be2003']
  • 17:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['moss-be2003']
  • 17:14 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be2003']
  • 17:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['moss-be2003']
  • 17:13 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be2003']
  • 17:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:03 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1009.eqiad.wmnet with OS bullseye
  • 17:03 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:42 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:41 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:41 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 16:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2048.codfw.wmnet with OS bullseye
  • 16:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding new host moss-be2003 to CODFW - jhancock@cumin2002"
  • 16:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding new host moss-be2003 to CODFW - jhancock@cumin2002"
  • 16:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:17 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2048.codfw.wmnet with reason: host reimage
  • 14:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2048.codfw.wmnet with reason: host reimage
  • 14:52 sukhe: force run agent on A:cp-esams
  • 14:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2048.codfw.wmnet with OS bullseye
  • 14:31 claime: powercycled kubernetes2009
  • 13:40 moritzm: installing w3m security updates
  • 13:33 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:32 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 12:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster1001.eqiad.wmnet
  • 12:32 moritzm: imported wmf-laptop 0.5.8 to apt.wikimedia.org
  • 12:27 jbond@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetmaster2001.codfw.wmnet
  • 12:20 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster1001.eqiad.wmnet
  • 12:20 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetmaster2001.codfw.wmnet
  • 12:19 jbond: disable puppet fleet wide for reboots
  • 12:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster1006.eqiad.wmnet
  • 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4005.wikimedia.org
  • 12:14 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:12 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster1006.eqiad.wmnet
  • 12:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4005.wikimedia.org
  • 11:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 11:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 11:39 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 11:38 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 11:37 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 11:29 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 11:29 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab-test1001.eqiad.wmnet
  • 11:24 eoghan@cumin2002: START - Cookbook sre.hosts.reboot-single for host phab-test1001.eqiad.wmnet
  • 11:17 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 11:17 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 11:17 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 11:16 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab2002.codfw.wmnet
  • 11:16 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 11:16 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:15 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:15 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 11:15 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:15 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:14 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 11:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:13 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:12 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:12 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 11:11 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 11:11 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 11:10 eoghan@cumin2002: START - Cookbook sre.hosts.reboot-single for host phab2002.codfw.wmnet
  • 11:10 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:10 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:10 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:09 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 11:09 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 11:09 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:09 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:09 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 11:09 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 11:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:00 fabfur: enabled puppet and pybal on lvs2011 (T344587)
  • 11:00 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:00 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:59 claime: Deploying mediawiki: Add missing controls for php-fpm - T341320
  • 10:59 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2011.codfw.wmnet
  • 10:56 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2011.codfw.wmnet
  • 10:51 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 10:48 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 10:38 fabfur: disabled puppet and pybal on lvs2011 for reboot (T344587)
  • 10:28 kartik@deploy1002: Finished scap: Backport for ext.uls.interface.js: Inline isNamed() method (T344635) (duration: 14m 06s)
  • 10:22 kartik@deploy1002: abi and kartik: Continuing with sync
  • 10:15 kartik@deploy1002: abi and kartik: Backport for ext.uls.interface.js: Inline isNamed() method (T344635) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 10:14 kartik@deploy1002: Started scap: Backport for ext.uls.interface.js: Inline isNamed() method (T344635)
  • 10:10 fabfur: enabled puppet and pybal on lvs2012 (T344587)
  • 10:09 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2012.codfw.wmnet
  • 10:09 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling reboot on A:thanos-fe
  • 10:06 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2012.codfw.wmnet
  • 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new bastion - jmm@cumin2002"
  • 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new bastion - jmm@cumin2002"
  • 09:57 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 09:57 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 09:57 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 09:56 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 09:56 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 09:55 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 09:55 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org
  • 09:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org
  • 09:47 fabfur: disabling puppet and pybal on lvs2012 for reboot (T344587)
  • 09:44 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:44 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:43 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:42 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:42 cgoubert@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:42 cgoubert@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:41 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:41 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:34 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 09:28 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 09:24 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1002.eqiad.wmnet
  • 09:24 fabfur: enabled puppet and pybal on lvs2013 for reboot (T344587)
  • 09:22 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2013.codfw.wmnet
  • 09:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1002.eqiad.wmnet
  • 09:19 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2013.codfw.wmnet
  • 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2002.codfw.wmnet
  • 09:14 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling reboot on A:thanos-fe
  • 09:14 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:14 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab1004.eqiad.wmnet
  • 09:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2002.codfw.wmnet
  • 09:11 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:10 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast5004.wikimedia.org
  • 09:09 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:08 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:08 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host phab1004.eqiad.wmnet
  • 09:08 cgoubert@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:07 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host bast6003.wikimedia.org
  • 09:07 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host bast6003.wikimedia.org with OS bookworm
  • 09:07 cgoubert@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:07 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:07 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast5004.wikimedia.org
  • 09:05 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:02 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 09:00 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 08:55 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 08:49 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 08:47 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 08:46 fabfur: stopping puppet and pybal on lvs2013 for reboot (T344587)
  • 08:44 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 08:44 claime: mw-debug: Remove limits for tls-proxy container - T344814
  • 08:43 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast6003.wikimedia.org with OS bookworm
  • 08:42 jnuche@deploy1002: Installation of scap version "4.58.0" completed for 1 hosts
  • 08:41 jnuche@deploy1002: Installing scap version "4.58.0" for 1 hosts
  • 08:40 fabfur: started puppet and pybal on lvs2014 (T344587)
  • 08:39 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2014.codfw.wmnet
  • 08:36 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2014.codfw.wmnet
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast6003.wikimedia.org - jmm@cumin2002"
  • 08:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast6003.wikimedia.org - jmm@cumin2002"
  • 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast6003.wikimedia.org on all recursors
  • 08:35 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast6003.wikimedia.org on all recursors
  • 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast6003.wikimedia.org - jmm@cumin2002"
  • 08:33 fabfur: stopping puppet and pybal on lvs2014 for reboot (T344587)
  • 08:23 vgutierrez: re-enabling puppet on acme-chief clients
  • 08:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief2001.codfw.wmnet
  • 08:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast6003.wikimedia.org - jmm@cumin2002"
  • 08:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief2001.codfw.wmnet
  • 08:16 vgutierrez: disabling puppet on acme-chief clients prior to acmechief2001 reboot
  • 08:09 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:09 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast6003.wikimedia.org
  • 08:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host bast5004.wikimedia.org
  • 08:04 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host bast5004.wikimedia.org with OS bookworm
  • 08:03 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 08:03 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 07:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
  • 07:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
  • 07:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
  • 07:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
  • 07:30 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 07:29 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 07:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
  • 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
  • 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 07:12 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast5004.wikimedia.org with OS bookworm
  • 07:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast5004.wikimedia.org - jmm@cumin2002"
  • 07:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast5004.wikimedia.org - jmm@cumin2002"
  • 07:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast5004.wikimedia.org on all recursors
  • 07:10 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast5004.wikimedia.org on all recursors
  • 07:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:10 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5004.wikimedia.org - jmm@cumin2002"
  • 07:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 07:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5004.wikimedia.org - jmm@cumin2002"
  • 07:07 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:07 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:06 moritzm: installing cups security updates
  • 07:05 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:05 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:04 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:04 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast5004.wikimedia.org
  • 06:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw2001.wikimedia.org
  • 06:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw2001.wikimedia.org
  • 06:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw1001.wikimedia.org
  • 06:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw1001.wikimedia.org
  • 05:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P51427 and previous config saved to /var/cache/conftool/dbconfig/20230825-054701-ladsgroup.json
  • 05:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P51426 and previous config saved to /var/cache/conftool/dbconfig/20230825-053156-ladsgroup.json
  • 05:28 marostegui: failover m3-master to dbproxy1020
  • 05:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P51425 and previous config saved to /var/cache/conftool/dbconfig/20230825-051651-ladsgroup.json
  • 05:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P51424 and previous config saved to /var/cache/conftool/dbconfig/20230825-050147-ladsgroup.json

2023-08-24

  • 23:10 bblack: geodns: DE+GB mapped back to esams (were temporarily on drmrs)
  • 22:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2025.codfw.wmnet with OS bullseye
  • 21:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
  • 21:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2025.codfw.wmnet with OS bullseye
  • 21:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
  • 21:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2025.codfw.wmnet with OS bullseye
  • 21:29 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125 (duration: 00m 15s)
  • 21:29 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125
  • 21:28 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125 (duration: 08m 18s)
  • 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T344589)', diff saved to https://phabricator.wikimedia.org/P51422 and previous config saved to /var/cache/conftool/dbconfig/20230824-212554-ladsgroup.json
  • 21:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
  • 21:19 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125
  • 21:18 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125 (duration: 02m 17s)
  • 21:16 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125
  • 21:15 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: (no justification provided) (duration: 00m 55s)
  • 21:14 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: (no justification provided)
  • 21:14 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: (no justification provided) (duration: 00m 40s)
  • 21:14 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: (no justification provided)
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P51421 and previous config saved to /var/cache/conftool/dbconfig/20230824-211048-ladsgroup.json
  • 21:09 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125 (duration: 02m 56s)
  • 21:06 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125
  • 21:06 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125 (duration: 22m 03s)
  • 21:01 thcipriani: mwmaint1002:foreachwiki extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --create-system-user # ref. 952132
  • 20:58 thcipriani@deploy1002: Finished scap: Backport for Add option to just create the 'Global rename script' system user (T344632), watchlist: Don't assume only named users have watchlist access (T344870) (duration: 12m 31s)
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P51419 and previous config saved to /var/cache/conftool/dbconfig/20230824-205541-ladsgroup.json
  • 20:52 thcipriani@deploy1002: thcipriani and jdrewniak and krinkle: Continuing with sync
  • 20:47 thcipriani@deploy1002: thcipriani and jdrewniak and krinkle: Backport for Add option to just create the 'Global rename script' system user (T344632), watchlist: Don't assume only named users have watchlist access (T344870) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (
  • 20:45 thcipriani@deploy1002: Started scap: Backport for Add option to just create the 'Global rename script' system user (T344632), watchlist: Don't assume only named users have watchlist access (T344870)
  • 20:44 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: allow list changes T343856 0.3.125
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T344589)', diff saved to https://phabricator.wikimedia.org/P51418 and previous config saved to /var/cache/conftool/dbconfig/20230824-204035-ladsgroup.json
  • 20:37 bking@deploy1002: Finished deploy [wdqs/wdqs@2455ffd]: (no justification provided) (duration: 04m 41s)
  • 20:34 inflatador: bking@deploy1002 'scap deploy new wdqs T343856'
  • 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1219 (T344589)', diff saved to https://phabricator.wikimedia.org/P51417 and previous config saved to /var/cache/conftool/dbconfig/20230824-203322-ladsgroup.json
  • 20:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 20:33 bking@deploy1002: Started deploy [wdqs/wdqs@2455ffd]: (no justification provided)
  • 20:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 20:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 20:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T344589)', diff saved to https://phabricator.wikimedia.org/P51416 and previous config saved to /var/cache/conftool/dbconfig/20230824-202836-ladsgroup.json
  • 20:21 thcipriani@deploy1002: Finished scap: Backport for Remove unused RESTBase-related VisualEditor config settings (T341618) (duration: 09m 58s)
  • 20:15 thcipriani@deploy1002: thcipriani and matmarex: Continuing with sync
  • 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P51415 and previous config saved to /var/cache/conftool/dbconfig/20230824-201329-ladsgroup.json
  • 20:12 thcipriani@deploy1002: thcipriani and matmarex: Backport for Remove unused RESTBase-related VisualEditor config settings (T341618) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:11 thcipriani@deploy1002: Started scap: Backport for Remove unused RESTBase-related VisualEditor config settings (T341618)
  • 20:03 effie: enabling puppet on thanos-fe* hosts
  • 19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P51414 and previous config saved to /var/cache/conftool/dbconfig/20230824-195823-ladsgroup.json
  • 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T344589)', diff saved to https://phabricator.wikimedia.org/P51412 and previous config saved to /var/cache/conftool/dbconfig/20230824-194317-ladsgroup.json
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T344589)', diff saved to https://phabricator.wikimedia.org/P51411 and previous config saved to /var/cache/conftool/dbconfig/20230824-193458-ladsgroup.json
  • 19:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T344589)', diff saved to https://phabricator.wikimedia.org/P51410 and previous config saved to /var/cache/conftool/dbconfig/20230824-193434-ladsgroup.json
  • 19:30 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 19:30 effie: pool kartotherian to eqiad and depool from codfw
  • 19:22 cstone: payments-wiki upgraded from 7bf896f8 to b25307fe
  • 19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P51409 and previous config saved to /var/cache/conftool/dbconfig/20230824-191928-ladsgroup.json
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T344589)', diff saved to https://phabricator.wikimedia.org/P51408 and previous config saved to /var/cache/conftool/dbconfig/20230824-191322-ladsgroup.json
  • 19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P51407 and previous config saved to /var/cache/conftool/dbconfig/20230824-190422-ladsgroup.json
  • 19:03 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.23 refs T343725
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P51406 and previous config saved to /var/cache/conftool/dbconfig/20230824-185816-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T344589)', diff saved to https://phabricator.wikimedia.org/P51405 and previous config saved to /var/cache/conftool/dbconfig/20230824-184915-ladsgroup.json
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P51404 and previous config saved to /var/cache/conftool/dbconfig/20230824-184308-ladsgroup.json
  • 18:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T344589)', diff saved to https://phabricator.wikimedia.org/P51403 and previous config saved to /var/cache/conftool/dbconfig/20230824-182802-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T344589)', diff saved to https://phabricator.wikimedia.org/P51402 and previous config saved to /var/cache/conftool/dbconfig/20230824-182151-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T344589)', diff saved to https://phabricator.wikimedia.org/P51401 and previous config saved to /var/cache/conftool/dbconfig/20230824-182128-ladsgroup.json
  • 18:20 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T344589)', diff saved to https://phabricator.wikimedia.org/P51400 and previous config saved to /var/cache/conftool/dbconfig/20230824-182032-ladsgroup.json
  • 18:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T344589)', diff saved to https://phabricator.wikimedia.org/P51399 and previous config saved to /var/cache/conftool/dbconfig/20230824-182006-ladsgroup.json
  • 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P51398 and previous config saved to /var/cache/conftool/dbconfig/20230824-180621-ladsgroup.json
  • 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P51397 and previous config saved to /var/cache/conftool/dbconfig/20230824-180500-ladsgroup.json
  • 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P51396 and previous config saved to /var/cache/conftool/dbconfig/20230824-175115-ladsgroup.json
  • 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P51395 and previous config saved to /var/cache/conftool/dbconfig/20230824-174954-ladsgroup.json
  • 17:36 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T344589)', diff saved to https://phabricator.wikimedia.org/P51394 and previous config saved to /var/cache/conftool/dbconfig/20230824-173609-ladsgroup.json
  • 17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T344589)', diff saved to https://phabricator.wikimedia.org/P51393 and previous config saved to /var/cache/conftool/dbconfig/20230824-173448-ladsgroup.json
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T344589)', diff saved to https://phabricator.wikimedia.org/P51391 and previous config saved to /var/cache/conftool/dbconfig/20230824-172851-ladsgroup.json
  • 17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T344589)', diff saved to https://phabricator.wikimedia.org/P51390 and previous config saved to /var/cache/conftool/dbconfig/20230824-172820-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T344589)', diff saved to https://phabricator.wikimedia.org/P51389 and previous config saved to /var/cache/conftool/dbconfig/20230824-172723-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T344589)', diff saved to https://phabricator.wikimedia.org/P51388 and previous config saved to /var/cache/conftool/dbconfig/20230824-172658-ladsgroup.json
  • 17:23 ryankemper: [WCQS] T344882 `ryankemper@wcqs1003:~$ sudo depool`
  • 17:17 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@15ed2de]: (no justification provided) (duration: 00m 19s)
  • 17:17 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@15ed2de]: (no justification provided)
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P51387 and previous config saved to /var/cache/conftool/dbconfig/20230824-171314-ladsgroup.json
  • 17:13 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P51386 and previous config saved to /var/cache/conftool/dbconfig/20230824-171152-ladsgroup.json
  • 17:10 bd808: Toolhub updated to a59d37
  • 17:10 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 17:08 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 17:08 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 17:07 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 17:06 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 17:05 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P51385 and previous config saved to /var/cache/conftool/dbconfig/20230824-165807-ladsgroup.json
  • 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P51384 and previous config saved to /var/cache/conftool/dbconfig/20230824-165646-ladsgroup.json
  • 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T343718)', diff saved to https://phabricator.wikimedia.org/P51383 and previous config saved to /var/cache/conftool/dbconfig/20230824-165609-ladsgroup.json
  • 16:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2025']
  • 16:49 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2025']
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T344589)', diff saved to https://phabricator.wikimedia.org/P51382 and previous config saved to /var/cache/conftool/dbconfig/20230824-164301-ladsgroup.json
  • 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T344589)', diff saved to https://phabricator.wikimedia.org/P51381 and previous config saved to /var/cache/conftool/dbconfig/20230824-164140-ladsgroup.json
  • 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P51380 and previous config saved to /var/cache/conftool/dbconfig/20230824-164103-ladsgroup.json
  • 16:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2025']
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T344589)', diff saved to https://phabricator.wikimedia.org/P51378 and previous config saved to /var/cache/conftool/dbconfig/20230824-163543-ladsgroup.json
  • 16:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T344589)', diff saved to https://phabricator.wikimedia.org/P51377 and previous config saved to /var/cache/conftool/dbconfig/20230824-163519-ladsgroup.json
  • 16:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-master1001.eqiad.wmnet
  • 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T344589)', diff saved to https://phabricator.wikimedia.org/P51376 and previous config saved to /var/cache/conftool/dbconfig/20230824-163419-ladsgroup.json
  • 16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 16:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T344589)', diff saved to https://phabricator.wikimedia.org/P51375 and previous config saved to /var/cache/conftool/dbconfig/20230824-163347-ladsgroup.json
  • 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:30 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:27 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P51374 and previous config saved to /var/cache/conftool/dbconfig/20230824-162556-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P51373 and previous config saved to /var/cache/conftool/dbconfig/20230824-162013-ladsgroup.json
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P51372 and previous config saved to /var/cache/conftool/dbconfig/20230824-161841-ladsgroup.json
  • 16:12 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-master1001.eqiad.wmnet
  • 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T343718)', diff saved to https://phabricator.wikimedia.org/P51371 and previous config saved to /var/cache/conftool/dbconfig/20230824-161050-ladsgroup.json
  • 16:10 bking@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wcqs,name=eqiad
  • 16:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20230824-160502-ladsgroup.json
  • 16:04 sukhe: enable puppet on A:lvs and A:esams and force run agent to merge 952247
  • 16:03 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P51369 and previous config saved to /var/cache/conftool/dbconfig/20230824-160335-ladsgroup.json
  • 16:00 sukhe: disable puppet on A:lvs and A:esams to merge 952247
  • 15:51 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T344589)', diff saved to https://phabricator.wikimedia.org/P51368 and previous config saved to /var/cache/conftool/dbconfig/20230824-154956-ladsgroup.json
  • 15:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T344589)', diff saved to https://phabricator.wikimedia.org/P51367 and previous config saved to /var/cache/conftool/dbconfig/20230824-154829-ladsgroup.json
  • 15:45 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T344589)', diff saved to https://phabricator.wikimedia.org/P51366 and previous config saved to /var/cache/conftool/dbconfig/20230824-154238-ladsgroup.json
  • 15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 15:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T344589)', diff saved to https://phabricator.wikimedia.org/P51365 and previous config saved to /var/cache/conftool/dbconfig/20230824-154102-ladsgroup.json
  • 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T344589)', diff saved to https://phabricator.wikimedia.org/P51364 and previous config saved to /var/cache/conftool/dbconfig/20230824-154037-ladsgroup.json
  • 15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T344589)', diff saved to https://phabricator.wikimedia.org/P51363 and previous config saved to /var/cache/conftool/dbconfig/20230824-153835-ladsgroup.json
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T343718)', diff saved to https://phabricator.wikimedia.org/P51362 and previous config saved to /var/cache/conftool/dbconfig/20230824-153443-ladsgroup.json
  • 15:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 15:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T343718)', diff saved to https://phabricator.wikimedia.org/P51361 and previous config saved to /var/cache/conftool/dbconfig/20230824-153422-ladsgroup.json
  • 15:27 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:26 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:26 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 15:25 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P51360 and previous config saved to /var/cache/conftool/dbconfig/20230824-152531-ladsgroup.json
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P51359 and previous config saved to /var/cache/conftool/dbconfig/20230824-152329-ladsgroup.json
  • 15:22 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 15:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:21 effie: depool kartotherian on eqiad
  • 15:20 oblivian@deploy1002: Started scap: (no justification provided)
  • 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P51358 and previous config saved to /var/cache/conftool/dbconfig/20230824-151916-ladsgroup.json
  • 15:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:16 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host restbase1020.eqiad.wmnet
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P51357 and previous config saved to /var/cache/conftool/dbconfig/20230824-151414-ladsgroup.json
  • 15:13 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 15:13 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 15:13 bking@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wcqs,name=eqiad
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P51356 and previous config saved to /var/cache/conftool/dbconfig/20230824-151025-ladsgroup.json
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P51355 and previous config saved to /var/cache/conftool/dbconfig/20230824-150823-ladsgroup.json
  • 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P51354 and previous config saved to /var/cache/conftool/dbconfig/20230824-150410-ladsgroup.json
  • 15:03 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1020.eqiad.wmnet
  • 15:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1019.eqiad.wmnet
  • 15:02 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 15:02 effie: pool kartotherian on codfw
  • 15:02 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on A:wikidough and not A:esams and A:wikidough
  • 15:01 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P51353 and previous config saved to /var/cache/conftool/dbconfig/20230824-145909-ladsgroup.json
  • 14:55 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 14:55 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T344589)', diff saved to https://phabricator.wikimedia.org/P51352 and previous config saved to /var/cache/conftool/dbconfig/20230824-145519-ladsgroup.json
  • 14:54 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1019.eqiad.wmnet
  • 14:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1016.eqiad.wmnet
  • 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T344589)', diff saved to https://phabricator.wikimedia.org/P51351 and previous config saved to /var/cache/conftool/dbconfig/20230824-145317-ladsgroup.json
  • 14:53 moritzm: installing poppler security updates
  • 14:52 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 14:52 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T343718)', diff saved to https://phabricator.wikimedia.org/P51350 and previous config saved to /var/cache/conftool/dbconfig/20230824-144903-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T344589)', diff saved to https://phabricator.wikimedia.org/P51349 and previous config saved to /var/cache/conftool/dbconfig/20230824-144810-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T344589)', diff saved to https://phabricator.wikimedia.org/P51348 and previous config saved to /var/cache/conftool/dbconfig/20230824-144801-ladsgroup.json
  • 14:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T344589)', diff saved to https://phabricator.wikimedia.org/P51347 and previous config saved to /var/cache/conftool/dbconfig/20230824-144745-ladsgroup.json
  • 14:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T344589)', diff saved to https://phabricator.wikimedia.org/P51346 and previous config saved to /var/cache/conftool/dbconfig/20230824-144737-ladsgroup.json
  • 14:44 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P51345 and previous config saved to /var/cache/conftool/dbconfig/20230824-144404-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P51344 and previous config saved to /var/cache/conftool/dbconfig/20230824-143239-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P51343 and previous config saved to /var/cache/conftool/dbconfig/20230824-143231-ladsgroup.json
  • 14:31 moritzm: restarting FPM on mw canaries
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P51342 and previous config saved to /var/cache/conftool/dbconfig/20230824-142900-ladsgroup.json
  • 14:23 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on A:wikidough and not A:esams and A:wikidough
  • 14:21 moritzm: installing openssl security updates on buster
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P51341 and previous config saved to /var/cache/conftool/dbconfig/20230824-141733-ladsgroup.json
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P51340 and previous config saved to /var/cache/conftool/dbconfig/20230824-141725-ladsgroup.json
  • 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T343718)', diff saved to https://phabricator.wikimedia.org/P51339 and previous config saved to /var/cache/conftool/dbconfig/20230824-141043-ladsgroup.json
  • 14:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T343718)', diff saved to https://phabricator.wikimedia.org/P51338 and previous config saved to /var/cache/conftool/dbconfig/20230824-141022-ladsgroup.json
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T344589)', diff saved to https://phabricator.wikimedia.org/P51337 and previous config saved to /var/cache/conftool/dbconfig/20230824-140226-ladsgroup.json
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T344589)', diff saved to https://phabricator.wikimedia.org/P51336 and previous config saved to /var/cache/conftool/dbconfig/20230824-140218-ladsgroup.json
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T344589)', diff saved to https://phabricator.wikimedia.org/P51335 and previous config saved to /var/cache/conftool/dbconfig/20230824-135659-ladsgroup.json
  • 13:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T344589)', diff saved to https://phabricator.wikimedia.org/P51334 and previous config saved to /var/cache/conftool/dbconfig/20230824-135636-ladsgroup.json
  • 13:55 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 13:55 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 13:55 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 13:55 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 13:55 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P51333 and previous config saved to /var/cache/conftool/dbconfig/20230824-135516-ladsgroup.json
  • 13:55 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 13:55 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 13:55 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T344589)', diff saved to https://phabricator.wikimedia.org/P51332 and previous config saved to /var/cache/conftool/dbconfig/20230824-135456-ladsgroup.json
  • 13:54 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 13:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 13:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 13:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:54 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 13:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 13:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 13:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 13:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 13:54 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 13:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 13:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 13:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 13:53 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:53 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:51 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:50 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T344589)', diff saved to https://phabricator.wikimedia.org/P51331 and previous config saved to /var/cache/conftool/dbconfig/20230824-135004-ladsgroup.json
  • 13:48 fabfur: enabled puppet and pybal on lvs1017 (T344587)
  • 13:47 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1017.eqiad.wmnet
  • 13:46 bblack: cp3075: restart varnish frontend (changing malloc storage from https://gerrit.wikimedia.org/r/c/operations/puppet/+/952207/ )
  • 13:43 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 13:43 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1017.eqiad.wmnet
  • 13:43 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:43 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 13:42 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 13:42 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P51330 and previous config saved to /var/cache/conftool/dbconfig/20230824-134129-ladsgroup.json
  • 13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P51329 and previous config saved to /var/cache/conftool/dbconfig/20230824-134010-ladsgroup.json
  • 13:37 marostegui: failover m2-master to dbproxy1023
  • 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P51328 and previous config saved to /var/cache/conftool/dbconfig/20230824-133458-ladsgroup.json
  • 13:33 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P51327 and previous config saved to /var/cache/conftool/dbconfig/20230824-132623-ladsgroup.json
  • 13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T343718)', diff saved to https://phabricator.wikimedia.org/P51326 and previous config saved to /var/cache/conftool/dbconfig/20230824-132504-ladsgroup.json
  • 13:23 fabfur: disabling puppet and pybal on lvs1017 for reboot (T344587)
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P51325 and previous config saved to /var/cache/conftool/dbconfig/20230824-131952-ladsgroup.json
  • 13:11 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@c579111] (releasing): (no justification provided) (duration: 00m 21s)
  • 13:11 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@c579111] (releasing): (no justification provided)
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T344589)', diff saved to https://phabricator.wikimedia.org/P51324 and previous config saved to /var/cache/conftool/dbconfig/20230824-131117-ladsgroup.json
  • 13:08 bblack: cp3074: restart varnish frontend (changing malloc storage from https://gerrit.wikimedia.org/r/c/operations/puppet/+/952207/ )
  • 13:05 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@c579111] (releasing): (no justification provided) (duration: 01m 27s)
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T344589)', diff saved to https://phabricator.wikimedia.org/P51323 and previous config saved to /var/cache/conftool/dbconfig/20230824-130519-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T344589)', diff saved to https://phabricator.wikimedia.org/P51322 and previous config saved to /var/cache/conftool/dbconfig/20230824-130455-ladsgroup.json
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T344589)', diff saved to https://phabricator.wikimedia.org/P51321 and previous config saved to /var/cache/conftool/dbconfig/20230824-130446-ladsgroup.json
  • 13:04 fabfur: puppet and pybal reenabled on lvs1018 (T344587)
  • 13:04 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@c579111] (releasing): (no justification provided)
  • 13:04 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 13:03 marostegui: failover m1-master to dbproxy1022
  • 13:02 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 12:59 sukhe: running homer "asw1-b*27-esams*" commit "add doh300[34]"
  • 12:58 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 12:58 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 12:57 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1018.eqiad.wmnet
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T344589)', diff saved to https://phabricator.wikimedia.org/P51320 and previous config saved to /var/cache/conftool/dbconfig/20230824-125607-ladsgroup.json
  • 12:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 12:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T344589)', diff saved to https://phabricator.wikimedia.org/P51319 and previous config saved to /var/cache/conftool/dbconfig/20230824-125542-ladsgroup.json
  • 12:54 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1018.eqiad.wmnet
  • 12:49 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P51318 and previous config saved to /var/cache/conftool/dbconfig/20230824-124942-ladsgroup.json
  • 12:48 effie: depool kartotherian in eqiad
  • 12:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T343718)', diff saved to https://phabricator.wikimedia.org/P51317 and previous config saved to /var/cache/conftool/dbconfig/20230824-124758-ladsgroup.json
  • 12:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T343718)', diff saved to https://phabricator.wikimedia.org/P51316 and previous config saved to /var/cache/conftool/dbconfig/20230824-124737-ladsgroup.json
  • 12:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1001.eqiad.wmnet
  • 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P51315 and previous config saved to /var/cache/conftool/dbconfig/20230824-124036-ladsgroup.json
  • 12:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1001.eqiad.wmnet
  • 12:35 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 12:34 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 12:34 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P51314 and previous config saved to /var/cache/conftool/dbconfig/20230824-123436-ladsgroup.json
  • 12:34 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P51313 and previous config saved to /var/cache/conftool/dbconfig/20230824-123231-ladsgroup.json
  • 12:25 fabfur: correction: not lvs1020 but lvs1018
  • 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P51312 and previous config saved to /var/cache/conftool/dbconfig/20230824-122530-ladsgroup.json
  • 12:25 fabfur: disabling puppet and pybal on lvs1020 for reboot (T344587)
  • 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T344589)', diff saved to https://phabricator.wikimedia.org/P51311 and previous config saved to /var/cache/conftool/dbconfig/20230824-121930-ladsgroup.json
  • 12:18 cgoubert@deploy1002: Finished scap: Redeploying mw-on-k8s - T344904 (duration: 02m 07s)
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P51310 and previous config saved to /var/cache/conftool/dbconfig/20230824-121725-ladsgroup.json
  • 12:16 cgoubert@deploy1002: Started scap: Redeploying mw-on-k8s - T344904
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T344589)', diff saved to https://phabricator.wikimedia.org/P51309 and previous config saved to /var/cache/conftool/dbconfig/20230824-121158-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T344589)', diff saved to https://phabricator.wikimedia.org/P51308 and previous config saved to /var/cache/conftool/dbconfig/20230824-121024-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 12:06 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_drmrs and A:cp
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T344589)', diff saved to https://phabricator.wikimedia.org/P51307 and previous config saved to /var/cache/conftool/dbconfig/20230824-120611-ladsgroup.json
  • 12:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T344589)', diff saved to https://phabricator.wikimedia.org/P51306 and previous config saved to /var/cache/conftool/dbconfig/20230824-120352-ladsgroup.json
  • 12:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 12:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 12:02 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_drmrs and A:cp
  • 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T343718)', diff saved to https://phabricator.wikimedia.org/P51305 and previous config saved to /var/cache/conftool/dbconfig/20230824-120218-ladsgroup.json
  • 12:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P51304 and previous config saved to /var/cache/conftool/dbconfig/20230824-115105-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T343718)', diff saved to https://phabricator.wikimedia.org/P51303 and previous config saved to /var/cache/conftool/dbconfig/20230824-115056-ladsgroup.json
  • 11:49 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 11:48 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P51302 and previous config saved to /var/cache/conftool/dbconfig/20230824-113559-ladsgroup.json
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P51301 and previous config saved to /var/cache/conftool/dbconfig/20230824-113550-ladsgroup.json
  • 11:31 taavi: foreachwikiindblist fishbowl extensions/OATHAuth/maintenance/UpdateForMultipleDevicesSupport.php | tee oathauth-multiple-fishbowl.log # T242031
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T343718)', diff saved to https://phabricator.wikimedia.org/P51300 and previous config saved to /var/cache/conftool/dbconfig/20230824-112532-ladsgroup.json
  • 11:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 11:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 11:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 11:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T343718)', diff saved to https://phabricator.wikimedia.org/P51299 and previous config saved to /var/cache/conftool/dbconfig/20230824-112507-ladsgroup.json
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T344589)', diff saved to https://phabricator.wikimedia.org/P51298 and previous config saved to /var/cache/conftool/dbconfig/20230824-112052-ladsgroup.json
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P51297 and previous config saved to /var/cache/conftool/dbconfig/20230824-112043-ladsgroup.json
  • 11:16 fabfur: lvs1019 up and running (T344587)
  • 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T344589)', diff saved to https://phabricator.wikimedia.org/P51296 and previous config saved to /var/cache/conftool/dbconfig/20230824-111432-ladsgroup.json
  • 11:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 11:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P51295 and previous config saved to /var/cache/conftool/dbconfig/20230824-111407-ladsgroup.json
  • 11:12 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1019.eqiad.wmnet
  • 11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P51294 and previous config saved to /var/cache/conftool/dbconfig/20230824-111001-ladsgroup.json
  • 11:09 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1019.eqiad.wmnet
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T343718)', diff saved to https://phabricator.wikimedia.org/P51293 and previous config saved to /var/cache/conftool/dbconfig/20230824-110537-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T343718)', diff saved to https://phabricator.wikimedia.org/P51292 and previous config saved to /var/cache/conftool/dbconfig/20230824-110226-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 11:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T343718)', diff saved to https://phabricator.wikimedia.org/P51291 and previous config saved to /var/cache/conftool/dbconfig/20230824-110206-ladsgroup.json
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P51290 and previous config saved to /var/cache/conftool/dbconfig/20230824-105900-ladsgroup.json
  • 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P51289 and previous config saved to /var/cache/conftool/dbconfig/20230824-105454-ladsgroup.json
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P51288 and previous config saved to /var/cache/conftool/dbconfig/20230824-104659-ladsgroup.json
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P51287 and previous config saved to /var/cache/conftool/dbconfig/20230824-104354-ladsgroup.json
  • 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T343718)', diff saved to https://phabricator.wikimedia.org/P51286 and previous config saved to /var/cache/conftool/dbconfig/20230824-103948-ladsgroup.json
  • 10:32 fabfur: stopping pybal and rebooting lvs1019 (T344587)
  • 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P51285 and previous config saved to /var/cache/conftool/dbconfig/20230824-103153-ladsgroup.json
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P51284 and previous config saved to /var/cache/conftool/dbconfig/20230824-102848-ladsgroup.json
  • 10:22 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 10:22 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 10:22 effie: pool kartotherian on codfw
  • 10:21 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 10:21 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 10:20 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 10:19 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:18 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 10:17 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T343718)', diff saved to https://phabricator.wikimedia.org/P51283 and previous config saved to /var/cache/conftool/dbconfig/20230824-101647-ladsgroup.json
  • 10:16 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 10:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T344589)', diff saved to https://phabricator.wikimedia.org/P51282 and previous config saved to /var/cache/conftool/dbconfig/20230824-101527-ladsgroup.json
  • 10:15 effie: Disable puppet on thanos-fe (eqiad), roll out cfssl on thanos-fe in codfw
  • 10:14 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T343718)', diff saved to https://phabricator.wikimedia.org/P51281 and previous config saved to /var/cache/conftool/dbconfig/20230824-101437-ladsgroup.json
  • 10:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 10:14 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 10:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T343718)', diff saved to https://phabricator.wikimedia.org/P51280 and previous config saved to /var/cache/conftool/dbconfig/20230824-101405-ladsgroup.json
  • 10:08 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 10:08 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 10:06 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 10:06 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
  • 10:03 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T343718)', diff saved to https://phabricator.wikimedia.org/P51279 and previous config saved to /var/cache/conftool/dbconfig/20230824-100321-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 10:03 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T343718)', diff saved to https://phabricator.wikimedia.org/P51278 and previous config saved to /var/cache/conftool/dbconfig/20230824-100259-ladsgroup.json
  • 10:02 fabfur: end reboot of lvs1020 (pybal service enabled) (T344587)
  • 10:00 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
  • 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P51277 and previous config saved to /var/cache/conftool/dbconfig/20230824-100021-ladsgroup.json
  • 09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P51276 and previous config saved to /var/cache/conftool/dbconfig/20230824-095858-ladsgroup.json
  • 09:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1001.eqiad.wmnet
  • 09:57 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1020.eqiad.wmnet
  • 09:54 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1020.eqiad.wmnet
  • 09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
  • 09:52 fabfur: reboot lvs1020 to apply patch (T344587)
  • 09:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 09:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T344589)', diff saved to https://phabricator.wikimedia.org/P51275 and previous config saved to /var/cache/conftool/dbconfig/20230824-095117-ladsgroup.json
  • 09:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host karapace1002.eqiad.wmnet
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P51274 and previous config saved to /var/cache/conftool/dbconfig/20230824-094753-ladsgroup.json
  • 09:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet
  • 09:45 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host karapace1002.eqiad.wmnet
  • 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P51273 and previous config saved to /var/cache/conftool/dbconfig/20230824-094515-ladsgroup.json
  • 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P51272 and previous config saved to /var/cache/conftool/dbconfig/20230824-094352-ladsgroup.json
  • 09:42 moritzm: removed stretch-wikimedia from apt.wikimedia.org (obsolete)
  • 09:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025 (T344589)', diff saved to https://phabricator.wikimedia.org/P51271 and previous config saved to /var/cache/conftool/dbconfig/20230824-094109-ladsgroup.json
  • 09:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test1001.eqiad.wmnet
  • 09:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
  • 09:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host karapace1001.eqiad.wmnet
  • 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P51270 and previous config saved to /var/cache/conftool/dbconfig/20230824-093611-ladsgroup.json
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P51269 and previous config saved to /var/cache/conftool/dbconfig/20230824-093247-ladsgroup.json
  • 09:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T344589)', diff saved to https://phabricator.wikimedia.org/P51268 and previous config saved to /var/cache/conftool/dbconfig/20230824-093008-ladsgroup.json
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T343718)', diff saved to https://phabricator.wikimedia.org/P51267 and previous config saved to /var/cache/conftool/dbconfig/20230824-092846-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T343718)', diff saved to https://phabricator.wikimedia.org/P51266 and previous config saved to /var/cache/conftool/dbconfig/20230824-092636-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 09:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T343718)', diff saved to https://phabricator.wikimedia.org/P51265 and previous config saved to /var/cache/conftool/dbconfig/20230824-092614-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025', diff saved to https://phabricator.wikimedia.org/P51264 and previous config saved to /var/cache/conftool/dbconfig/20230824-092603-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P51263 and previous config saved to /var/cache/conftool/dbconfig/20230824-092147-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T344589)', diff saved to https://phabricator.wikimedia.org/P51262 and previous config saved to /var/cache/conftool/dbconfig/20230824-092122-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P51261 and previous config saved to /var/cache/conftool/dbconfig/20230824-092105-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 09:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T344589)', diff saved to https://phabricator.wikimedia.org/P51260 and previous config saved to /var/cache/conftool/dbconfig/20230824-092056-ladsgroup.json
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T343718)', diff saved to https://phabricator.wikimedia.org/P51259 and previous config saved to /var/cache/conftool/dbconfig/20230824-091741-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P51258 and previous config saved to /var/cache/conftool/dbconfig/20230824-091108-ladsgroup.json
  • 09:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025', diff saved to https://phabricator.wikimedia.org/P51257 and previous config saved to /var/cache/conftool/dbconfig/20230824-091057-ladsgroup.json
  • 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T344589)', diff saved to https://phabricator.wikimedia.org/P51256 and previous config saved to /var/cache/conftool/dbconfig/20230824-090559-ladsgroup.json
  • 09:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P51255 and previous config saved to /var/cache/conftool/dbconfig/20230824-090550-ladsgroup.json
  • 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T344589)', diff saved to https://phabricator.wikimedia.org/P51254 and previous config saved to /var/cache/conftool/dbconfig/20230824-085834-ladsgroup.json
  • 08:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 08:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T344589)', diff saved to https://phabricator.wikimedia.org/P51253 and previous config saved to /var/cache/conftool/dbconfig/20230824-085810-ladsgroup.json
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P51252 and previous config saved to /var/cache/conftool/dbconfig/20230824-085602-ladsgroup.json
  • 08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025 (T344589)', diff saved to https://phabricator.wikimedia.org/P51251 and previous config saved to /var/cache/conftool/dbconfig/20230824-085551-ladsgroup.json
  • 08:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P51250 and previous config saved to /var/cache/conftool/dbconfig/20230824-085044-ladsgroup.json
  • 08:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P51249 and previous config saved to /var/cache/conftool/dbconfig/20230824-084304-ladsgroup.json
  • 08:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025 (T344589)', diff saved to https://phabricator.wikimedia.org/P51248 and previous config saved to /var/cache/conftool/dbconfig/20230824-084303-ladsgroup.json
  • 08:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 08:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T343718)', diff saved to https://phabricator.wikimedia.org/P51247 and previous config saved to /var/cache/conftool/dbconfig/20230824-084055-ladsgroup.json
  • 08:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T343718)', diff saved to https://phabricator.wikimedia.org/P51246 and previous config saved to /var/cache/conftool/dbconfig/20230824-083644-ladsgroup.json
  • 08:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 08:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 08:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T344589)', diff saved to https://phabricator.wikimedia.org/P51245 and previous config saved to /var/cache/conftool/dbconfig/20230824-083537-ladsgroup.json
  • 08:33 taavi@deploy1002: Finished scap: Backport for Set OATHAuth multiple devices WRITE_BOTH for all fishbowls (T242031) (duration: 07m 45s)
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T343718)', diff saved to https://phabricator.wikimedia.org/P51244 and previous config saved to /var/cache/conftool/dbconfig/20230824-083248-ladsgroup.json
  • 08:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T343718)', diff saved to https://phabricator.wikimedia.org/P51243 and previous config saved to /var/cache/conftool/dbconfig/20230824-083226-ladsgroup.json
  • 08:30 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 08:30 fabfur@cumin1001: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_drmrs and A:cp
  • 08:30 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 08:30 fabfur@cumin1001: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_drmrs and A:cp
  • 08:29 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/termbox: apply
  • 08:28 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/termbox: apply
  • 08:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T344589)', diff saved to https://phabricator.wikimedia.org/P51242 and previous config saved to /var/cache/conftool/dbconfig/20230824-082814-ladsgroup.json
  • 08:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 08:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025', diff saved to https://phabricator.wikimedia.org/P51241 and previous config saved to /var/cache/conftool/dbconfig/20230824-082757-ladsgroup.json
  • 08:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 08:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T344589)', diff saved to https://phabricator.wikimedia.org/P51240 and previous config saved to /var/cache/conftool/dbconfig/20230824-082748-ladsgroup.json
  • 08:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Change db2179 groups', diff saved to https://phabricator.wikimedia.org/P51239 and previous config saved to /var/cache/conftool/dbconfig/20230824-082742-ladsgroup.json
  • 08:27 taavi@deploy1002: taavi: Continuing with sync
  • 08:27 taavi@deploy1002: taavi: Backport for Set OATHAuth multiple devices WRITE_BOTH for all fishbowls (T242031) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:25 taavi@deploy1002: Started scap: Backport for Set OATHAuth multiple devices WRITE_BOTH for all fishbowls (T242031)
  • 08:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 08:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1003.wikimedia.org
  • 08:19 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 08:17 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 08:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P51238 and previous config saved to /var/cache/conftool/dbconfig/20230824-081720-ladsgroup.json
  • 08:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2140 T344883', diff saved to https://phabricator.wikimedia.org/P51237 and previous config saved to /var/cache/conftool/dbconfig/20230824-081654-ladsgroup.json
  • 08:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1003.wikimedia.org
  • 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2179 to s4 primary T344883', diff saved to https://phabricator.wikimedia.org/P51236 and previous config saved to /var/cache/conftool/dbconfig/20230824-081442-ladsgroup.json
  • 08:14 Amir1: Starting s4 codfw failover from db2140 to db2179 - T344883
  • 08:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
  • 08:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P51235 and previous config saved to /var/cache/conftool/dbconfig/20230824-081229-ladsgroup.json
  • 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
  • 08:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
  • 08:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025', diff saved to https://phabricator.wikimedia.org/P51234 and previous config saved to /var/cache/conftool/dbconfig/20230824-080534-ladsgroup.json
  • 08:05 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 08:05 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 08:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T344589)', diff saved to https://phabricator.wikimedia.org/P51233 and previous config saved to /var/cache/conftool/dbconfig/20230824-080522-ladsgroup.json
  • 08:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 08:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 08:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T343718)', diff saved to https://phabricator.wikimedia.org/P51232 and previous config saved to /var/cache/conftool/dbconfig/20230824-080316-ladsgroup.json
  • 08:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P51231 and previous config saved to /var/cache/conftool/dbconfig/20230824-080214-ladsgroup.json
  • 08:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
  • 07:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T344589)', diff saved to https://phabricator.wikimedia.org/P51230 and previous config saved to /var/cache/conftool/dbconfig/20230824-075906-ladsgroup.json
  • 07:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 07:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 07:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T344589)', diff saved to https://phabricator.wikimedia.org/P51229 and previous config saved to /var/cache/conftool/dbconfig/20230824-075842-ladsgroup.json
  • 07:58 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:57 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P51228 and previous config saved to /var/cache/conftool/dbconfig/20230824-075722-ladsgroup.json
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1025 (T344589)', diff saved to https://phabricator.wikimedia.org/P51227 and previous config saved to /var/cache/conftool/dbconfig/20230824-075529-ladsgroup.json
  • 07:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1025.eqiad.wmnet with reason: Maintenance
  • 07:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1025.eqiad.wmnet with reason: Maintenance
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024 (T344589)', diff saved to https://phabricator.wikimedia.org/P51226 and previous config saved to /var/cache/conftool/dbconfig/20230824-075505-ladsgroup.json
  • 07:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025 (T344589)', diff saved to https://phabricator.wikimedia.org/P51225 and previous config saved to /var/cache/conftool/dbconfig/20230824-075028-ladsgroup.json
  • 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P51224 and previous config saved to /var/cache/conftool/dbconfig/20230824-074810-ladsgroup.json
  • 07:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T343718)', diff saved to https://phabricator.wikimedia.org/P51223 and previous config saved to /var/cache/conftool/dbconfig/20230824-074708-ladsgroup.json
  • 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P51222 and previous config saved to /var/cache/conftool/dbconfig/20230824-074336-ladsgroup.json
  • 07:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T344883
  • 07:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T344883
  • 07:42 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T344589)', diff saved to https://phabricator.wikimedia.org/P51221 and previous config saved to /var/cache/conftool/dbconfig/20230824-074216-ladsgroup.json
  • 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6002.wikimedia.org
  • 07:41 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 07:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 07:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024', diff saved to https://phabricator.wikimedia.org/P51220 and previous config saved to /var/cache/conftool/dbconfig/20230824-073959-ladsgroup.json
  • 07:39 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 07:39 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 07:39 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 07:39 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 07:38 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 07:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6002.wikimedia.org
  • 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5002.wikimedia.org
  • 07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P51219 and previous config saved to /var/cache/conftool/dbconfig/20230824-073355-ladsgroup.json
  • 07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P51218 and previous config saved to /var/cache/conftool/dbconfig/20230824-073304-ladsgroup.json
  • 07:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install5002.wikimedia.org
  • 07:30 apergos: UTC morning backport and config deployment window complete
  • 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4002.wikimedia.org
  • 07:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P51217 and previous config saved to /var/cache/conftool/dbconfig/20230824-072829-ladsgroup.json
  • 07:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install4002.wikimedia.org
  • 07:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024', diff saved to https://phabricator.wikimedia.org/P51216 and previous config saved to /var/cache/conftool/dbconfig/20230824-072453-ladsgroup.json
  • 07:23 ariel@deploy1002: Finished scap: Backport for [enwiktionary] Remove the Index and Index_talk namespaces (T344816) (duration: 10m 01s)
  • 07:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Deool es2025', diff saved to https://phabricator.wikimedia.org/P51215 and previous config saved to /var/cache/conftool/dbconfig/20230824-072301-ladsgroup.json
  • 07:22 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:22 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3003.wikimedia.org
  • 07:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P51214 and previous config saved to /var/cache/conftool/dbconfig/20230824-071849-ladsgroup.json
  • 07:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T343718)', diff saved to https://phabricator.wikimedia.org/P51213 and previous config saved to /var/cache/conftool/dbconfig/20230824-071757-ladsgroup.json
  • 07:17 ariel@deploy1002: zoranzoki21 and ariel: Continuing with sync
  • 07:14 ariel@deploy1002: zoranzoki21 and ariel: Backport for [enwiktionary] Remove the Index and Index_talk namespaces (T344816) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3003.wikimedia.org
  • 07:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T344589)', diff saved to https://phabricator.wikimedia.org/P51212 and previous config saved to /var/cache/conftool/dbconfig/20230824-071323-ladsgroup.json
  • 07:13 ariel@deploy1002: Started scap: Backport for [enwiktionary] Remove the Index and Index_talk namespaces (T344816)
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T344589)', diff saved to https://phabricator.wikimedia.org/P51211 and previous config saved to /var/cache/conftool/dbconfig/20230824-071204-ladsgroup.json
  • 07:09 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 07:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024 (T344589)', diff saved to https://phabricator.wikimedia.org/P51210 and previous config saved to /var/cache/conftool/dbconfig/20230824-070946-ladsgroup.json
  • 07:08 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 07:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T343718)', diff saved to https://phabricator.wikimedia.org/P51209 and previous config saved to /var/cache/conftool/dbconfig/20230824-070723-ladsgroup.json
  • 07:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 07:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T344589)', diff saved to https://phabricator.wikimedia.org/P51208 and previous config saved to /var/cache/conftool/dbconfig/20230824-070710-ladsgroup.json
  • 07:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 07:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 07:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T343718)', diff saved to https://phabricator.wikimedia.org/P51207 and previous config saved to /var/cache/conftool/dbconfig/20230824-070702-ladsgroup.json
  • 07:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 07:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T344589)', diff saved to https://phabricator.wikimedia.org/P51206 and previous config saved to /var/cache/conftool/dbconfig/20230824-070646-ladsgroup.json
  • 07:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2004.wikimedia.org
  • 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Pool es2025', diff saved to https://phabricator.wikimedia.org/P51205 and previous config saved to /var/cache/conftool/dbconfig/20230824-070417-ladsgroup.json
  • 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P51204 and previous config saved to /var/cache/conftool/dbconfig/20230824-070343-ladsgroup.json
  • 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2025 (T344589)', diff saved to https://phabricator.wikimedia.org/P51203 and previous config saved to /var/cache/conftool/dbconfig/20230824-070332-ladsgroup.json
  • 07:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: Maintenance
  • 07:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: Maintenance
  • 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2023 (T344589)', diff saved to https://phabricator.wikimedia.org/P51202 and previous config saved to /var/cache/conftool/dbconfig/20230824-070307-ladsgroup.json
  • 07:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install2004.wikimedia.org
  • 07:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 07:01 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 06:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1004.wikimedia.org
  • 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P51201 and previous config saved to /var/cache/conftool/dbconfig/20230824-065658-ladsgroup.json
  • 06:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install1004.wikimedia.org
  • 06:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P51200 and previous config saved to /var/cache/conftool/dbconfig/20230824-065155-ladsgroup.json
  • 06:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P51199 and previous config saved to /var/cache/conftool/dbconfig/20230824-065140-ladsgroup.json
  • 06:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P51198 and previous config saved to /var/cache/conftool/dbconfig/20230824-064830-ladsgroup.json
  • 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast1003.wikimedia.org
  • 06:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2023', diff saved to https://phabricator.wikimedia.org/P51197 and previous config saved to /var/cache/conftool/dbconfig/20230824-064801-ladsgroup.json
  • 06:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast1003.wikimedia.org
  • 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T344589)', diff saved to https://phabricator.wikimedia.org/P51196 and previous config saved to /var/cache/conftool/dbconfig/20230824-064205-ladsgroup.json
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P51195 and previous config saved to /var/cache/conftool/dbconfig/20230824-064152-ladsgroup.json
  • 06:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P51194 and previous config saved to /var/cache/conftool/dbconfig/20230824-064044-ladsgroup.json
  • 06:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 06:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 06:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T344589)', diff saved to https://phabricator.wikimedia.org/P51193 and previous config saved to /var/cache/conftool/dbconfig/20230824-064030-ladsgroup.json
  • 06:40 Amir1: killed mwscript updateSpecialPages.php metawiki --override --only=Mostlinked blocking db depool
  • 06:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P51192 and previous config saved to /var/cache/conftool/dbconfig/20230824-063649-ladsgroup.json
  • 06:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P51191 and previous config saved to /var/cache/conftool/dbconfig/20230824-063633-ladsgroup.json
  • 06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2023', diff saved to https://phabricator.wikimedia.org/P51190 and previous config saved to /var/cache/conftool/dbconfig/20230824-063255-ladsgroup.json
  • 06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2179 with weight 0 T344883', diff saved to https://phabricator.wikimedia.org/P51189 and previous config saved to /var/cache/conftool/dbconfig/20230824-063240-ladsgroup.json
  • 06:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T344883
  • 06:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T344883
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T343718)', diff saved to https://phabricator.wikimedia.org/P51188 and previous config saved to /var/cache/conftool/dbconfig/20230824-062824-ladsgroup.json
  • 06:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 06:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T343718)', diff saved to https://phabricator.wikimedia.org/P51187 and previous config saved to /var/cache/conftool/dbconfig/20230824-062802-ladsgroup.json
  • 06:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T344589)', diff saved to https://phabricator.wikimedia.org/P51186 and previous config saved to /var/cache/conftool/dbconfig/20230824-062645-ladsgroup.json
  • 06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P51185 and previous config saved to /var/cache/conftool/dbconfig/20230824-062523-ladsgroup.json
  • 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T343718)', diff saved to https://phabricator.wikimedia.org/P51184 and previous config saved to /var/cache/conftool/dbconfig/20230824-062143-ladsgroup.json
  • 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T344589)', diff saved to https://phabricator.wikimedia.org/P51183 and previous config saved to /var/cache/conftool/dbconfig/20230824-062127-ladsgroup.json
  • 06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T344589)', diff saved to https://phabricator.wikimedia.org/P51182 and previous config saved to /var/cache/conftool/dbconfig/20230824-061813-ladsgroup.json
  • 06:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 06:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2023 (T344589)', diff saved to https://phabricator.wikimedia.org/P51181 and previous config saved to /var/cache/conftool/dbconfig/20230824-061748-ladsgroup.json
  • 06:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T344589)', diff saved to https://phabricator.wikimedia.org/P51180 and previous config saved to /var/cache/conftool/dbconfig/20230824-061413-ladsgroup.json
  • 06:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 06:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 06:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T344589)', diff saved to https://phabricator.wikimedia.org/P51179 and previous config saved to /var/cache/conftool/dbconfig/20230824-061348-ladsgroup.json
  • 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P51178 and previous config saved to /var/cache/conftool/dbconfig/20230824-061256-ladsgroup.json
  • 06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P51177 and previous config saved to /var/cache/conftool/dbconfig/20230824-061017-ladsgroup.json
  • 06:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1024 (T344589)', diff saved to https://phabricator.wikimedia.org/P51176 and previous config saved to /var/cache/conftool/dbconfig/20230824-060924-ladsgroup.json
  • 06:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1024.eqiad.wmnet with reason: Maintenance
  • 06:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1024.eqiad.wmnet with reason: Maintenance
  • 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1160 T344881', diff saved to https://phabricator.wikimedia.org/P51175 and previous config saved to /var/cache/conftool/dbconfig/20230824-060647-ladsgroup.json
  • 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1138 to s4 primary and set section read-write T344881', diff saved to https://phabricator.wikimedia.org/P51174 and previous config saved to /var/cache/conftool/dbconfig/20230824-060245-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T344881', diff saved to https://phabricator.wikimedia.org/P51173 and previous config saved to /var/cache/conftool/dbconfig/20230824-060157-ladsgroup.json
  • 06:01 Amir1: Starting s4 eqiad failover from db1160 to db1138 - T344881
  • 05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2023 (T344589)', diff saved to https://phabricator.wikimedia.org/P51172 and previous config saved to /var/cache/conftool/dbconfig/20230824-055846-ladsgroup.json
  • 05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P51171 and previous config saved to /var/cache/conftool/dbconfig/20230824-055842-ladsgroup.json
  • 05:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2023.codfw.wmnet with reason: Maintenance
  • 05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2023.codfw.wmnet with reason: Maintenance
  • 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P51170 and previous config saved to /var/cache/conftool/dbconfig/20230824-055750-ladsgroup.json
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T344589)', diff saved to https://phabricator.wikimedia.org/P51169 and previous config saved to /var/cache/conftool/dbconfig/20230824-055511-ladsgroup.json
  • 05:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T344589)', diff saved to https://phabricator.wikimedia.org/P51168 and previous config saved to /var/cache/conftool/dbconfig/20230824-054726-ladsgroup.json
  • 05:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 05:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 05:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 05:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T344589)', diff saved to https://phabricator.wikimedia.org/P51167 and previous config saved to /var/cache/conftool/dbconfig/20230824-054656-ladsgroup.json
  • 05:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P51166 and previous config saved to /var/cache/conftool/dbconfig/20230824-054335-ladsgroup.json
  • 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T343718)', diff saved to https://phabricator.wikimedia.org/P51165 and previous config saved to /var/cache/conftool/dbconfig/20230824-054244-ladsgroup.json
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T343718)', diff saved to https://phabricator.wikimedia.org/P51164 and previous config saved to /var/cache/conftool/dbconfig/20230824-054044-ladsgroup.json
  • 05:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T343718)', diff saved to https://phabricator.wikimedia.org/P51163 and previous config saved to /var/cache/conftool/dbconfig/20230824-054033-ladsgroup.json
  • 05:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T343718)', diff saved to https://phabricator.wikimedia.org/P51162 and previous config saved to /var/cache/conftool/dbconfig/20230824-054023-ladsgroup.json
  • 05:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 05:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T343718)', diff saved to https://phabricator.wikimedia.org/P51161 and previous config saved to /var/cache/conftool/dbconfig/20230824-054005-ladsgroup.json
  • 05:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P51160 and previous config saved to /var/cache/conftool/dbconfig/20230824-053150-ladsgroup.json
  • 05:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T344589)', diff saved to https://phabricator.wikimedia.org/P51159 and previous config saved to /var/cache/conftool/dbconfig/20230824-052829-ladsgroup.json
  • 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P51158 and previous config saved to /var/cache/conftool/dbconfig/20230824-052517-ladsgroup.json
  • 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P51157 and previous config saved to /var/cache/conftool/dbconfig/20230824-052459-ladsgroup.json
  • 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T344589)', diff saved to https://phabricator.wikimedia.org/P51156 and previous config saved to /var/cache/conftool/dbconfig/20230824-052208-ladsgroup.json
  • 05:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 05:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 05:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T344589)', diff saved to https://phabricator.wikimedia.org/P51155 and previous config saved to /var/cache/conftool/dbconfig/20230824-052138-ladsgroup.json
  • 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1138 with weight 0 T344881', diff saved to https://phabricator.wikimedia.org/P51154 and previous config saved to /var/cache/conftool/dbconfig/20230824-051951-ladsgroup.json
  • 05:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T344881
  • 05:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T344881
  • 05:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P51153 and previous config saved to /var/cache/conftool/dbconfig/20230824-051644-ladsgroup.json
  • 05:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2022 (T344589)', diff saved to https://phabricator.wikimedia.org/P51152 and previous config saved to /var/cache/conftool/dbconfig/20230824-051259-ladsgroup.json
  • 05:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P51151 and previous config saved to /var/cache/conftool/dbconfig/20230824-051010-ladsgroup.json
  • 05:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P51150 and previous config saved to /var/cache/conftool/dbconfig/20230824-050953-ladsgroup.json
  • 05:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P51149 and previous config saved to /var/cache/conftool/dbconfig/20230824-050632-ladsgroup.json
  • 05:01 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to the old columns of extlinks in enwiki (T342683) (duration: 08m 16s)
  • 05:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T344589)', diff saved to https://phabricator.wikimedia.org/P51148 and previous config saved to /var/cache/conftool/dbconfig/20230824-050137-ladsgroup.json
  • 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2022', diff saved to https://phabricator.wikimedia.org/P51147 and previous config saved to /var/cache/conftool/dbconfig/20230824-045753-ladsgroup.json
  • 04:56 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 04:55 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to the old columns of extlinks in enwiki (T342683) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 04:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T343718)', diff saved to https://phabricator.wikimedia.org/P51146 and previous config saved to /var/cache/conftool/dbconfig/20230824-045504-ladsgroup.json
  • 04:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T343718)', diff saved to https://phabricator.wikimedia.org/P51145 and previous config saved to /var/cache/conftool/dbconfig/20230824-045447-ladsgroup.json
  • 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T344589)', diff saved to https://phabricator.wikimedia.org/P51144 and previous config saved to /var/cache/conftool/dbconfig/20230824-045352-ladsgroup.json
  • 04:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 04:53 ladsgroup@deploy1002: Started scap: Backport for Stop writing to the old columns of extlinks in enwiki (T342683)
  • 04:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T344589)', diff saved to https://phabricator.wikimedia.org/P51143 and previous config saved to /var/cache/conftool/dbconfig/20230824-045326-ladsgroup.json
  • 04:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T343718)', diff saved to https://phabricator.wikimedia.org/P51142 and previous config saved to /var/cache/conftool/dbconfig/20230824-045236-ladsgroup.json
  • 04:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 04:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 04:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T343718)', diff saved to https://phabricator.wikimedia.org/P51141 and previous config saved to /var/cache/conftool/dbconfig/20230824-045215-ladsgroup.json
  • 04:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P51140 and previous config saved to /var/cache/conftool/dbconfig/20230824-045125-ladsgroup.json
  • 04:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2022', diff saved to https://phabricator.wikimedia.org/P51139 and previous config saved to /var/cache/conftool/dbconfig/20230824-044247-ladsgroup.json
  • 04:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P51138 and previous config saved to /var/cache/conftool/dbconfig/20230824-043820-ladsgroup.json
  • 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P51137 and previous config saved to /var/cache/conftool/dbconfig/20230824-043709-ladsgroup.json
  • 04:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T344589)', diff saved to https://phabricator.wikimedia.org/P51136 and previous config saved to /var/cache/conftool/dbconfig/20230824-043619-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T344589)', diff saved to https://phabricator.wikimedia.org/P51135 and previous config saved to /var/cache/conftool/dbconfig/20230824-043334-ladsgroup.json
  • 04:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2022 (T344589)', diff saved to https://phabricator.wikimedia.org/P51134 and previous config saved to /var/cache/conftool/dbconfig/20230824-042740-ladsgroup.json
  • 04:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P51133 and previous config saved to /var/cache/conftool/dbconfig/20230824-042314-ladsgroup.json
  • 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P51132 and previous config saved to /var/cache/conftool/dbconfig/20230824-042202-ladsgroup.json
  • 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P51131 and previous config saved to /var/cache/conftool/dbconfig/20230824-041827-ladsgroup.json
  • 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P51130 and previous config saved to /var/cache/conftool/dbconfig/20230824-041759-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2022 (T344589)', diff saved to https://phabricator.wikimedia.org/P51129 and previous config saved to /var/cache/conftool/dbconfig/20230824-041537-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2022.codfw.wmnet with reason: Maintenance
  • 04:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2022.codfw.wmnet with reason: Maintenance
  • 04:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T343718)', diff saved to https://phabricator.wikimedia.org/P51128 and previous config saved to /var/cache/conftool/dbconfig/20230824-041421-ladsgroup.json
  • 04:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 04:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 04:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T344589)', diff saved to https://phabricator.wikimedia.org/P51127 and previous config saved to /var/cache/conftool/dbconfig/20230824-040808-ladsgroup.json
  • 04:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T343718)', diff saved to https://phabricator.wikimedia.org/P51126 and previous config saved to /var/cache/conftool/dbconfig/20230824-040656-ladsgroup.json
  • 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P51125 and previous config saved to /var/cache/conftool/dbconfig/20230824-040321-ladsgroup.json
  • 04:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P51124 and previous config saved to /var/cache/conftool/dbconfig/20230824-040253-ladsgroup.json
  • 04:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T344589)', diff saved to https://phabricator.wikimedia.org/P51123 and previous config saved to /var/cache/conftool/dbconfig/20230824-040139-ladsgroup.json
  • 04:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 04:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 03:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 03:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 03:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T344589)', diff saved to https://phabricator.wikimedia.org/P51122 and previous config saved to /var/cache/conftool/dbconfig/20230824-034815-ladsgroup.json
  • 03:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P51121 and previous config saved to /var/cache/conftool/dbconfig/20230824-034747-ladsgroup.json
  • 03:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T344589)', diff saved to https://phabricator.wikimedia.org/P51120 and previous config saved to /var/cache/conftool/dbconfig/20230824-034056-ladsgroup.json
  • 03:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 03:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 03:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P51119 and previous config saved to /var/cache/conftool/dbconfig/20230824-033240-ladsgroup.json
  • 03:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 03:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 03:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T344589)', diff saved to https://phabricator.wikimedia.org/P51118 and previous config saved to /var/cache/conftool/dbconfig/20230824-032633-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T343718)', diff saved to https://phabricator.wikimedia.org/P51117 and previous config saved to /var/cache/conftool/dbconfig/20230824-032545-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 03:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P51116 and previous config saved to /var/cache/conftool/dbconfig/20230824-032508-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 03:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 03:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T344589)', diff saved to https://phabricator.wikimedia.org/P51115 and previous config saved to /var/cache/conftool/dbconfig/20230824-032443-ladsgroup.json
  • 03:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P51114 and previous config saved to /var/cache/conftool/dbconfig/20230824-030937-ladsgroup.json
  • 02:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P51113 and previous config saved to /var/cache/conftool/dbconfig/20230824-025431-ladsgroup.json
  • 02:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 02:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 02:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T344589)', diff saved to https://phabricator.wikimedia.org/P51112 and previous config saved to /var/cache/conftool/dbconfig/20230824-023924-ladsgroup.json
  • 02:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1178.eqiad.wmnet with reason: Host needs maint
  • 02:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1178.eqiad.wmnet with reason: Host needs maint
  • 02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T344589)', diff saved to https://phabricator.wikimedia.org/P51111 and previous config saved to /var/cache/conftool/dbconfig/20230824-023407-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 02:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 00:24 eileen: civicrm upgraded from 9afa91fb to 6a2cdf10
  • 00:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2040.codfw.wmnet with OS bullseye
  • 00:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"

2023-08-23

  • 23:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2040.codfw.wmnet with reason: host reimage
  • 23:51 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2040.codfw.wmnet with reason: host reimage
  • 23:47 eileen: civicrm upgraded from bfedbcb9 to 9afa91fb
  • 23:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2041.codfw.wmnet with OS bullseye
  • 23:45 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:43 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 23:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2042.codfw.wmnet with OS bullseye
  • 23:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2040.codfw.wmnet with OS bullseye
  • 23:29 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2040']
  • 23:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2041.codfw.wmnet with reason: host reimage
  • 23:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2041.codfw.wmnet with reason: host reimage
  • 23:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2042.codfw.wmnet with reason: host reimage
  • 23:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2042.codfw.wmnet with reason: host reimage
  • 23:10 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2040']
  • 23:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2043.codfw.wmnet with OS bullseye
  • 23:09 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2040.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2041.codfw.wmnet with OS bullseye
  • 22:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2041']
  • 22:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2044.codfw.wmnet with OS bullseye
  • 22:52 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2043.codfw.wmnet with reason: host reimage
  • 22:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:50 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2040.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2045.codfw.wmnet with OS bullseye
  • 22:49 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2042.codfw.wmnet with OS bullseye
  • 22:48 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2043.codfw.wmnet with reason: host reimage
  • 22:47 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2041']
  • 22:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2042']
  • 22:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2041.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2044.codfw.wmnet with reason: host reimage
  • 22:33 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2041.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:33 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2042']
  • 22:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2045.codfw.wmnet with reason: host reimage
  • 22:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2046.codfw.wmnet with OS bullseye
  • 22:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2042.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2044.codfw.wmnet with reason: host reimage
  • 22:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:26 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2045.codfw.wmnet with reason: host reimage
  • 22:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2043.codfw.wmnet with OS bullseye
  • 22:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2043']
  • 22:20 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2042.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 22:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2046.codfw.wmnet with reason: host reimage
  • 22:13 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2043']
  • 22:11 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2046.codfw.wmnet with reason: host reimage
  • 22:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2044.codfw.wmnet with OS bullseye
  • 22:08 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_codfw and A:cp
  • 22:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2048.codfw.wmnet with OS bullseye
  • 22:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2045.codfw.wmnet with OS bullseye
  • 22:04 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_codfw and A:cp
  • 22:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2043.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2044']
  • 22:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2045']
  • 21:57 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2027.codfw.wmnet
  • 21:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2043.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2044']
  • 21:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2044']
  • 21:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2044']
  • 21:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2045']
  • 21:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2046.codfw.wmnet with OS bullseye
  • 21:49 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2027.codfw.wmnet
  • 21:49 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2026.codfw.wmnet
  • 21:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2047.codfw.wmnet with OS bullseye
  • 21:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2046']
  • 21:40 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2026.codfw.wmnet
  • 21:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2023.codfw.wmnet
  • 21:38 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:32 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2023.codfw.wmnet
  • 21:32 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2018.codfw.wmnet
  • 21:23 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2018.codfw.wmnet
  • 21:23 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2017.codfw.wmnet
  • 21:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2047.codfw.wmnet with reason: host reimage
  • 21:19 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2047.codfw.wmnet with reason: host reimage
  • 21:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2044.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:15 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2017.codfw.wmnet
  • 21:15 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2012.codfw.wmnet
  • 21:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2045.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:07 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2012.codfw.wmnet
  • 21:06 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2044.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: jnl export/downtime test
  • 21:05 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: jnl export/downtime test
  • 21:05 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2025.codfw.wmnet
  • 21:05 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2045.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2046']
  • 21:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2046.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2049.codfw.wmnet with OS bullseye
  • 21:02 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2050.codfw.wmnet with OS bullseye
  • 21:02 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:01 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:58 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:58 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2047.codfw.wmnet with OS bullseye
  • 20:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2047']
  • 20:57 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2025.codfw.wmnet
  • 20:57 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2022.codfw.wmnet
  • 20:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2046.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2051.codfw.wmnet with OS bullseye
  • 20:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:53 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:48 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
  • 20:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2020.codfw.wmnet
  • 20:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2048.codfw.wmnet with OS bullseye
  • 20:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2047']
  • 20:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2047.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2049.codfw.wmnet with reason: host reimage
  • 20:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2048']
  • 20:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2050.codfw.wmnet with reason: host reimage
  • 20:41 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2020.codfw.wmnet
  • 20:41 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2016.codfw.wmnet
  • 20:40 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2049.codfw.wmnet with reason: host reimage
  • 20:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2050.codfw.wmnet with reason: host reimage
  • 20:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2051.codfw.wmnet with reason: host reimage
  • 20:35 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2047.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2051.codfw.wmnet with reason: host reimage
  • 20:33 hmonroy@deploy1002: Finished scap: Backport for clienthints: Lower API max lag time to 5 minutes on group0 and 1 (T344797) (duration: 07m 09s)
  • 20:33 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2048']
  • 20:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2052.codfw.wmnet with OS bullseye
  • 20:32 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:32 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2016.codfw.wmnet
  • 20:32 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2015.codfw.wmnet
  • 20:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2048.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:28 hmonroy@deploy1002: dreamyjazz and hmonroy: Continuing with sync
  • 20:28 hmonroy@deploy1002: dreamyjazz and hmonroy: Backport for clienthints: Lower API max lag time to 5 minutes on group0 and 1 (T344797) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:26 hmonroy@deploy1002: Started scap: Backport for clienthints: Lower API max lag time to 5 minutes on group0 and 1 (T344797)
  • 20:25 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2015.codfw.wmnet
  • 20:23 hmonroy@deploy1002: Finished scap: Backport for wikidiff2: set maxSplitSize = 10 on group1 wikis (T341754) (duration: 10m 24s)
  • 20:18 hmonroy@deploy1002: hmonroy: Continuing with sync
  • 20:17 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2024.codfw.wmnet
  • 20:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2048.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:15 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2049.codfw.wmnet with OS bullseye
  • 20:15 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2050.codfw.wmnet with OS bullseye
  • 20:14 hmonroy@deploy1002: hmonroy: Backport for wikidiff2: set maxSplitSize = 10 on group1 wikis (T341754) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2052.codfw.wmnet with reason: host reimage
  • 20:13 hmonroy@deploy1002: Started scap: Backport for wikidiff2: set maxSplitSize = 10 on group1 wikis (T341754)
  • 20:11 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2052.codfw.wmnet with reason: host reimage
  • 20:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2051.codfw.wmnet with OS bullseye
  • 20:09 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2024.codfw.wmnet
  • 20:09 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2021.codfw.wmnet
  • 20:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2049.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2050.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2021.codfw.wmnet
  • 20:00 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2019.codfw.wmnet
  • 19:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2051.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2049.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2050.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:52 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2019.codfw.wmnet
  • 19:52 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2014.codfw.wmnet
  • 19:48 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2051.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:47 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2052.codfw.wmnet with OS bullseye
  • 19:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2051.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2052']
  • 19:43 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2014.codfw.wmnet
  • 19:43 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2013.codfw.wmnet
  • 19:35 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2013.codfw.wmnet
  • 19:34 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2051.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:32 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:32 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mgmt DNS for kubernetes2051 - pt1979@cumin2002"
  • 19:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2052']
  • 19:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mgmt DNS for kubernetes2051 - pt1979@cumin2002"
  • 19:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2053.codfw.wmnet with OS bullseye
  • 19:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2052.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 19:28 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:21 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cassandra-dev2003.codfw.wmnet
  • 19:20 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2052.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:14 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host cassandra-dev2003.codfw.wmnet
  • 19:13 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cassandra-dev2002.codfw.wmnet
  • 19:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2053.codfw.wmnet with reason: host reimage
  • 19:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2053.codfw.wmnet with reason: host reimage
  • 19:06 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host cassandra-dev2002.codfw.wmnet
  • 18:57 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cassandra-dev2001.codfw.wmnet
  • 18:56 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@33de526]: (no justification provided) (duration: 00m 20s)
  • 18:55 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@33de526]: (no justification provided)
  • 18:45 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host cassandra-dev2001.codfw.wmnet
  • 18:45 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2053.codfw.wmnet with OS bullseye
  • 18:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2053.codfw.wmnet with OS bullseye
  • 18:19 dduvall@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.23 refs T343725 (duration: 06m 01s)
  • 18:19 herron: re-enabled icinga meta-monitoring on wikitech-static
  • 18:17 denisse: alert hosts maintenance finished
  • 18:13 denisse: making alert1001 the primary alert host
  • 18:09 denisse: updating DNS to point to alert1001
  • 18:03 denisse: failing over from alert2001 to alert1001
  • 17:51 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert1001.wikimedia.org
  • 17:51 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host alert1001.wikimedia.org
  • 17:47 denisse: make alert2001 the active host
  • 17:31 denisse: failing over alert1001 to alert2001
  • 17:24 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_codfw and A:cp
  • 17:24 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_codfw and A:cp
  • 17:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2053.codfw.wmnet with OS bullseye
  • 17:23 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on A:cp-upload_eqiad and A:cp
  • 17:23 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on A:cp-text_eqiad and A:cp
  • 17:22 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqiad and A:cp
  • 17:22 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqiad and A:cp
  • 17:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for kubernetes2040-kubernetes2052 - pt1979@cumin2002"
  • 17:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for kubernetes2040-kubernetes2052 - pt1979@cumin2002"
  • 17:19 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 17:19 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 17:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2053']
  • 17:07 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2001.wikimedia.org
  • 17:07 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host alert2001.wikimedia.org
  • 17:06 denisse: reboot alert2001 for a kernel upgrade
  • 17:05 herron: set icinga downtime on wikitech-static
  • 17:03 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: jnl export
  • 17:03 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: jnl export
  • 17:00 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2053']
  • 16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2053.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:45 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2053.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:43 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for kubernetes2053 - pt1979@cumin2002"
  • 16:37 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 16:35 bblack: cp3067-81 - rolling restart of varnish frontends (one at a time, 30 minute sleep between, will run for ~7.5h), for experimental cache memory settings from https://gerrit.wikimedia.org/r/c/operations/puppet/+/951949
  • 16:27 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 16:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:24 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:17 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 16:17 effie: depool maps/kartotherian codfw
  • 16:10 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 16:09 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:09 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_eqiad and A:cp
  • 16:08 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:07 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_eqiad and A:cp
  • 16:07 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:07 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:06 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 16:06 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:05 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:57 bblack: cp3066 - varnish-frontend-restart for new memory params experiment
  • 15:55 effie: pooled codfw kartotherian/maps
  • 15:54 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 15:44 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 15:44 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1117.eqiad.wmnet with OS bullseye
  • 15:40 eevans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 15:40 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 15:40 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 15:39 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 15:39 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 15:38 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 15:37 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 15:37 eevans@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 15:35 eevans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
  • 15:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2004.codfw.wmnet with OS bookworm
  • 15:34 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/echostore: apply
  • 15:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: esams sandbox - ayounsi@cumin1001"
  • 15:33 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 15:32 btullis@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes
  • 15:31 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 15:31 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: esams sandbox - ayounsi@cumin1001"
  • 15:30 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 15:29 eevans@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 15:29 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:21 brennen@deploy1002: Finished deploy [phabricator/deployment@82e8e76]: update phabricator to phorge (T333885) (duration: 00m 38s)
  • 15:21 brennen@deploy1002: Started deploy [phabricator/deployment@82e8e76]: update phabricator to phorge (T333885)
  • 15:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
  • 15:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
  • 15:12 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop test cluster
  • 15:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
  • 15:11 brennen@deploy1002: Finished deploy [phabricator/deployment@82e8e76]: update phabricator to phorge (T333885) (duration: 00m 34s)
  • 15:10 brennen@deploy1002: Started deploy [phabricator/deployment@82e8e76]: update phabricator to phorge (T333885)
  • 15:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
  • 15:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4004.wikimedia.org
  • 15:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: Switch Phabricator to Phorge
  • 15:02 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1004.eqiad.wmnet with reason: Switch Phabricator to Phorge
  • 15:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-launcher1002.eqiad.wmnet
  • 14:59 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host testvm2004.codfw.wmnet with OS bookworm
  • 14:59 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 14:59 akosiaris: pool kartotherian in codfw for testing T344324
  • 14:58 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 14:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4004.wikimedia.org
  • 14:58 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 14:57 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1117.eqiad.wmnet with OS bullseye
  • 14:57 akosiaris: deploy codfw tegola-vector-tiles with high CPU limits to rule out a hunch. T344324
  • 14:56 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 14:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-launcher1002.eqiad.wmnet
  • 14:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1003.eqiad.wmnet
  • 14:44 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1003.eqiad.wmnet
  • 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2002.codfw.wmnet
  • 14:37 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2002.codfw.wmnet
  • 14:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1002.eqiad.wmnet
  • 14:34 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 14:34 akosiaris: depool again kartotherian in codfw for testing T344324
  • 14:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp4052.*,cp5032.*} and A:cp
  • 14:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1002.eqiad.wmnet
  • 14:32 btullis@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop test cluster
  • 14:32 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:28 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp4052.*,cp5032.*} and A:cp
  • 14:26 vgutierrez: update to HAProxy 2.7.10 in cp4052 and cp5032 - T344047
  • 14:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1001.eqiad.wmnet
  • 14:23 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 14:23 akosiaris: pool kartotherian in codfw for testing T344324
  • 14:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1001.eqiad.wmnet
  • 14:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp
  • 14:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp
  • 14:16 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1117.eqiad.wmnet with OS bullseye
  • 14:06 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:06 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for clienthints: Remove duplicate entries when converting to DB rows (T344787) (duration: 13m 10s)
  • 14:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for kubernetes2053 - pt1979@cumin2002"
  • 14:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:01 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp
  • 14:01 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp
  • 14:00 lucaswerkmeister-wmde@deploy1002: dreamyjazz and lucaswerkmeister-wmde: Continuing with sync
  • 13:56 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka test-eqiad cluster: Reboot kafka nodes
  • 13:54 lucaswerkmeister-wmde@deploy1002: dreamyjazz and lucaswerkmeister-wmde: Backport for clienthints: Remove duplicate entries when converting to DB rows (T344787) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:53 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for clienthints: Remove duplicate entries when converting to DB rows (T344787)
  • 13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testvm2004.codfw.wmnet with OS bookworm
  • 13:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams and A:cp
  • 13:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams and A:cp
  • 13:38 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for clienthints: Remove duplicate entries when converting to DB rows (T344787) (duration: 21m 12s)
  • 13:33 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and dreamyjazz: Continuing with sync
  • 13:26 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams and A:cp
  • 13:26 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams and A:cp
  • 13:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1002.eqiad.wmnet
  • 13:19 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and dreamyjazz: Backport for clienthints: Remove duplicate entries when converting to DB rows (T344787) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:17 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for clienthints: Remove duplicate entries when converting to DB rows (T344787)
  • 13:17 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [pawiki] Enable the SandboxLink extension (T344815) (duration: 12m 06s)
  • 13:16 btullis@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes
  • 13:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1002.eqiad.wmnet
  • 13:11 lucaswerkmeister-wmde@deploy1002: zoranzoki21 and lucaswerkmeister-wmde: Continuing with sync
  • 13:08 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo and A:cp
  • 13:08 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo and A:cp
  • 13:06 lucaswerkmeister-wmde@deploy1002: zoranzoki21 and lucaswerkmeister-wmde: Backport for [pawiki] Enable the SandboxLink extension (T344815) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:05 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [pawiki] Enable the SandboxLink extension (T344815)
  • 13:03 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 13:03 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 13:01 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 13:01 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 12:58 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 12:58 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 12:56 jelto: registry* - upgrade jwt-authorizer package on all 4 hosts to version 1.2.0-1 - T337474
  • 12:49 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo and A:cp
  • 12:49 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo and A:cp
  • 12:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1001.eqiad.wmnet
  • 12:48 jelto: update jwt-authorizer package to v1.2.0 - T337474
  • 12:48 jelto: update jwt-authorizer package to v1.2.0
  • 12:47 btullis@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes
  • 12:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1001.eqiad.wmnet
  • 12:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1010.eqiad.wmnet
  • 12:38 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling reboot on A:schema-eqiad
  • 12:34 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:34 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:34 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:34 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1010.eqiad.wmnet
  • 12:32 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host testvm2004.codfw.wmnet with OS bookworm
  • 12:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling reboot on A:schema-eqiad
  • 12:26 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:26 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T344589)', diff saved to https://phabricator.wikimedia.org/P51102 and previous config saved to /var/cache/conftool/dbconfig/20230823-122440-ladsgroup.json
  • 12:19 fabfur@cumin1001: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqiad and A:cp
  • 12:19 fabfur@cumin1001: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqiad and A:cp
  • 12:17 klausman@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
  • 12:14 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling reboot on A:schema-codfw
  • 12:12 btullis@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test-eqiad cluster: Reboot kafka nodes
  • 12:11 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:11 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P51101 and previous config saved to /var/cache/conftool/dbconfig/20230823-120933-ladsgroup.json
  • 12:03 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling reboot on A:schema-codfw
  • 12:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp
  • 12:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp
  • 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1002.eqiad.wmnet
  • 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P51100 and previous config saved to /var/cache/conftool/dbconfig/20230823-115427-ladsgroup.json
  • 11:51 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka jumbo-eqiad cluster: Reboot kafka nodes
  • 11:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1002.eqiad.wmnet
  • 11:41 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_ulsfo and A:cp
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T344589)', diff saved to https://phabricator.wikimedia.org/P51099 and previous config saved to /var/cache/conftool/dbconfig/20230823-113921-ladsgroup.json
  • 11:37 btullis@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-eqiad
  • 11:35 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp
  • 11:35 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp
  • 11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T344589)', diff saved to https://phabricator.wikimedia.org/P51098 and previous config saved to /var/cache/conftool/dbconfig/20230823-113310-ladsgroup.json
  • 11:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 11:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T344589)', diff saved to https://phabricator.wikimedia.org/P51097 and previous config saved to /var/cache/conftool/dbconfig/20230823-113244-ladsgroup.json
  • 11:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1002.eqiad.wmnet
  • 11:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin and A:cp
  • 11:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin and A:cp
  • 11:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2002.codfw.wmnet
  • 11:28 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-eqiad
  • 11:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-presto1002.eqiad.wmnet
  • 11:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host atlas2001.wikimedia.org
  • 11:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas2001.wikimedia.org - ayounsi@cumin1001"
  • 11:24 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas2001.wikimedia.org - ayounsi@cumin1001"
  • 11:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2002.codfw.wmnet
  • 11:23 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-codfw
  • 11:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) atlas2001.wikimedia.org on all recursors
  • 11:21 ayounsi@cumin1001: START - Cookbook sre.dns.wipe-cache atlas2001.wikimedia.org on all recursors
  • 11:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas2001.wikimedia.org - ayounsi@cumin1001"
  • 11:19 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas2001.wikimedia.org - ayounsi@cumin1001"
  • 11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P51096 and previous config saved to /var/cache/conftool/dbconfig/20230823-111737-ladsgroup.json
  • 11:17 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:17 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host atlas2001.wikimedia.org
  • 11:15 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-codfw
  • 11:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T343718)', diff saved to https://phabricator.wikimedia.org/P51095 and previous config saved to /var/cache/conftool/dbconfig/20230823-111500-ladsgroup.json
  • 11:02 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin and A:cp
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P51094 and previous config saved to /var/cache/conftool/dbconfig/20230823-110231-ladsgroup.json
  • 11:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw and not P{cp2042.*} and A:cp
  • 11:00 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin and A:cp
  • 11:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw and not P{cp2041.*} and not P{cp2039.*} and A:cp
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P51093 and previous config saved to /var/cache/conftool/dbconfig/20230823-105954-ladsgroup.json
  • 10:54 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 10:54 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T344589)', diff saved to https://phabricator.wikimedia.org/P51092 and previous config saved to /var/cache/conftool/dbconfig/20230823-104725-ladsgroup.json
  • 10:46 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw and not P{cp2042.*} and A:cp
  • 10:46 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw and not P{cp2041.*} and not P{cp2039.*} and A:cp
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P51091 and previous config saved to /var/cache/conftool/dbconfig/20230823-104445-ladsgroup.json
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T344589)', diff saved to https://phabricator.wikimedia.org/P51090 and previous config saved to /var/cache/conftool/dbconfig/20230823-104308-ladsgroup.json
  • 10:40 vgutierrez: rolling upgrade to HAProxy 2.6.15 - T344047
  • 10:37 vgutierrez: repool cp2039
  • 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T343718)', diff saved to https://phabricator.wikimedia.org/P51089 and previous config saved to /var/cache/conftool/dbconfig/20230823-102939-ladsgroup.json
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P51088 and previous config saved to /var/cache/conftool/dbconfig/20230823-102801-ladsgroup.json
  • 10:14 vgutierrez: depool cp2039 to run some HAProxy experiments
  • 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P51087 and previous config saved to /var/cache/conftool/dbconfig/20230823-101255-ladsgroup.json
  • 10:09 fabfur: temporary depool/repool cp4040 for haproxy service restart
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T343718)', diff saved to https://phabricator.wikimedia.org/P51086 and previous config saved to /var/cache/conftool/dbconfig/20230823-100340-ladsgroup.json
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T344589)', diff saved to https://phabricator.wikimedia.org/P51085 and previous config saved to /var/cache/conftool/dbconfig/20230823-095749-ladsgroup.json
  • 09:57 klausman@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
  • 09:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T344589)', diff saved to https://phabricator.wikimedia.org/P51084 and previous config saved to /var/cache/conftool/dbconfig/20230823-095040-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T344589)', diff saved to https://phabricator.wikimedia.org/P51083 and previous config saved to /var/cache/conftool/dbconfig/20230823-094916-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 09:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T344589)', diff saved to https://phabricator.wikimedia.org/P51082 and previous config saved to /var/cache/conftool/dbconfig/20230823-094851-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P51081 and previous config saved to /var/cache/conftool/dbconfig/20230823-094834-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T343718)', diff saved to https://phabricator.wikimedia.org/P51079 and previous config saved to /var/cache/conftool/dbconfig/20230823-094727-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T343718)', diff saved to https://phabricator.wikimedia.org/P51078 and previous config saved to /var/cache/conftool/dbconfig/20230823-094706-ladsgroup.json
  • 09:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1010.eqiad.wmnet
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P51075 and previous config saved to /var/cache/conftool/dbconfig/20230823-093345-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P51074 and previous config saved to /var/cache/conftool/dbconfig/20230823-093327-ladsgroup.json
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P51073 and previous config saved to /var/cache/conftool/dbconfig/20230823-093200-ladsgroup.json
  • 09:31 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host druid1010.eqiad.wmnet
  • 09:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1009.eqiad.wmnet
  • 09:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host druid1009.eqiad.wmnet
  • 09:24 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 09:21 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P51072 and previous config saved to /var/cache/conftool/dbconfig/20230823-091838-ladsgroup.json
  • 09:18 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T343718)', diff saved to https://phabricator.wikimedia.org/P51071 and previous config saved to /var/cache/conftool/dbconfig/20230823-091821-ladsgroup.json
  • 09:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P51070 and previous config saved to /var/cache/conftool/dbconfig/20230823-091653-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T343718)', diff saved to https://phabricator.wikimedia.org/P51069 and previous config saved to /var/cache/conftool/dbconfig/20230823-091242-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T343718)', diff saved to https://phabricator.wikimedia.org/P51068 and previous config saved to /var/cache/conftool/dbconfig/20230823-091221-ladsgroup.json
  • 09:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2041.codfw.wmnet} and A:cp
  • 09:05 vgutierrez: update to HAProxy 2.6.15 in cp2041 (text) - T344047
  • 09:05 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2041.codfw.wmnet} and A:cp
  • 09:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T344589)', diff saved to https://phabricator.wikimedia.org/P51067 and previous config saved to /var/cache/conftool/dbconfig/20230823-090332-ladsgroup.json
  • 09:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T343718)', diff saved to https://phabricator.wikimedia.org/P51066 and previous config saved to /var/cache/conftool/dbconfig/20230823-090147-ladsgroup.json
  • 08:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2042.codfw.wmnet} and A:cp
  • 08:59 vgutierrez: update to HAProxy 2.6.15 in cp2042 (upload) - T344047
  • 08:58 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2042.codfw.wmnet} and A:cp
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P51065 and previous config saved to /var/cache/conftool/dbconfig/20230823-085715-ladsgroup.json
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T344589)', diff saved to https://phabricator.wikimedia.org/P51064 and previous config saved to /var/cache/conftool/dbconfig/20230823-085706-ladsgroup.json
  • 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 08:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T344589)', diff saved to https://phabricator.wikimedia.org/P51063 and previous config saved to /var/cache/conftool/dbconfig/20230823-085640-ladsgroup.json
  • 08:47 fabfur: run puppet agent on lvs5004 to clear alert
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T343718)', diff saved to https://phabricator.wikimedia.org/P51062 and previous config saved to /var/cache/conftool/dbconfig/20230823-084711-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T343718)', diff saved to https://phabricator.wikimedia.org/P51061 and previous config saved to /var/cache/conftool/dbconfig/20230823-084646-ladsgroup.json
  • 08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P51060 and previous config saved to /var/cache/conftool/dbconfig/20230823-084203-ladsgroup.json
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P51059 and previous config saved to /var/cache/conftool/dbconfig/20230823-084134-ladsgroup.json
  • 08:35 vgutierrez: fetch HAProxy 2.6.15 on thirdparty/haproxy26 for bullseye (apt.wm.o) - T344047
  • 08:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P51058 and previous config saved to /var/cache/conftool/dbconfig/20230823-083140-ladsgroup.json
  • 08:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1011.eqiad.wmnet
  • 08:29 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 08:28 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T343718)', diff saved to https://phabricator.wikimedia.org/P51057 and previous config saved to /var/cache/conftool/dbconfig/20230823-082657-ladsgroup.json
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P51056 and previous config saved to /var/cache/conftool/dbconfig/20230823-082628-ladsgroup.json
  • 08:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host druid1011.eqiad.wmnet
  • 08:21 btullis@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka jumbo-eqiad cluster: Reboot kafka nodes
  • 08:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T343718)', diff saved to https://phabricator.wikimedia.org/P51055 and previous config saved to /var/cache/conftool/dbconfig/20230823-082116-ladsgroup.json
  • 08:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 08:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T343718)', diff saved to https://phabricator.wikimedia.org/P51054 and previous config saved to /var/cache/conftool/dbconfig/20230823-082047-ladsgroup.json
  • 08:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P51053 and previous config saved to /var/cache/conftool/dbconfig/20230823-081633-ladsgroup.json
  • 08:15 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T344589)', diff saved to https://phabricator.wikimedia.org/P51052 and previous config saved to /var/cache/conftool/dbconfig/20230823-081122-ladsgroup.json
  • 08:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm-test1001.wikimedia.org
  • 08:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P51051 and previous config saved to /var/cache/conftool/dbconfig/20230823-080541-ladsgroup.json
  • 08:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2165 (T344589)', diff saved to https://phabricator.wikimedia.org/P51050 and previous config saved to /var/cache/conftool/dbconfig/20230823-080500-ladsgroup.json
  • 08:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 08:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T344589)', diff saved to https://phabricator.wikimedia.org/P51049 and previous config saved to /var/cache/conftool/dbconfig/20230823-080435-ladsgroup.json
  • 08:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad1003.eqiad.wmnet
  • 08:03 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
  • 08:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T343718)', diff saved to https://phabricator.wikimedia.org/P51048 and previous config saved to /var/cache/conftool/dbconfig/20230823-080127-ladsgroup.json
  • 08:00 fabfur: running puppet agent on lvs5006
  • 08:00 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host etherpad1003.eqiad.wmnet
  • 07:56 fabfur@cumin1001: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_ulsfo and A:cp
  • 07:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P51047 and previous config saved to /var/cache/conftool/dbconfig/20230823-075035-ladsgroup.json
  • 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P51046 and previous config saved to /var/cache/conftool/dbconfig/20230823-074928-ladsgroup.json
  • 07:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm2001.wikimedia.org
  • 07:36 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm2001.wikimedia.org
  • 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T343718)', diff saved to https://phabricator.wikimedia.org/P51045 and previous config saved to /var/cache/conftool/dbconfig/20230823-073529-ladsgroup.json
  • 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P51044 and previous config saved to /var/cache/conftool/dbconfig/20230823-073422-ladsgroup.json
  • 07:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T343718)', diff saved to https://phabricator.wikimedia.org/P51043 and previous config saved to /var/cache/conftool/dbconfig/20230823-073001-ladsgroup.json
  • 07:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T343718)', diff saved to https://phabricator.wikimedia.org/P51042 and previous config saved to /var/cache/conftool/dbconfig/20230823-072940-ladsgroup.json
  • 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T343718)', diff saved to https://phabricator.wikimedia.org/P51041 and previous config saved to /var/cache/conftool/dbconfig/20230823-071953-ladsgroup.json
  • 07:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T344589)', diff saved to https://phabricator.wikimedia.org/P51040 and previous config saved to /var/cache/conftool/dbconfig/20230823-071916-ladsgroup.json
  • 07:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp2002.wikimedia.org
  • 07:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P51039 and previous config saved to /var/cache/conftool/dbconfig/20230823-071433-ladsgroup.json
  • 07:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp2002.wikimedia.org
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T344589)', diff saved to https://phabricator.wikimedia.org/P51038 and previous config saved to /var/cache/conftool/dbconfig/20230823-071249-ladsgroup.json
  • 07:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 07:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 07:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 07:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T344589)', diff saved to https://phabricator.wikimedia.org/P51037 and previous config saved to /var/cache/conftool/dbconfig/20230823-071220-ladsgroup.json
  • 06:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P51036 and previous config saved to /var/cache/conftool/dbconfig/20230823-065927-ladsgroup.json
  • 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P51035 and previous config saved to /var/cache/conftool/dbconfig/20230823-065714-ladsgroup.json
  • 06:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T343718)', diff saved to https://phabricator.wikimedia.org/P51034 and previous config saved to /var/cache/conftool/dbconfig/20230823-064421-ladsgroup.json
  • 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P51033 and previous config saved to /var/cache/conftool/dbconfig/20230823-064207-ladsgroup.json
  • 06:40 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit1003.wikimedia.org
  • 06:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 06:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 06:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T343718)', diff saved to https://phabricator.wikimedia.org/P51032 and previous config saved to /var/cache/conftool/dbconfig/20230823-064019-ladsgroup.json
  • 06:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T343718)', diff saved to https://phabricator.wikimedia.org/P51031 and previous config saved to /var/cache/conftool/dbconfig/20230823-063754-ladsgroup.json
  • 06:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 06:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 06:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T343718)', diff saved to https://phabricator.wikimedia.org/P51030 and previous config saved to /var/cache/conftool/dbconfig/20230823-063733-ladsgroup.json
  • 06:33 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gerrit1003.wikimedia.org
  • 06:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T344589)', diff saved to https://phabricator.wikimedia.org/P51029 and previous config saved to /var/cache/conftool/dbconfig/20230823-062701-ladsgroup.json
  • 06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P51028 and previous config saved to /var/cache/conftool/dbconfig/20230823-062513-ladsgroup.json
  • 06:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P51027 and previous config saved to /var/cache/conftool/dbconfig/20230823-062227-ladsgroup.json
  • 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T344589)', diff saved to https://phabricator.wikimedia.org/P51026 and previous config saved to /var/cache/conftool/dbconfig/20230823-062136-ladsgroup.json
  • 06:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 06:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T344589)', diff saved to https://phabricator.wikimedia.org/P51025 and previous config saved to /var/cache/conftool/dbconfig/20230823-062112-ladsgroup.json
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T344589)', diff saved to https://phabricator.wikimedia.org/P51024 and previous config saved to /var/cache/conftool/dbconfig/20230823-062038-ladsgroup.json
  • 06:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 06:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T344589)', diff saved to https://phabricator.wikimedia.org/P51023 and previous config saved to /var/cache/conftool/dbconfig/20230823-062013-ladsgroup.json
  • 06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P51022 and previous config saved to /var/cache/conftool/dbconfig/20230823-061007-ladsgroup.json
  • 06:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P51021 and previous config saved to /var/cache/conftool/dbconfig/20230823-060721-ladsgroup.json
  • 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P51020 and previous config saved to /var/cache/conftool/dbconfig/20230823-060606-ladsgroup.json
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P51019 and previous config saved to /var/cache/conftool/dbconfig/20230823-060506-ladsgroup.json
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T343718)', diff saved to https://phabricator.wikimedia.org/P51018 and previous config saved to /var/cache/conftool/dbconfig/20230823-055500-ladsgroup.json
  • 05:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T343718)', diff saved to https://phabricator.wikimedia.org/P51017 and previous config saved to /var/cache/conftool/dbconfig/20230823-055215-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P51016 and previous config saved to /var/cache/conftool/dbconfig/20230823-055059-ladsgroup.json
  • 05:50 zabe@deploy1002: Backport cancelled.
  • 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P51015 and previous config saved to /var/cache/conftool/dbconfig/20230823-055000-ladsgroup.json
  • 05:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031 (T344589)', diff saved to https://phabricator.wikimedia.org/P51014 and previous config saved to /var/cache/conftool/dbconfig/20230823-054144-ladsgroup.json
  • 05:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 05:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 05:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T343718)', diff saved to https://phabricator.wikimedia.org/P51013 and previous config saved to /var/cache/conftool/dbconfig/20230823-054124-ladsgroup.json
  • 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T344589)', diff saved to https://phabricator.wikimedia.org/P51012 and previous config saved to /var/cache/conftool/dbconfig/20230823-053553-ladsgroup.json
  • 05:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T344589)', diff saved to https://phabricator.wikimedia.org/P51011 and previous config saved to /var/cache/conftool/dbconfig/20230823-053454-ladsgroup.json
  • 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T344589)', diff saved to https://phabricator.wikimedia.org/P51010 and previous config saved to /var/cache/conftool/dbconfig/20230823-052939-ladsgroup.json
  • 05:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 05:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T344589)', diff saved to https://phabricator.wikimedia.org/P51009 and previous config saved to /var/cache/conftool/dbconfig/20230823-052915-ladsgroup.json
  • 05:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T344589)', diff saved to https://phabricator.wikimedia.org/P51008 and previous config saved to /var/cache/conftool/dbconfig/20230823-052834-ladsgroup.json
  • 05:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 05:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 05:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T344589)', diff saved to https://phabricator.wikimedia.org/P51007 and previous config saved to /var/cache/conftool/dbconfig/20230823-052809-ladsgroup.json
  • 05:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P51006 and previous config saved to /var/cache/conftool/dbconfig/20230823-052637-ladsgroup.json
  • 05:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P51005 and previous config saved to /var/cache/conftool/dbconfig/20230823-052618-ladsgroup.json
  • 05:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P51004 and previous config saved to /var/cache/conftool/dbconfig/20230823-051409-ladsgroup.json
  • 05:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T343718)', diff saved to https://phabricator.wikimedia.org/P51003 and previous config saved to /var/cache/conftool/dbconfig/20230823-051312-ladsgroup.json
  • 05:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 05:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P51002 and previous config saved to /var/cache/conftool/dbconfig/20230823-051303-ladsgroup.json
  • 05:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P51001 and previous config saved to /var/cache/conftool/dbconfig/20230823-051251-ladsgroup.json
  • 05:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P51000 and previous config saved to /var/cache/conftool/dbconfig/20230823-051131-ladsgroup.json
  • 05:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P50999 and previous config saved to /var/cache/conftool/dbconfig/20230823-051112-ladsgroup.json
  • 04:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P50998 and previous config saved to /var/cache/conftool/dbconfig/20230823-045902-ladsgroup.json
  • 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P50997 and previous config saved to /var/cache/conftool/dbconfig/20230823-045757-ladsgroup.json
  • 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P50996 and previous config saved to /var/cache/conftool/dbconfig/20230823-045744-ladsgroup.json
  • 04:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1031 (T344589)', diff saved to https://phabricator.wikimedia.org/P50995 and previous config saved to /var/cache/conftool/dbconfig/20230823-045625-ladsgroup.json
  • 04:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 04:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T343718)', diff saved to https://phabricator.wikimedia.org/P50994 and previous config saved to /var/cache/conftool/dbconfig/20230823-045606-ladsgroup.json
  • 04:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reboot - ryankemper@cumin1001 - T344587
  • 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T343718)', diff saved to https://phabricator.wikimedia.org/P50993 and previous config saved to /var/cache/conftool/dbconfig/20230823-045038-ladsgroup.json
  • 04:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 04:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 04:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1031 (T344589)', diff saved to https://phabricator.wikimedia.org/P50992 and previous config saved to /var/cache/conftool/dbconfig/20230823-044741-ladsgroup.json
  • 04:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance
  • 04:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance
  • 04:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 (T344589)', diff saved to https://phabricator.wikimedia.org/P50991 and previous config saved to /var/cache/conftool/dbconfig/20230823-044717-ladsgroup.json
  • 04:44 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reboot - ryankemper@cumin1001 - T344587
  • 04:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T344589)', diff saved to https://phabricator.wikimedia.org/P50990 and previous config saved to /var/cache/conftool/dbconfig/20230823-044356-ladsgroup.json
  • 04:43 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reboot - ryankemper@cumin1001 - T344587
  • 04:43 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reboot - ryankemper@cumin1001 - T344587
  • 04:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T344589)', diff saved to https://phabricator.wikimedia.org/P50989 and previous config saved to /var/cache/conftool/dbconfig/20230823-044251-ladsgroup.json
  • 04:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P50988 and previous config saved to /var/cache/conftool/dbconfig/20230823-044238-ladsgroup.json
  • 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T344589)', diff saved to https://phabricator.wikimedia.org/P50987 and previous config saved to /var/cache/conftool/dbconfig/20230823-043741-ladsgroup.json
  • 04:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 04:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T344589)', diff saved to https://phabricator.wikimedia.org/P50986 and previous config saved to /var/cache/conftool/dbconfig/20230823-043716-ladsgroup.json
  • 04:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T344589)', diff saved to https://phabricator.wikimedia.org/P50985 and previous config saved to /var/cache/conftool/dbconfig/20230823-043625-ladsgroup.json
  • 04:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 04:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 04:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T344589)', diff saved to https://phabricator.wikimedia.org/P50984 and previous config saved to /var/cache/conftool/dbconfig/20230823-043600-ladsgroup.json
  • 04:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P50983 and previous config saved to /var/cache/conftool/dbconfig/20230823-043216-ladsgroup.json
  • 04:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P50982 and previous config saved to /var/cache/conftool/dbconfig/20230823-043210-ladsgroup.json
  • 04:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50981 and previous config saved to /var/cache/conftool/dbconfig/20230823-042732-ladsgroup.json
  • 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P50980 and previous config saved to /var/cache/conftool/dbconfig/20230823-042210-ladsgroup.json
  • 04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P50979 and previous config saved to /var/cache/conftool/dbconfig/20230823-042054-ladsgroup.json
  • 04:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 04:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P50978 and previous config saved to /var/cache/conftool/dbconfig/20230823-041712-ladsgroup.json
  • 04:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P50977 and previous config saved to /var/cache/conftool/dbconfig/20230823-041704-ladsgroup.json
  • 04:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P50976 and previous config saved to /var/cache/conftool/dbconfig/20230823-040704-ladsgroup.json
  • 04:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P50975 and previous config saved to /var/cache/conftool/dbconfig/20230823-040548-ladsgroup.json
  • 04:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P50974 and previous config saved to /var/cache/conftool/dbconfig/20230823-040207-ladsgroup.json
  • 04:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 (T344589)', diff saved to https://phabricator.wikimedia.org/P50973 and previous config saved to /var/cache/conftool/dbconfig/20230823-040158-ladsgroup.json
  • 03:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1028 (T344589)', diff saved to https://phabricator.wikimedia.org/P50972 and previous config saved to /var/cache/conftool/dbconfig/20230823-035707-ladsgroup.json
  • 03:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
  • 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T344589)', diff saved to https://phabricator.wikimedia.org/P50971 and previous config saved to /var/cache/conftool/dbconfig/20230823-035157-ladsgroup.json
  • 03:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T344589)', diff saved to https://phabricator.wikimedia.org/P50970 and previous config saved to /var/cache/conftool/dbconfig/20230823-035042-ladsgroup.json
  • 03:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P50969 and previous config saved to /var/cache/conftool/dbconfig/20230823-034656-ladsgroup.json
  • 03:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50968 and previous config saved to /var/cache/conftool/dbconfig/20230823-034643-ladsgroup.json
  • 03:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 03:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 03:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 03:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T344589)', diff saved to https://phabricator.wikimedia.org/P50967 and previous config saved to /var/cache/conftool/dbconfig/20230823-034549-ladsgroup.json
  • 03:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T344589)', diff saved to https://phabricator.wikimedia.org/P50966 and previous config saved to /var/cache/conftool/dbconfig/20230823-034519-ladsgroup.json
  • 03:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 00:35 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Disable legacy SSL port — T339299 - eevans@cumin1001

2023-08-22

  • 23:49 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Disable legacy SSL port — T339299 - eevans@cumin1001
  • 23:45 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Disable legacy SSL port — T339299 - eevans@cumin1001
  • 22:59 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Disable legacy SSL port — T339299 - eevans@cumin1001
  • 21:21 eileen: config revision changed from 1ea8201f to c2f91f49
  • 21:02 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs20[09-12].codfw.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 20:48 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_eqsin and A:cp
  • 20:44 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs20[09-12].codfw.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 20:29 inflatador: bking@cumin1001 enable/run puppet on hosts after rollback T343856
  • 20:27 urbanecm@deploy1002: Finished scap: Backport for Declare v1 of the page_content_change stream. (T307959) (duration: 11m 19s)
  • 20:21 urbanecm@deploy1002: urbanecm and gmodena: Continuing with sync
  • 20:17 urbanecm@deploy1002: urbanecm and gmodena: Backport for Declare v1 of the page_content_change stream. (T307959) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:16 urbanecm@deploy1002: Started scap: Backport for Declare v1 of the page_content_change stream. (T307959)
  • 20:15 urbanecm@deploy1002: Finished scap: Backport for clienthints: Collect Client Hints data on all wikis (T341110) (duration: 12m 29s)
  • 20:09 urbanecm@deploy1002: dreamyjazz and urbanecm: Continuing with sync
  • 20:04 urbanecm@deploy1002: dreamyjazz and urbanecm: Backport for clienthints: Collect Client Hints data on all wikis (T341110) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:02 urbanecm@deploy1002: Started scap: Backport for clienthints: Collect Client Hints data on all wikis (T341110)
  • 19:42 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
  • 19:33 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
  • 19:33 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3004.wikimedia.org
  • 19:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh3004.wikimedia.org with OS bullseye
  • 19:25 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
  • 19:17 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
  • 19:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
  • 19:16 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3003.esams.wmnet
  • 19:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
  • 19:10 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus3003.esams.wmnet
  • 19:09 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4002.ulsfo.wmnet
  • 19:03 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus4002.ulsfo.wmnet
  • 19:02 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus5002.eqsin.wmnet
  • 18:56 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus5002.eqsin.wmnet
  • 18:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host doh3004.wikimedia.org with OS bullseye
  • 18:56 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus6002.drmrs.wmnet
  • 18:55 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh3004.wikimedia.org - sukhe@cumin2002"
  • 18:55 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh3004.wikimedia.org - sukhe@cumin2002"
  • 18:54 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh3004.wikimedia.org on all recursors
  • 18:54 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache doh3004.wikimedia.org on all recursors
  • 18:54 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:54 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3004.wikimedia.org - sukhe@cumin2002"
  • 18:53 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3004.wikimedia.org - sukhe@cumin2002"
  • 18:51 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:51 sukhe@cumin2002: START - Cookbook sre.ganeti.makevm for new host doh3004.wikimedia.org
  • 18:49 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus6002.drmrs.wmnet
  • 18:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh3004.wikimedia.org
  • 18:48 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:48 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh3004.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:48 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh3004.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:46 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
  • 18:42 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts doh3004.wikimedia.org
  • 18:41 sukhe: decommissioning doh3004 as it was added in the same ganeti cluster as 3003
  • 18:37 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3004.wikimedia.org
  • 18:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh3004.wikimedia.org with OS bullseye
  • 18:33 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
  • 18:32 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
  • 18:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.23 refs T343725
  • 18:26 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
  • 18:23 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
  • 18:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
  • 18:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
  • 18:17 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
  • 18:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1004.eqiad.wmnet
  • 18:06 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1004.eqiad.wmnet
  • 18:04 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
  • 18:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host doh3004.wikimedia.org with OS bullseye
  • 18:02 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh3004.wikimedia.org - sukhe@cumin2002"
  • 18:01 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh3004.wikimedia.org - sukhe@cumin2002"
  • 18:01 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh3004.wikimedia.org on all recursors
  • 18:01 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache doh3004.wikimedia.org on all recursors
  • 18:01 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:01 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3004.wikimedia.org - sukhe@cumin2002"
  • 17:59 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3004.wikimedia.org - sukhe@cumin2002"
  • 17:58 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
  • 17:57 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 17:57 sukhe@cumin2002: START - Cookbook sre.ganeti.makevm for new host doh3004.wikimedia.org
  • 17:56 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3003.wikimedia.org
  • 17:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh3003.wikimedia.org with OS bullseye
  • 17:48 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs200[5-8].codfw.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 17:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3003.wikimedia.org with reason: host reimage
  • 17:39 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3003.wikimedia.org with reason: host reimage
  • 17:30 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs200[5-8].codfw.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T344589)', diff saved to https://phabricator.wikimedia.org/P50965 and previous config saved to /var/cache/conftool/dbconfig/20230822-172748-ladsgroup.json
  • 17:26 joal@deploy1002: Finished deploy [analytics/refinery@d62f281] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d62f281] (duration: 02m 01s)
  • 17:24 joal@deploy1002: Started deploy [analytics/refinery@d62f281] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d62f281]
  • 17:19 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host doh3003.wikimedia.org with OS bullseye
  • 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh3003.wikimedia.org - sukhe@cumin2002"
  • 17:18 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh3003.wikimedia.org - sukhe@cumin2002"
  • 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh3003.wikimedia.org on all recursors
  • 17:17 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache doh3003.wikimedia.org on all recursors
  • 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3003.wikimedia.org - sukhe@cumin2002"
  • 17:16 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3003.wikimedia.org - sukhe@cumin2002"
  • 17:16 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
  • 17:14 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 17:14 sukhe@cumin2002: START - Cookbook sre.ganeti.makevm for new host doh3003.wikimedia.org
  • 17:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh3003.wikimedia.org
  • 17:13 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:13 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh3003.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:13 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs200[2-4].codfw.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 17:12 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh3003.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P50964 and previous config saved to /var/cache/conftool/dbconfig/20230822-171242-ladsgroup.json
  • 17:10 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 17:10 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
  • 17:07 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
  • 17:06 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts doh3003.wikimedia.org
  • 17:01 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
  • 16:58 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs200[2-4].codfw.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 16:57 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2004.codfw.wmnet
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P50963 and previous config saved to /var/cache/conftool/dbconfig/20230822-165736-ladsgroup.json
  • 16:51 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2004.codfw.wmnet
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T344589)', diff saved to https://phabricator.wikimedia.org/P50962 and previous config saved to /var/cache/conftool/dbconfig/20230822-164229-ladsgroup.json
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T344589)', diff saved to https://phabricator.wikimedia.org/P50961 and previous config saved to /var/cache/conftool/dbconfig/20230822-163609-ladsgroup.json
  • 16:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T344589)', diff saved to https://phabricator.wikimedia.org/P50960 and previous config saved to /var/cache/conftool/dbconfig/20230822-163544-ladsgroup.json
  • 16:34 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: jnl export
  • 16:34 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: jnl export
  • 16:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2023.codfw.wmnet with OS bullseye
  • 16:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:26 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:26 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: re-add wikidough ips - sukhe@cumin2002"
  • 16:25 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: re-add wikidough ips - sukhe@cumin2002"
  • 16:25 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:23 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 16:21 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P50959 and previous config saved to /var/cache/conftool/dbconfig/20230822-162038-ladsgroup.json
  • 16:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:14 sukhe@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh3003.wikimedia.org
  • 16:14 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh3003.wikimedia.org on all recursors
  • 16:14 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache doh3003.wikimedia.org on all recursors
  • 16:14 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM doh3003.wikimedia.org - sukhe@cumin2002"
  • 16:13 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM doh3003.wikimedia.org - sukhe@cumin2002"
  • 16:12 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 16:11 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh3003.wikimedia.org on all recursors
  • 16:11 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache doh3003.wikimedia.org on all recursors
  • 16:11 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:11 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3003.wikimedia.org - sukhe@cumin2002"
  • 16:11 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3003.wikimedia.org - sukhe@cumin2002"
  • 16:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2023.codfw.wmnet with reason: host reimage
  • 16:09 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 16:09 sukhe@cumin2002: START - Cookbook sre.ganeti.makevm for new host doh3003.wikimedia.org
  • 16:08 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:08 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clear wikidough ips - sukhe@cumin2002"
  • 16:07 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clear wikidough ips - sukhe@cumin2002"
  • 16:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2023.codfw.wmnet with reason: host reimage
  • 16:05 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P50958 and previous config saved to /var/cache/conftool/dbconfig/20230822-160532-ladsgroup.json
  • 16:05 sukhe@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh3003.wikimedia.org
  • 16:05 sukhe@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2023.codfw.wmnet with OS bullseye
  • 15:58 sukhe: sudo cookbook sre.ganeti.makevm --vcpus 2 --memory 8 --disk 15 --network public --os bullseye --cluster esams01 --group BY27 -t T344355 doh3003
  • 15:57 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 15:57 sukhe@cumin2002: START - Cookbook sre.ganeti.makevm for new host doh3003.wikimedia.org
  • 15:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:56 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:54 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:53 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:52 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T344589)', diff saved to https://phabricator.wikimedia.org/P50957 and previous config saved to /var/cache/conftool/dbconfig/20230822-155025-ladsgroup.json
  • 15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T344589)', diff saved to https://phabricator.wikimedia.org/P50956 and previous config saved to /var/cache/conftool/dbconfig/20230822-154712-ladsgroup.json
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T344589)', diff saved to https://phabricator.wikimedia.org/P50955 and previous config saved to /var/cache/conftool/dbconfig/20230822-154621-ladsgroup.json
  • 15:40 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqsin and A:cp
  • 15:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P50954 and previous config saved to /var/cache/conftool/dbconfig/20230822-153206-ladsgroup.json
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P50953 and previous config saved to /var/cache/conftool/dbconfig/20230822-153115-ladsgroup.json
  • 15:29 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs10[12,15,18,21].eqiad.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 15:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P50952 and previous config saved to /var/cache/conftool/dbconfig/20230822-151700-ladsgroup.json
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P50951 and previous config saved to /var/cache/conftool/dbconfig/20230822-151608-ladsgroup.json
  • 15:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp2001.codfw.wmnet
  • 15:12 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs10[12,15,18,21].eqiad.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 15:07 moritzm: installing hdf5 security updates
  • 15:07 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs10[11,14,17,20].eqiad.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 15:07 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2056.codfw.wmnet with OS bullseye
  • 15:06 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host arclamp2001.codfw.wmnet
  • 15:04 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
  • 15:04 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2055.codfw.wmnet with OS bullseye
  • 15:03 kevinbazira: stat1008: Remove `aswiki` from the published datasets repo `/srv/published/datasets/one-off/research-mwaddlink` (T344319)
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T344589)', diff saved to https://phabricator.wikimedia.org/P50949 and previous config saved to /var/cache/conftool/dbconfig/20230822-150153-ladsgroup.json
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T344589)', diff saved to https://phabricator.wikimedia.org/P50948 and previous config saved to /var/cache/conftool/dbconfig/20230822-150102-ladsgroup.json
  • 15:00 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
  • 15:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
  • 14:59 kevinbazira: tools.stashbot stat1008: Remove `aswiki` from `/srv/published/datasets/one-off/research-mwaddlink/wikis.txt` (T344319)
  • 14:56 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T344589)', diff saved to https://phabricator.wikimedia.org/P50947 and previous config saved to /var/cache/conftool/dbconfig/20230822-145544-ladsgroup.json
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T344589)', diff saved to https://phabricator.wikimedia.org/P50946 and previous config saved to /var/cache/conftool/dbconfig/20230822-145442-ladsgroup.json
  • 14:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T344589)', diff saved to https://phabricator.wikimedia.org/P50945 and previous config saved to /var/cache/conftool/dbconfig/20230822-145419-ladsgroup.json
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T344589)', diff saved to https://phabricator.wikimedia.org/P50944 and previous config saved to /var/cache/conftool/dbconfig/20230822-145418-ladsgroup.json
  • 14:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T344589)', diff saved to https://phabricator.wikimedia.org/P50942 and previous config saved to /var/cache/conftool/dbconfig/20230822-145353-ladsgroup.json
  • 14:53 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1058.eqiad.wmnet with OS bullseye
  • 14:50 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs10[11,14,17,20].eqiad.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 14:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1057.eqiad.wmnet with OS bullseye
  • 14:49 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2056.codfw.wmnet with reason: host reimage
  • 14:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2023.codfw.wmnet with OS bullseye
  • 14:46 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2055.codfw.wmnet with reason: host reimage
  • 14:46 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2056.codfw.wmnet with reason: host reimage
  • 14:44 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs101[6,9].eqiad.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 14:43 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2055.codfw.wmnet with reason: host reimage
  • 14:41 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon1003.eqiad.wmnet
  • 14:40 taavi@deploy1002: Finished scap: Backport for wmf-config: update new esams IP ranges (T329219) (duration: 09m 50s)
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P50940 and previous config saved to /var/cache/conftool/dbconfig/20230822-143912-ladsgroup.json
  • 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P50939 and previous config saved to /var/cache/conftool/dbconfig/20230822-143847-ladsgroup.json
  • 14:37 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon1003.eqiad.wmnet
  • 14:37 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs101[6,9].eqiad.wmnet: Upgrade Cassandra to 4.1.1 — T339299 - eevans@cumin1001
  • 14:36 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon2003.codfw.wmnet
  • 14:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1058.eqiad.wmnet with reason: host reimage
  • 14:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1057.eqiad.wmnet with reason: host reimage
  • 14:32 taavi@deploy1002: taavi and sukhe: Continuing with sync
  • 14:32 taavi@deploy1002: taavi and sukhe: Backport for wmf-config: update new esams IP ranges (T329219) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:31 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1058.eqiad.wmnet with reason: host reimage
  • 14:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1057.eqiad.wmnet with reason: host reimage
  • 14:30 taavi@deploy1002: Started scap: Backport for wmf-config: update new esams IP ranges (T329219)
  • 14:27 gmodena@deploy1002: Finished deploy [analytics/refinery@d62f281] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d62f281] (duration: 00m 04s)
  • 14:27 gmodena@deploy1002: Started deploy [analytics/refinery@d62f281] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d62f281]
  • 14:25 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2056.codfw.wmnet with OS bullseye
  • 14:24 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P50938 and previous config saved to /var/cache/conftool/dbconfig/20230822-142405-ladsgroup.json
  • 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P50937 and previous config saved to /var/cache/conftool/dbconfig/20230822-142341-ladsgroup.json
  • 14:22 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2055.codfw.wmnet with OS bullseye
  • 14:20 gmodena@deploy1002: Finished deploy [analytics/refinery@d62f281] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d62f281] (duration: 03m 15s)
  • 14:20 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:18 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1058.eqiad.wmnet with OS bullseye
  • 14:18 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1057.eqiad.wmnet with OS bullseye
  • 14:17 gmodena@deploy1002: Started deploy [analytics/refinery@d62f281] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d62f281]
  • 14:16 gmodena@deploy1002: Finished deploy [analytics/refinery@d62f281] (thin): Regular analytics weekly train THIN [analytics/refinery@d62f281] (duration: 00m 04s)
  • 14:16 gmodena@deploy1002: Started deploy [analytics/refinery@d62f281] (thin): Regular analytics weekly train THIN [analytics/refinery@d62f281]
  • 14:16 gmodena@deploy1002: Finished deploy [analytics/refinery@d62f281]: Regular analytics weekly train [analytics/refinery@d62f281] (duration: 05m 39s)
  • 14:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "refreshing kubernetes205[56] kubernetes105[78] status T343996 T343993 - hnowlan@cumin1001"
  • 14:14 hnowlan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "refreshing kubernetes205[56] kubernetes105[78] status T343996 T343993 - hnowlan@cumin1001"
  • 14:12 hnowlan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:10 hnowlan@cumin1001: START - Cookbook sre.dns.netbox
  • 14:10 gmodena@deploy1002: Started deploy [analytics/refinery@d62f281]: Regular analytics weekly train [analytics/refinery@d62f281]
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T344589)', diff saved to https://phabricator.wikimedia.org/P50936 and previous config saved to /var/cache/conftool/dbconfig/20230822-140859-ladsgroup.json
  • 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T344589)', diff saved to https://phabricator.wikimedia.org/P50935 and previous config saved to /var/cache/conftool/dbconfig/20230822-140835-ladsgroup.json
  • 14:08 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon2003.codfw.wmnet
  • 14:07 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T344589)', diff saved to https://phabricator.wikimedia.org/P50934 and previous config saved to /var/cache/conftool/dbconfig/20230822-140417-ladsgroup.json
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T344589)', diff saved to https://phabricator.wikimedia.org/P50933 and previous config saved to /var/cache/conftool/dbconfig/20230822-140236-ladsgroup.json
  • 14:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T344589)', diff saved to https://phabricator.wikimedia.org/P50932 and previous config saved to /var/cache/conftool/dbconfig/20230822-140213-ladsgroup.json
  • 14:01 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
  • 14:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2023.codfw.wmnet with OS bullseye
  • 13:57 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:57 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:55 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
  • 13:52 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit2002.wikimedia.org
  • 13:49 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
  • 13:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2023']
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P50930 and previous config saved to /var/cache/conftool/dbconfig/20230822-134911-ladsgroup.json
  • 13:48 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
  • 13:48 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gerrit2002.wikimedia.org
  • 13:48 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P50929 and previous config saved to /var/cache/conftool/dbconfig/20230822-134703-ladsgroup.json
  • 13:43 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2023']
  • 13:43 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
  • 13:41 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
  • 13:40 urbanecm@deploy1002: Finished scap: Backport for knwiki add import sources (T344573), Update tcywiki logos (T344557), clienthints: Remove server-side check for browser support (T344679), clienthints: Remove server-side check for browser support (T344679) (duration: 19m 44s)
  • 13:35 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
  • 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P50928 and previous config saved to /var/cache/conftool/dbconfig/20230822-133405-ladsgroup.json
  • 13:33 urbanecm@deploy1002: urbanecm and dreamyjazz and anzx: Continuing with sync
  • 13:33 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:33 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P50927 and previous config saved to /var/cache/conftool/dbconfig/20230822-133157-ladsgroup.json
  • 13:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:24 klausman: Draining ml-serve2008 for kubelet partition resize
  • 13:23 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb1003.eqiad.wmnet
  • 13:21 urbanecm@deploy1002: urbanecm and dreamyjazz and anzx: Backport for knwiki add import sources (T344573), Update tcywiki logos (T344557), clienthints: Remove server-side check for browser support (T344679), clienthints: Remove server-side check for browser support (T344679) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet,
  • 13:20 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host miscweb1003.eqiad.wmnet
  • 13:20 sukhe: [done] finished repooling esams
  • 13:20 urbanecm@deploy1002: Started scap: Backport for knwiki add import sources (T344573), Update tcywiki logos (T344557), clienthints: Remove server-side check for browser support (T344679), clienthints: Remove server-side check for browser support (T344679)
  • 13:19 urbanecm@deploy1002: Finished scap: Backport for Remove unneeded $wgDefaultUserOptions['visualeditor-enable'] settings (T340696), Move visual editor out of Beta Features (without changing prefs) (T335056), Clarify 2017 wikitext editor's Beta Feature status (T344158) (duration: 15m 43s)
  • 13:19 sukhe: repool esams
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T344589)', diff saved to https://phabricator.wikimedia.org/P50926 and previous config saved to /var/cache/conftool/dbconfig/20230822-131859-ladsgroup.json
  • 13:17 klausman: Draining ml-serve2007 for kubelet partition resize
  • 13:17 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:17 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T344589)', diff saved to https://phabricator.wikimedia.org/P50925 and previous config saved to /var/cache/conftool/dbconfig/20230822-131651-ladsgroup.json
  • 13:16 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 13:13 urbanecm@deploy1002: urbanecm and matmarex: Continuing with sync
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T344589)', diff saved to https://phabricator.wikimedia.org/P50924 and previous config saved to /var/cache/conftool/dbconfig/20230822-131250-ladsgroup.json
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T344589)', diff saved to https://phabricator.wikimedia.org/P50923 and previous config saved to /var/cache/conftool/dbconfig/20230822-131122-ladsgroup.json
  • 13:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 13:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T344589)', diff saved to https://phabricator.wikimedia.org/P50922 and previous config saved to /var/cache/conftool/dbconfig/20230822-131057-ladsgroup.json
  • 13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T344589)', diff saved to https://phabricator.wikimedia.org/P50921 and previous config saved to /var/cache/conftool/dbconfig/20230822-130920-ladsgroup.json
  • 13:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 13:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 13:09 sukhe: [done] authdns-update for old references
  • 13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T344589)', diff saved to https://phabricator.wikimedia.org/P50919 and previous config saved to /var/cache/conftool/dbconfig/20230822-130856-ladsgroup.json
  • 13:07 sukhe: running authdns-update to remove old references to esams
  • 13:05 urbanecm@deploy1002: urbanecm and matmarex: Backport for Remove unneeded $wgDefaultUserOptions['visualeditor-enable'] settings (T340696), Move visual editor out of Beta Features (without changing prefs) (T335056), Clarify 2017 wikitext editor's Beta Feature status (T344158) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codf
  • 13:05 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:03 urbanecm@deploy1002: Started scap: Backport for Remove unneeded $wgDefaultUserOptions['visualeditor-enable'] settings (T340696), Move visual editor out of Beta Features (without changing prefs) (T335056), Clarify 2017 wikitext editor's Beta Feature status (T344158)
  • 13:01 urbanecm: stat1008: Remove `krcwiki` and `ganwiki` from `/srv/published/datasets/one-off/research-mwaddlink/wikis.txt` (T344686)
  • 12:56 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2056.codfw.wmnet with OS bullseye
  • 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P50918 and previous config saved to /var/cache/conftool/dbconfig/20230822-125550-ladsgroup.json
  • 12:54 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2055.codfw.wmnet with OS bullseye
  • 12:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P50917 and previous config saved to /var/cache/conftool/dbconfig/20230822-125350-ladsgroup.json
  • 12:46 urbanecm: mwmaint1002: foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --scoreLessThan=0.6 --verbose | tee growth-T316079-revalidate-0.6.log # T316079
  • 12:42 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dispatch-be2001.codfw.wmnet
  • 12:42 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dispatch-be1001.eqiad.wmnet
  • 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P50916 and previous config saved to /var/cache/conftool/dbconfig/20230822-124044-ladsgroup.json
  • 12:39 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1057.eqiad.wmnet with OS bullseye
  • 12:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P50915 and previous config saved to /var/cache/conftool/dbconfig/20230822-123844-ladsgroup.json
  • 12:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host dispatch-be2001.codfw.wmnet
  • 12:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host dispatch-be1001.eqiad.wmnet
  • 12:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana1002.eqiad.wmnet
  • 12:28 fabfur@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_ulsfo and A:cp
  • 12:26 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host releases1003.eqiad.wmnet
  • 12:25 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T344589)', diff saved to https://phabricator.wikimedia.org/P50914 and previous config saved to /var/cache/conftool/dbconfig/20230822-122538-ladsgroup.json
  • 12:24 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host grafana1002.eqiad.wmnet
  • 12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T344589)', diff saved to https://phabricator.wikimedia.org/P50913 and previous config saved to /var/cache/conftool/dbconfig/20230822-122338-ladsgroup.json
  • 12:23 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host releases1003.eqiad.wmnet
  • 12:22 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host releases2003.codfw.wmnet
  • 12:22 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 12:22 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1058.eqiad.wmnet with OS bullseye
  • 12:21 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana2001.codfw.wmnet
  • 12:20 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
  • 12:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
  • 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T344589)', diff saved to https://phabricator.wikimedia.org/P50912 and previous config saved to /var/cache/conftool/dbconfig/20230822-121913-ladsgroup.json
  • 12:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T344589)', diff saved to https://phabricator.wikimedia.org/P50911 and previous config saved to /var/cache/conftool/dbconfig/20230822-121832-ladsgroup.json
  • 12:18 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host releases2003.codfw.wmnet
  • 12:17 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host doc1003.eqiad.wmnet
  • 12:17 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T344589)', diff saved to https://phabricator.wikimedia.org/P50910 and previous config saved to /var/cache/conftool/dbconfig/20230822-121714-ladsgroup.json
  • 12:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 12:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 12:13 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host doc1003.eqiad.wmnet
  • 12:13 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host doc2002.codfw.wmnet
  • 12:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T344589)', diff saved to https://phabricator.wikimedia.org/P50909 and previous config saved to /var/cache/conftool/dbconfig/20230822-121218-ladsgroup.json
  • 12:12 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
  • 12:10 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
  • 12:09 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host doc2002.codfw.wmnet
  • 12:07 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aphlict2001.codfw.wmnet
  • 12:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P50908 and previous config saved to /var/cache/conftool/dbconfig/20230822-120326-ladsgroup.json
  • 12:03 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host aphlict2001.codfw.wmnet
  • 12:02 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aphlict1002.eqiad.wmnet
  • 12:00 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host aphlict1002.eqiad.wmnet
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P50907 and previous config saved to /var/cache/conftool/dbconfig/20230822-115712-ladsgroup.json
  • 11:51 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb2003.codfw.wmnet
  • 11:49 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync
  • 11:49 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: sync
  • 11:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P50906 and previous config saved to /var/cache/conftool/dbconfig/20230822-114820-ladsgroup.json
  • 11:47 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host miscweb2003.codfw.wmnet
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P50905 and previous config saved to /var/cache/conftool/dbconfig/20230822-114206-ladsgroup.json
  • 11:41 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 11:41 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 11:40 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2056.codfw.wmnet with OS bullseye
  • 11:37 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 11:36 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 11:36 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2055.codfw.wmnet with OS bullseye
  • 11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T344589)', diff saved to https://phabricator.wikimedia.org/P50904 and previous config saved to /var/cache/conftool/dbconfig/20230822-113313-ladsgroup.json
  • 11:32 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
  • 11:29 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1058.eqiad.wmnet with OS bullseye
  • 11:28 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
  • 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T344589)', diff saved to https://phabricator.wikimedia.org/P50903 and previous config saved to /var/cache/conftool/dbconfig/20230822-112659-ladsgroup.json
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T344589)', diff saved to https://phabricator.wikimedia.org/P50902 and previous config saved to /var/cache/conftool/dbconfig/20230822-112650-ladsgroup.json
  • 11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T344589)', diff saved to https://phabricator.wikimedia.org/P50901 and previous config saved to /var/cache/conftool/dbconfig/20230822-112625-ladsgroup.json
  • 11:25 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1057.eqiad.wmnet with OS bullseye
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P50900 and previous config saved to /var/cache/conftool/dbconfig/20230822-112438-ladsgroup.json
  • 11:16 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host planet1002.eqiad.wmnet
  • 11:15 XioNoX: delete RPKI ROAs for 91.198.174.0/24 and 2a02:ec80:500::/48 - T344579
  • 11:13 XioNoX: delete RIPE route6 object for 2a02:ec80:500::/48 - T344579
  • 11:12 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host planet1002.eqiad.wmnet
  • 11:12 XioNoX: delete RIPE route object for 91.198.174.0/24 - T344579
  • 11:12 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host planet2002.codfw.wmnet
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P50899 and previous config saved to /var/cache/conftool/dbconfig/20230822-111119-ladsgroup.json
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P50898 and previous config saved to /var/cache/conftool/dbconfig/20230822-110932-ladsgroup.json
  • 11:08 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host planet2002.codfw.wmnet
  • 11:04 XioNoX: delete old ams-ix circuits from ams-ix portal - T344579
  • 11:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1001.eqiad.wmnet
  • 10:59 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host vrts1001.eqiad.wmnet
  • 10:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P50897 and previous config saved to /var/cache/conftool/dbconfig/20230822-105613-ladsgroup.json
  • 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P50896 and previous config saved to /var/cache/conftool/dbconfig/20230822-105425-ladsgroup.json
  • 10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T344589)', diff saved to https://phabricator.wikimedia.org/P50895 and previous config saved to /var/cache/conftool/dbconfig/20230822-104106-ladsgroup.json
  • 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P50894 and previous config saved to /var/cache/conftool/dbconfig/20230822-103919-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T344589)', diff saved to https://phabricator.wikimedia.org/P50893 and previous config saved to /var/cache/conftool/dbconfig/20230822-103417-ladsgroup.json
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T344589)', diff saved to https://phabricator.wikimedia.org/P50892 and previous config saved to /var/cache/conftool/dbconfig/20230822-103255-ladsgroup.json
  • 10:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T344589)', diff saved to https://phabricator.wikimedia.org/P50891 and previous config saved to /var/cache/conftool/dbconfig/20230822-103237-ladsgroup.json
  • 10:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T344589)', diff saved to https://phabricator.wikimedia.org/P50890 and previous config saved to /var/cache/conftool/dbconfig/20230822-103231-ladsgroup.json
  • 10:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T344589)', diff saved to https://phabricator.wikimedia.org/P50889 and previous config saved to /var/cache/conftool/dbconfig/20230822-103212-ladsgroup.json
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P50888 and previous config saved to /var/cache/conftool/dbconfig/20230822-101725-ladsgroup.json
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P50887 and previous config saved to /var/cache/conftool/dbconfig/20230822-101706-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P50886 and previous config saved to /var/cache/conftool/dbconfig/20230822-100219-ladsgroup.json
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P50885 and previous config saved to /var/cache/conftool/dbconfig/20230822-100200-ladsgroup.json
  • 09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P50884 and previous config saved to /var/cache/conftool/dbconfig/20230822-095848-ladsgroup.json
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T343718)', diff saved to https://phabricator.wikimedia.org/P50883 and previous config saved to /var/cache/conftool/dbconfig/20230822-095632-ladsgroup.json
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1021 (T344589)', diff saved to https://phabricator.wikimedia.org/P50882 and previous config saved to /var/cache/conftool/dbconfig/20230822-095351-ladsgroup.json
  • 09:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 09:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 09:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T343718)', diff saved to https://phabricator.wikimedia.org/P50881 and previous config saved to /var/cache/conftool/dbconfig/20230822-095205-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T344589)', diff saved to https://phabricator.wikimedia.org/P50880 and previous config saved to /var/cache/conftool/dbconfig/20230822-094712-ladsgroup.json
  • 09:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T344589)', diff saved to https://phabricator.wikimedia.org/P50879 and previous config saved to /var/cache/conftool/dbconfig/20230822-094653-ladsgroup.json
  • 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T336886)', diff saved to https://phabricator.wikimedia.org/P50878 and previous config saved to /var/cache/conftool/dbconfig/20230822-094555-ladsgroup.json
  • 09:43 effie: depool codfw kartotherian (maps)
  • 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P50877 and previous config saved to /var/cache/conftool/dbconfig/20230822-094343-ladsgroup.json
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P50876 and previous config saved to /var/cache/conftool/dbconfig/20230822-094126-ladsgroup.json
  • 09:40 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:39 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T344589)', diff saved to https://phabricator.wikimedia.org/P50875 and previous config saved to /var/cache/conftool/dbconfig/20230822-093915-ladsgroup.json
  • 09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T344589)', diff saved to https://phabricator.wikimedia.org/P50874 and previous config saved to /var/cache/conftool/dbconfig/20230822-093850-ladsgroup.json
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1021', diff saved to https://phabricator.wikimedia.org/P50873 and previous config saved to /var/cache/conftool/dbconfig/20230822-093844-ladsgroup.json
  • 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P50872 and previous config saved to /var/cache/conftool/dbconfig/20230822-093659-ladsgroup.json
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P50871 and previous config saved to /var/cache/conftool/dbconfig/20230822-093227-root.json
  • 09:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T344589)', diff saved to https://phabricator.wikimedia.org/P50870 and previous config saved to /var/cache/conftool/dbconfig/20230822-093147-ladsgroup.json
  • 09:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:31 effie: pooling temporarily kartotherian codfw
  • 09:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 09:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T344589)', diff saved to https://phabricator.wikimedia.org/P50869 and previous config saved to /var/cache/conftool/dbconfig/20230822-093055-ladsgroup.json
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P50868 and previous config saved to /var/cache/conftool/dbconfig/20230822-093049-ladsgroup.json
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P50867 and previous config saved to /var/cache/conftool/dbconfig/20230822-092838-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P50866 and previous config saved to /var/cache/conftool/dbconfig/20230822-092620-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P50865 and previous config saved to /var/cache/conftool/dbconfig/20230822-092344-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1021', diff saved to https://phabricator.wikimedia.org/P50864 and previous config saved to /var/cache/conftool/dbconfig/20230822-092338-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P50863 and previous config saved to /var/cache/conftool/dbconfig/20230822-092153-ladsgroup.json
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P50862 and previous config saved to /var/cache/conftool/dbconfig/20230822-091722-root.json
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P50861 and previous config saved to /var/cache/conftool/dbconfig/20230822-091549-ladsgroup.json
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P50860 and previous config saved to /var/cache/conftool/dbconfig/20230822-091542-ladsgroup.json
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P50859 and previous config saved to /var/cache/conftool/dbconfig/20230822-091334-ladsgroup.json
  • 09:11 claime: Redirecting 2% of global traffic to mw-on-k8s - T341780
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T343718)', diff saved to https://phabricator.wikimedia.org/P50858 and previous config saved to /var/cache/conftool/dbconfig/20230822-091113-ladsgroup.json
  • 09:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P50857 and previous config saved to /var/cache/conftool/dbconfig/20230822-090838-ladsgroup.json
  • 09:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1021 (T344589)', diff saved to https://phabricator.wikimedia.org/P50856 and previous config saved to /var/cache/conftool/dbconfig/20230822-090832-ladsgroup.json
  • 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T343718)', diff saved to https://phabricator.wikimedia.org/P50855 and previous config saved to /var/cache/conftool/dbconfig/20230822-090646-ladsgroup.json
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P50854 and previous config saved to /var/cache/conftool/dbconfig/20230822-090217-root.json
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2021 (T344589)', diff saved to https://phabricator.wikimedia.org/P50853 and previous config saved to /var/cache/conftool/dbconfig/20230822-090056-ladsgroup.json
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P50852 and previous config saved to /var/cache/conftool/dbconfig/20230822-090042-ladsgroup.json
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T336886)', diff saved to https://phabricator.wikimedia.org/P50851 and previous config saved to /var/cache/conftool/dbconfig/20230822-090036-ladsgroup.json
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T343718)', diff saved to https://phabricator.wikimedia.org/P50850 and previous config saved to /var/cache/conftool/dbconfig/20230822-090026-ladsgroup.json
  • 09:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 09:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T343718)', diff saved to https://phabricator.wikimedia.org/P50849 and previous config saved to /var/cache/conftool/dbconfig/20230822-090016-ladsgroup.json
  • 08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T344589)', diff saved to https://phabricator.wikimedia.org/P50848 and previous config saved to /var/cache/conftool/dbconfig/20230822-085332-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T343718)', diff saved to https://phabricator.wikimedia.org/P50847 and previous config saved to /var/cache/conftool/dbconfig/20230822-084724-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T343718)', diff saved to https://phabricator.wikimedia.org/P50846 and previous config saved to /var/cache/conftool/dbconfig/20230822-084713-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P50845 and previous config saved to /var/cache/conftool/dbconfig/20230822-084712-root.json
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T344589)', diff saved to https://phabricator.wikimedia.org/P50844 and previous config saved to /var/cache/conftool/dbconfig/20230822-084703-ladsgroup.json
  • 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T344589)', diff saved to https://phabricator.wikimedia.org/P50843 and previous config saved to /var/cache/conftool/dbconfig/20230822-084638-ladsgroup.json
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2021', diff saved to https://phabricator.wikimedia.org/P50842 and previous config saved to /var/cache/conftool/dbconfig/20230822-084550-ladsgroup.json
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T344589)', diff saved to https://phabricator.wikimedia.org/P50841 and previous config saved to /var/cache/conftool/dbconfig/20230822-084536-ladsgroup.json
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P50840 and previous config saved to /var/cache/conftool/dbconfig/20230822-084510-ladsgroup.json
  • 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T336886)', diff saved to https://phabricator.wikimedia.org/P50839 and previous config saved to /var/cache/conftool/dbconfig/20230822-084445-ladsgroup.json
  • 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 08:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 08:43 vgutierrez: restart ATS on cp5024 to clean the ATS restart alert - T344674
  • 08:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:42 fabfur@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_ulsfo and A:cp
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P50838 and previous config saved to /var/cache/conftool/dbconfig/20230822-084104-ladsgroup.json
  • 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T344589)', diff saved to https://phabricator.wikimedia.org/P50837 and previous config saved to /var/cache/conftool/dbconfig/20230822-083912-ladsgroup.json
  • 08:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 08:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 08:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T344589)', diff saved to https://phabricator.wikimedia.org/P50836 and previous config saved to /var/cache/conftool/dbconfig/20230822-083848-ladsgroup.json
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P50835 and previous config saved to /var/cache/conftool/dbconfig/20230822-083207-ladsgroup.json
  • 08:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P50834 and previous config saved to /var/cache/conftool/dbconfig/20230822-083132-ladsgroup.json
  • 08:31 urbanecm: mwmaint1002: Stop frwiki instance of T315510 scripts due to a large volume of T343859 errors
  • 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2021', diff saved to https://phabricator.wikimedia.org/P50833 and previous config saved to /var/cache/conftool/dbconfig/20230822-083044-ladsgroup.json
  • 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P50832 and previous config saved to /var/cache/conftool/dbconfig/20230822-083004-ladsgroup.json
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P50831 and previous config saved to /var/cache/conftool/dbconfig/20230822-082559-ladsgroup.json
  • 08:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P50830 and previous config saved to /var/cache/conftool/dbconfig/20230822-082342-ladsgroup.json
  • 08:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P50829 and previous config saved to /var/cache/conftool/dbconfig/20230822-081701-ladsgroup.json
  • 08:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P50828 and previous config saved to /var/cache/conftool/dbconfig/20230822-081626-ladsgroup.json
  • 08:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2021 (T344589)', diff saved to https://phabricator.wikimedia.org/P50827 and previous config saved to /var/cache/conftool/dbconfig/20230822-081537-ladsgroup.json
  • 08:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T343718)', diff saved to https://phabricator.wikimedia.org/P50826 and previous config saved to /var/cache/conftool/dbconfig/20230822-081458-ladsgroup.json
  • 08:12 moritzm: bounce ferm on aux-k8s-ctrl1001
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P50825 and previous config saved to /var/cache/conftool/dbconfig/20230822-081054-ladsgroup.json
  • 08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P50824 and previous config saved to /var/cache/conftool/dbconfig/20230822-080836-ladsgroup.json
  • 08:05 moritzm: installing Linux 4.19.289-2 on Buster hosts
  • 08:05 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1117.eqiad.wmnet with reason: host reimage
  • 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2021 (T344589)', diff saved to https://phabricator.wikimedia.org/P50823 and previous config saved to /var/cache/conftool/dbconfig/20230822-080413-ladsgroup.json
  • 08:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2021.codfw.wmnet with reason: Maintenance
  • 08:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2021.codfw.wmnet with reason: Maintenance
  • 08:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1021 (T344589)', diff saved to https://phabricator.wikimedia.org/P50822 and previous config saved to /var/cache/conftool/dbconfig/20230822-080328-ladsgroup.json
  • 08:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance
  • 08:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance
  • 08:02 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1117.eqiad.wmnet with reason: host reimage
  • 08:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T343718)', diff saved to https://phabricator.wikimedia.org/P50821 and previous config saved to /var/cache/conftool/dbconfig/20230822-080155-ladsgroup.json
  • 08:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T344589)', diff saved to https://phabricator.wikimedia.org/P50820 and previous config saved to /var/cache/conftool/dbconfig/20230822-080119-ladsgroup.json
  • 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T343718)', diff saved to https://phabricator.wikimedia.org/P50819 and previous config saved to /var/cache/conftool/dbconfig/20230822-075728-ladsgroup.json
  • 07:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 07:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T343718)', diff saved to https://phabricator.wikimedia.org/P50818 and previous config saved to /var/cache/conftool/dbconfig/20230822-075717-ladsgroup.json
  • 07:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T344589)', diff saved to https://phabricator.wikimedia.org/P50817 and previous config saved to /var/cache/conftool/dbconfig/20230822-075453-ladsgroup.json
  • 07:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 07:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 07:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T344589)', diff saved to https://phabricator.wikimedia.org/P50816 and previous config saved to /var/cache/conftool/dbconfig/20230822-075329-ladsgroup.json
  • 07:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 07:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 07:48 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet
  • 07:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 07:48 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1117.eqiad.wmnet with OS bullseye
  • 07:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 07:48 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog1002.eqiad.wmnet
  • 07:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T344589)', diff saved to https://phabricator.wikimedia.org/P50815 and previous config saved to /var/cache/conftool/dbconfig/20230822-074725-ladsgroup.json
  • 07:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 07:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 07:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1210 (T343718)', diff saved to https://phabricator.wikimedia.org/P50814 and previous config saved to /var/cache/conftool/dbconfig/20230822-074358-ladsgroup.json
  • 07:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 07:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T343718)', diff saved to https://phabricator.wikimedia.org/P50813 and previous config saved to /var/cache/conftool/dbconfig/20230822-074338-ladsgroup.json
  • 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P50812 and previous config saved to /var/cache/conftool/dbconfig/20230822-074210-ladsgroup.json
  • 07:42 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwlog1002.eqiad.wmnet
  • 07:41 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
  • 07:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T344589)', diff saved to https://phabricator.wikimedia.org/P50811 and previous config saved to /var/cache/conftool/dbconfig/20230822-074106-ladsgroup.json
  • 07:37 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog2002.codfw.wmnet
  • 07:37 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
  • 07:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1005.eqiad.wmnet
  • 07:30 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwlog2002.codfw.wmnet
  • 07:30 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
  • 07:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P50810 and previous config saved to /var/cache/conftool/dbconfig/20230822-072831-ladsgroup.json
  • 07:27 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1116.eqiad.wmnet with OS bullseye
  • 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P50809 and previous config saved to /var/cache/conftool/dbconfig/20230822-072704-ladsgroup.json
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P50808 and previous config saved to /var/cache/conftool/dbconfig/20230822-072600-ladsgroup.json
  • 07:25 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1005.eqiad.wmnet
  • 07:21 sgimeno@deploy1002: Finished scap: Backport for GrowthExperiments: turn off AddLink in aswiki (T344319) (duration: 11m 41s)
  • 07:14 sgimeno@deploy1002: sgimeno: Continuing with sync
  • 07:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P50807 and previous config saved to /var/cache/conftool/dbconfig/20230822-071325-ladsgroup.json
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T343718)', diff saved to https://phabricator.wikimedia.org/P50806 and previous config saved to /var/cache/conftool/dbconfig/20230822-071158-ladsgroup.json
  • 07:11 sgimeno@deploy1002: sgimeno: Backport for GrowthExperiments: turn off AddLink in aswiki (T344319) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P50805 and previous config saved to /var/cache/conftool/dbconfig/20230822-071053-ladsgroup.json
  • 07:09 sgimeno@deploy1002: Started scap: Backport for GrowthExperiments: turn off AddLink in aswiki (T344319)
  • 07:05 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1115.eqiad.wmnet with OS bullseye
  • 07:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 07:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 07:03 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1116.eqiad.wmnet with reason: host reimage
  • 07:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 45356
  • 07:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 45356
  • 07:01 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
  • 07:00 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1116.eqiad.wmnet with reason: host reimage
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T343718)', diff saved to https://phabricator.wikimedia.org/P50804 and previous config saved to /var/cache/conftool/dbconfig/20230822-065819-ladsgroup.json
  • 06:57 moritzm: installing intel-microcode security updates on buster hosts
  • 06:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 06:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 06:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T344589)', diff saved to https://phabricator.wikimedia.org/P50803 and previous config saved to /var/cache/conftool/dbconfig/20230822-065547-ladsgroup.json
  • 06:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2103 T344666', diff saved to https://phabricator.wikimedia.org/P50802 and previous config saved to /var/cache/conftool/dbconfig/20230822-065518-ladsgroup.json
  • 06:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T343718)', diff saved to https://phabricator.wikimedia.org/P50801 and previous config saved to /var/cache/conftool/dbconfig/20230822-065440-ladsgroup.json
  • 06:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 06:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 06:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T343718)', diff saved to https://phabricator.wikimedia.org/P50800 and previous config saved to /var/cache/conftool/dbconfig/20230822-065430-ladsgroup.json
  • 06:53 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
  • 06:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2112 to s1 primary T344666', diff saved to https://phabricator.wikimedia.org/P50799 and previous config saved to /var/cache/conftool/dbconfig/20230822-065316-ladsgroup.json
  • 06:52 Amir1: Starting s1 codfw failover from db2103 to db2112 - T344666
  • 06:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T344589)', diff saved to https://phabricator.wikimedia.org/P50798 and previous config saved to /var/cache/conftool/dbconfig/20230822-064828-ladsgroup.json
  • 06:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 06:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 06:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T344589)', diff saved to https://phabricator.wikimedia.org/P50797 and previous config saved to /var/cache/conftool/dbconfig/20230822-064804-ladsgroup.json
  • 06:47 ladsgroup@deploy1002: Finished scap: Backport for Enable URL shortener in sidebar in jawiki and zhwiki (T267921) (duration: 10m 06s)
  • 06:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T343718)', diff saved to https://phabricator.wikimedia.org/P50796 and previous config saved to /var/cache/conftool/dbconfig/20230822-064716-ladsgroup.json
  • 06:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 06:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 06:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T343718)', diff saved to https://phabricator.wikimedia.org/P50795 and previous config saved to /var/cache/conftool/dbconfig/20230822-064706-ladsgroup.json
  • 06:46 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1116.eqiad.wmnet with OS bullseye
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T344589)', diff saved to https://phabricator.wikimedia.org/P50794 and previous config saved to /var/cache/conftool/dbconfig/20230822-064106-ladsgroup.json
  • 06:40 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 06:40 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1115.eqiad.wmnet with reason: host reimage
  • 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P50793 and previous config saved to /var/cache/conftool/dbconfig/20230822-063923-ladsgroup.json
  • 06:39 ladsgroup@deploy1002: ladsgroup: Backport for Enable URL shortener in sidebar in jawiki and zhwiki (T267921) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 06:37 ladsgroup@deploy1002: Started scap: Backport for Enable URL shortener in sidebar in jawiki and zhwiki (T267921)
  • 06:36 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1115.eqiad.wmnet with reason: host reimage
  • 06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P50792 and previous config saved to /var/cache/conftool/dbconfig/20230822-063258-ladsgroup.json
  • 06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P50791 and previous config saved to /var/cache/conftool/dbconfig/20230822-063200-ladsgroup.json
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2112 with weight 0 T344666', diff saved to https://phabricator.wikimedia.org/P50790 and previous config saved to /var/cache/conftool/dbconfig/20230822-062854-ladsgroup.json
  • 06:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T344666
  • 06:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T344666
  • 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P50789 and previous config saved to /var/cache/conftool/dbconfig/20230822-062600-ladsgroup.json
  • 06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P50788 and previous config saved to /var/cache/conftool/dbconfig/20230822-062417-ladsgroup.json
  • 06:21 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1115.eqiad.wmnet with OS bullseye
  • 06:21 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1114.eqiad.wmnet with OS bullseye
  • 06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2112 (T344589)', diff saved to https://phabricator.wikimedia.org/P50787 and previous config saved to /var/cache/conftool/dbconfig/20230822-061956-ladsgroup.json
  • 06:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 06:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 06:17 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1113.eqiad.wmnet with OS bullseye
  • 06:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P50786 and previous config saved to /var/cache/conftool/dbconfig/20230822-061752-ladsgroup.json
  • 06:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P50785 and previous config saved to /var/cache/conftool/dbconfig/20230822-061653-ladsgroup.json
  • 06:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 06:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P50784 and previous config saved to /var/cache/conftool/dbconfig/20230822-061054-ladsgroup.json
  • 06:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 06:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T343718)', diff saved to https://phabricator.wikimedia.org/P50783 and previous config saved to /var/cache/conftool/dbconfig/20230822-060911-ladsgroup.json
  • 06:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1163 T344621', diff saved to https://phabricator.wikimedia.org/P50782 and previous config saved to /var/cache/conftool/dbconfig/20230822-060710-ladsgroup.json
  • 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T344589)', diff saved to https://phabricator.wikimedia.org/P50781 and previous config saved to /var/cache/conftool/dbconfig/20230822-060246-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T343718)', diff saved to https://phabricator.wikimedia.org/P50780 and previous config saved to /var/cache/conftool/dbconfig/20230822-060147-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1184 to s1 primary and set section read-write T344621', diff saved to https://phabricator.wikimedia.org/P50779 and previous config saved to /var/cache/conftool/dbconfig/20230822-060131-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T344621', diff saved to https://phabricator.wikimedia.org/P50778 and previous config saved to /var/cache/conftool/dbconfig/20230822-060104-ladsgroup.json
  • 06:00 Amir1: Starting s1 eqiad failover from db1163 to db1184 - T344621
  • 05:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 12897
  • 05:57 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1114.eqiad.wmnet with reason: host reimage
  • 05:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 12897
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T344589)', diff saved to https://phabricator.wikimedia.org/P50777 and previous config saved to /var/cache/conftool/dbconfig/20230822-055548-ladsgroup.json
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T344589)', diff saved to https://phabricator.wikimedia.org/P50776 and previous config saved to /var/cache/conftool/dbconfig/20230822-055528-ladsgroup.json
  • 05:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 05:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 05:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 05:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T344589)', diff saved to https://phabricator.wikimedia.org/P50775 and previous config saved to /var/cache/conftool/dbconfig/20230822-055457-ladsgroup.json
  • 05:54 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1113.eqiad.wmnet with reason: host reimage
  • 05:53 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1114.eqiad.wmnet with reason: host reimage
  • 05:51 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1113.eqiad.wmnet with reason: host reimage
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T343718)', diff saved to https://phabricator.wikimedia.org/P50774 and previous config saved to /var/cache/conftool/dbconfig/20230822-055101-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 05:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T343718)', diff saved to https://phabricator.wikimedia.org/P50773 and previous config saved to /var/cache/conftool/dbconfig/20230822-055050-ladsgroup.json
  • 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T343718)', diff saved to https://phabricator.wikimedia.org/P50772 and previous config saved to /var/cache/conftool/dbconfig/20230822-055040-ladsgroup.json
  • 05:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 05:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T343718)', diff saved to https://phabricator.wikimedia.org/P50771 and previous config saved to /var/cache/conftool/dbconfig/20230822-055030-ladsgroup.json
  • 05:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T344589)', diff saved to https://phabricator.wikimedia.org/P50770 and previous config saved to /var/cache/conftool/dbconfig/20230822-054920-ladsgroup.json
  • 05:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 05:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 05:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T344589)', diff saved to https://phabricator.wikimedia.org/P50769 and previous config saved to /var/cache/conftool/dbconfig/20230822-054855-ladsgroup.json
  • 05:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P50768 and previous config saved to /var/cache/conftool/dbconfig/20230822-053951-ladsgroup.json
  • 05:38 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1114.eqiad.wmnet with OS bullseye
  • 05:37 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1113.eqiad.wmnet with OS bullseye
  • 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P50767 and previous config saved to /var/cache/conftool/dbconfig/20230822-053544-ladsgroup.json
  • 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P50766 and previous config saved to /var/cache/conftool/dbconfig/20230822-053524-ladsgroup.json
  • 05:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P50765 and previous config saved to /var/cache/conftool/dbconfig/20230822-053349-ladsgroup.json
  • 05:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P50764 and previous config saved to /var/cache/conftool/dbconfig/20230822-052445-ladsgroup.json
  • 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P50763 and previous config saved to /var/cache/conftool/dbconfig/20230822-052038-ladsgroup.json
  • 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P50762 and previous config saved to /var/cache/conftool/dbconfig/20230822-052018-ladsgroup.json
  • 05:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P50761 and previous config saved to /var/cache/conftool/dbconfig/20230822-051843-ladsgroup.json
  • 05:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1184 with weight 0 T344621', diff saved to https://phabricator.wikimedia.org/P50760 and previous config saved to /var/cache/conftool/dbconfig/20230822-051347-ladsgroup.json
  • 05:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T344621
  • 05:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T344621
  • 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T344589)', diff saved to https://phabricator.wikimedia.org/P50759 and previous config saved to /var/cache/conftool/dbconfig/20230822-051204-ladsgroup.json
  • 05:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T344589)', diff saved to https://phabricator.wikimedia.org/P50758 and previous config saved to /var/cache/conftool/dbconfig/20230822-050938-ladsgroup.json
  • 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T344589)', diff saved to https://phabricator.wikimedia.org/P50757 and previous config saved to /var/cache/conftool/dbconfig/20230822-050543-ladsgroup.json
  • 05:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T343718)', diff saved to https://phabricator.wikimedia.org/P50756 and previous config saved to /var/cache/conftool/dbconfig/20230822-050532-ladsgroup.json
  • 05:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T343718)', diff saved to https://phabricator.wikimedia.org/P50755 and previous config saved to /var/cache/conftool/dbconfig/20230822-050511-ladsgroup.json
  • 05:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T344589)', diff saved to https://phabricator.wikimedia.org/P50754 and previous config saved to /var/cache/conftool/dbconfig/20230822-050337-ladsgroup.json
  • 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T344589)', diff saved to https://phabricator.wikimedia.org/P50753 and previous config saved to /var/cache/conftool/dbconfig/20230822-045716-ladsgroup.json
  • 04:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T344589)', diff saved to https://phabricator.wikimedia.org/P50752 and previous config saved to /var/cache/conftool/dbconfig/20230822-045707-ladsgroup.json
  • 04:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 04:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 04:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 04:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 04:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T344589)', diff saved to https://phabricator.wikimedia.org/P50751 and previous config saved to /var/cache/conftool/dbconfig/20230822-045651-ladsgroup.json
  • 04:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 04:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T344589)', diff saved to https://phabricator.wikimedia.org/P50750 and previous config saved to /var/cache/conftool/dbconfig/20230822-045638-ladsgroup.json
  • 04:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T343718)', diff saved to https://phabricator.wikimedia.org/P50749 and previous config saved to /var/cache/conftool/dbconfig/20230822-045233-ladsgroup.json
  • 04:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 04:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 04:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 04:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 04:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T343718)', diff saved to https://phabricator.wikimedia.org/P50748 and previous config saved to /var/cache/conftool/dbconfig/20230822-045218-ladsgroup.json
  • 04:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T343718)', diff saved to https://phabricator.wikimedia.org/P50747 and previous config saved to /var/cache/conftool/dbconfig/20230822-044715-ladsgroup.json
  • 04:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 04:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 04:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 04:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 04:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P50746 and previous config saved to /var/cache/conftool/dbconfig/20230822-044145-ladsgroup.json
  • 04:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P50745 and previous config saved to /var/cache/conftool/dbconfig/20230822-044131-ladsgroup.json
  • 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P50744 and previous config saved to /var/cache/conftool/dbconfig/20230822-043712-ladsgroup.json
  • 04:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 04:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 04:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T343718)', diff saved to https://phabricator.wikimedia.org/P50743 and previous config saved to /var/cache/conftool/dbconfig/20230822-043058-ladsgroup.json
  • 04:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P50742 and previous config saved to /var/cache/conftool/dbconfig/20230822-042639-ladsgroup.json
  • 04:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P50741 and previous config saved to /var/cache/conftool/dbconfig/20230822-042625-ladsgroup.json
  • 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P50740 and previous config saved to /var/cache/conftool/dbconfig/20230822-042206-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P50739 and previous config saved to /var/cache/conftool/dbconfig/20230822-041551-ladsgroup.json
  • 04:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T344589)', diff saved to https://phabricator.wikimedia.org/P50738 and previous config saved to /var/cache/conftool/dbconfig/20230822-041132-ladsgroup.json
  • 04:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T344589)', diff saved to https://phabricator.wikimedia.org/P50737 and previous config saved to /var/cache/conftool/dbconfig/20230822-041119-ladsgroup.json
  • 04:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T343718)', diff saved to https://phabricator.wikimedia.org/P50736 and previous config saved to /var/cache/conftool/dbconfig/20230822-040658-ladsgroup.json
  • 04:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T344589)', diff saved to https://phabricator.wikimedia.org/P50735 and previous config saved to /var/cache/conftool/dbconfig/20230822-040451-ladsgroup.json
  • 04:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 04:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T344589)', diff saved to https://phabricator.wikimedia.org/P50734 and previous config saved to /var/cache/conftool/dbconfig/20230822-040315-ladsgroup.json
  • 04:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 04:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 04:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T344589)', diff saved to https://phabricator.wikimedia.org/P50733 and previous config saved to /var/cache/conftool/dbconfig/20230822-040251-ladsgroup.json
  • 04:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P50732 and previous config saved to /var/cache/conftool/dbconfig/20230822-040045-ladsgroup.json
  • 03:59 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.20 (duration: 02m 06s)
  • 03:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 03:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 03:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T344589)', diff saved to https://phabricator.wikimedia.org/P50731 and previous config saved to /var/cache/conftool/dbconfig/20230822-035831-ladsgroup.json
  • 03:57 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.23 refs T343725 (duration: 54m 34s)
  • 03:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T343718)', diff saved to https://phabricator.wikimedia.org/P50730 and previous config saved to /var/cache/conftool/dbconfig/20230822-035352-ladsgroup.json
  • 03:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 03:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 03:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T343718)', diff saved to https://phabricator.wikimedia.org/P50729 and previous config saved to /var/cache/conftool/dbconfig/20230822-035342-ladsgroup.json
  • 03:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P50728 and previous config saved to /var/cache/conftool/dbconfig/20230822-034745-ladsgroup.json
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T343718)', diff saved to https://phabricator.wikimedia.org/P50727 and previous config saved to /var/cache/conftool/dbconfig/20230822-034539-ladsgroup.json
  • 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P50726 and previous config saved to /var/cache/conftool/dbconfig/20230822-034325-ladsgroup.json
  • 03:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P50725 and previous config saved to /var/cache/conftool/dbconfig/20230822-033835-ladsgroup.json
  • 03:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P50724 and previous config saved to /var/cache/conftool/dbconfig/20230822-033239-ladsgroup.json
  • 03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P50723 and previous config saved to /var/cache/conftool/dbconfig/20230822-032819-ladsgroup.json
  • 03:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T343718)', diff saved to https://phabricator.wikimedia.org/P50722 and previous config saved to /var/cache/conftool/dbconfig/20230822-032713-ladsgroup.json
  • 03:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 03:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 03:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T343718)', diff saved to https://phabricator.wikimedia.org/P50721 and previous config saved to /var/cache/conftool/dbconfig/20230822-032703-ladsgroup.json
  • 03:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P50720 and previous config saved to /var/cache/conftool/dbconfig/20230822-032329-ladsgroup.json
  • 03:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T344589)', diff saved to https://phabricator.wikimedia.org/P50719 and previous config saved to /var/cache/conftool/dbconfig/20230822-031733-ladsgroup.json
  • 03:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T344589)', diff saved to https://phabricator.wikimedia.org/P50718 and previous config saved to /var/cache/conftool/dbconfig/20230822-031312-ladsgroup.json
  • 03:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P50717 and previous config saved to /var/cache/conftool/dbconfig/20230822-031156-ladsgroup.json
  • 03:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T344589)', diff saved to https://phabricator.wikimedia.org/P50716 and previous config saved to /var/cache/conftool/dbconfig/20230822-030911-ladsgroup.json
  • 03:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 03:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 03:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T344589)', diff saved to https://phabricator.wikimedia.org/P50715 and previous config saved to /var/cache/conftool/dbconfig/20230822-030847-ladsgroup.json
  • 03:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T343718)', diff saved to https://phabricator.wikimedia.org/P50714 and previous config saved to /var/cache/conftool/dbconfig/20230822-030823-ladsgroup.json
  • 03:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T344589)', diff saved to https://phabricator.wikimedia.org/P50713 and previous config saved to /var/cache/conftool/dbconfig/20230822-030526-ladsgroup.json
  • 03:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 03:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 03:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T344589)', diff saved to https://phabricator.wikimedia.org/P50712 and previous config saved to /var/cache/conftool/dbconfig/20230822-030501-ladsgroup.json
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.23 refs T343725
  • 02:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P50711 and previous config saved to /var/cache/conftool/dbconfig/20230822-025650-ladsgroup.json
  • 02:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P50710 and previous config saved to /var/cache/conftool/dbconfig/20230822-025341-ladsgroup.json
  • 02:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P50709 and previous config saved to /var/cache/conftool/dbconfig/20230822-024954-ladsgroup.json
  • 02:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T343718)', diff saved to https://phabricator.wikimedia.org/P50708 and previous config saved to /var/cache/conftool/dbconfig/20230822-024822-ladsgroup.json
  • 02:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 02:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 02:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T343718)', diff saved to https://phabricator.wikimedia.org/P50707 and previous config saved to /var/cache/conftool/dbconfig/20230822-024144-ladsgroup.json
  • 02:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P50706 and previous config saved to /var/cache/conftool/dbconfig/20230822-023835-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P50705 and previous config saved to /var/cache/conftool/dbconfig/20230822-023448-ladsgroup.json
  • 02:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 02:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 02:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T343718)', diff saved to https://phabricator.wikimedia.org/P50704 and previous config saved to /var/cache/conftool/dbconfig/20230822-022926-ladsgroup.json
  • 02:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 02:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 02:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T344589)', diff saved to https://phabricator.wikimedia.org/P50703 and previous config saved to /var/cache/conftool/dbconfig/20230822-022328-ladsgroup.json
  • 02:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T344589)', diff saved to https://phabricator.wikimedia.org/P50702 and previous config saved to /var/cache/conftool/dbconfig/20230822-021942-ladsgroup.json
  • 02:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T344589)', diff saved to https://phabricator.wikimedia.org/P50701 and previous config saved to /var/cache/conftool/dbconfig/20230822-021715-ladsgroup.json
  • 02:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 02:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 02:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 02:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T344589)', diff saved to https://phabricator.wikimedia.org/P50700 and previous config saved to /var/cache/conftool/dbconfig/20230822-021307-ladsgroup.json
  • 02:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 02:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 00:45 tzatziki: removing two files for legal compliance
  • 00:37 eileen: config revision changed from 322488b1 to 1ea8201f
  • 00:31 eileen: config revision changed from a05a2a82 to 322488b1 disable jobs
  • 00:09 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_eqsin and A:cp

2023-08-21

  • 23:51 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_esams and A:cp
  • 23:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2025.codfw.wmnet with OS bullseye
  • 23:39 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:26 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:16 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2024.codfw.wmnet with OS bullseye
  • 23:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2025.codfw.wmnet with reason: host reimage
  • 22:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2025.codfw.wmnet with reason: host reimage
  • 22:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2025.codfw.wmnet with OS bullseye
  • 22:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2024.codfw.wmnet with reason: host reimage
  • 22:48 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2024.codfw.wmnet with reason: host reimage
  • 22:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2024.codfw.wmnet with OS bullseye
  • 21:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2025.codfw.wmnet with OS bullseye
  • 21:36 urbanecm: mwmaint1002: foreachwikiindblist 'group2 & s6' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230615000000 (T315510; restart, per T315510#9107776)
  • 21:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2024.codfw.wmnet with OS bullseye
  • 21:31 sbassett: Deployed updated security mitigation for T336027
  • 21:25 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1008.eqiad.wmnet with OS bullseye
  • 20:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2025.codfw.wmnet with OS bullseye
  • 20:55 eileen: civicrm upgraded from 4f7f1e68 to bfedbcb9
  • 20:52 eileen: civicrm upgraded from 4f7f1e68 to bfedbcb9
  • 20:48 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2024.codfw.wmnet with OS bullseye
  • 20:45 urbanecm@deploy1002: Finished scap: Backport for Add botadmin group on eswiki, correction patch (T342484) (duration: 12m 22s)
  • 20:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2025']
  • 20:37 urbanecm@deploy1002: hamishz and urbanecm: Continuing with sync
  • 20:36 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1008.eqiad.wmnet with reason: host reimage
  • 20:34 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2025']
  • 20:34 urbanecm@deploy1002: hamishz and urbanecm: Backport for Add botadmin group on eswiki, correction patch (T342484) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2025']
  • 20:33 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1008.eqiad.wmnet with reason: host reimage
  • 20:32 urbanecm@deploy1002: Started scap: Backport for Add botadmin group on eswiki, correction patch (T342484)
  • 20:32 urbanecm@deploy1002: Finished scap: Backport for Growth: Remove wgWelcomeSurveyEnableWithHomepage (T342353 T344619), revalidateLinkRecommendations: Load scoreLessThan correctly (T316079), LinkRecommendationUpdater: Load link-recommendation even if disabled (T344343) (duration: 11m 02s)
  • 20:25 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 20:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T344589)', diff saved to https://phabricator.wikimedia.org/P50699 and previous config saved to /var/cache/conftool/dbconfig/20230821-202433-ladsgroup.json
  • 20:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2024']
  • 20:23 urbanecm@deploy1002: urbanecm: Backport for Growth: Remove wgWelcomeSurveyEnableWithHomepage (T342353 T344619), revalidateLinkRecommendations: Load scoreLessThan correctly (T316079), LinkRecommendationUpdater: Load link-recommendation even if disabled (T344343) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwde
  • 20:21 urbanecm@deploy1002: Started scap: Backport for Growth: Remove wgWelcomeSurveyEnableWithHomepage (T342353 T344619), revalidateLinkRecommendations: Load scoreLessThan correctly (T316079), LinkRecommendationUpdater: Load link-recommendation even if disabled (T344343)
  • 20:21 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1008.eqiad.wmnet with OS bullseye
  • 20:20 urbanecm@deploy1002: Finished scap: Backport for Add botadmin group on eswiki (T342484) (duration: 18m 08s)
  • 20:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2025']
  • 20:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2023']
  • 20:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2023']
  • 20:17 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2024']
  • 20:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2023']
  • 20:13 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2023']
  • 20:13 urbanecm@deploy1002: hamishz and urbanecm: Continuing with sync
  • 20:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2023']
  • 20:12 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2023']
  • 20:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2023']
  • 20:11 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2023']
  • 20:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2023']
  • 20:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2023']
  • 20:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P50698 and previous config saved to /var/cache/conftool/dbconfig/20230821-200927-ladsgroup.json
  • 20:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2024']
  • 20:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2023']
  • 20:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 20:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 20:08 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_esams and A:cp
  • 20:05 herron@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka logging-eqiad cluster: Reboot kafka nodes
  • 20:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 20:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T344589)', diff saved to https://phabricator.wikimedia.org/P50697 and previous config saved to /var/cache/conftool/dbconfig/20230821-200344-ladsgroup.json
  • 20:03 urbanecm@deploy1002: hamishz and urbanecm: Backport for Add botadmin group on eswiki (T342484) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:02 urbanecm@deploy1002: Started scap: Backport for Add botadmin group on eswiki (T342484)
  • 19:58 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2024']
  • 19:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2023']
  • 19:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2023']
  • 19:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P50696 and previous config saved to /var/cache/conftool/dbconfig/20230821-195420-ladsgroup.json
  • 19:53 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2023']
  • 19:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P50695 and previous config saved to /var/cache/conftool/dbconfig/20230821-194838-ladsgroup.json
  • 19:43 fabfur@cumin1001: conftool action : set/pooled=yes; selector: dc=esams,cluster=cache_text
  • 19:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T344589)', diff saved to https://phabricator.wikimedia.org/P50694 and previous config saved to /var/cache/conftool/dbconfig/20230821-193914-ladsgroup.json
  • 19:38 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:38 inflatador: bking@wdqs1008 'depooling for firmware update T343124'
  • 19:37 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1008.eqiad.wmnet with reason: T343124
  • 19:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2024.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:37 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1008.eqiad.wmnet with reason: T343124
  • 19:34 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3073.esams.wmnet
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T344589)', diff saved to https://phabricator.wikimedia.org/P50693 and previous config saved to /var/cache/conftool/dbconfig/20230821-193402-ladsgroup.json
  • 19:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 19:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T344589)', diff saved to https://phabricator.wikimedia.org/P50692 and previous config saved to /var/cache/conftool/dbconfig/20230821-193337-ladsgroup.json
  • 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P50691 and previous config saved to /var/cache/conftool/dbconfig/20230821-193331-ladsgroup.json
  • 19:29 taavi: run extensions/OATHAuth/maintenance/UpdateForMultipleDevicesSupport.php on checkuserwiki, T242031
  • 19:25 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2024.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:25 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3073.esams.wmnet
  • 19:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2023.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:25 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3072.esams.wmnet
  • 19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P50689 and previous config saved to /var/cache/conftool/dbconfig/20230821-191830-ladsgroup.json
  • 19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T344589)', diff saved to https://phabricator.wikimedia.org/P50688 and previous config saved to /var/cache/conftool/dbconfig/20230821-191825-ladsgroup.json
  • 19:17 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1009.eqiad.wmnet
  • 19:17 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1009.eqiad.wmnet
  • 19:16 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1008.eqiad.wmnet
  • 19:16 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1008.eqiad.wmnet
  • 19:16 bking@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts wdqs1008.eqiad.wmnet
  • 19:16 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1008.eqiad.wmnet
  • 19:15 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3072.esams.wmnet
  • 19:15 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3071.esams.wmnet
  • 19:11 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1003.eqiad.wmnet with reason: New kernel, T344587
  • 19:10 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1003.eqiad.wmnet with reason: New kernel, T344587
  • 19:10 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1001.eqiad.wmnet with reason: New kernel, T344587
  • 19:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2023.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:10 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1001.eqiad.wmnet with reason: New kernel, T344587
  • 19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T344589)', diff saved to https://phabricator.wikimedia.org/P50687 and previous config saved to /var/cache/conftool/dbconfig/20230821-191013-ladsgroup.json
  • 19:08 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for wdqs202[3-5] - pt1979@cumin2002"
  • 19:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for wdqs202[3-5] - pt1979@cumin2002"
  • 19:06 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3071.esams.wmnet
  • 19:06 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3070.esams.wmnet
  • 19:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 19:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P50686 and previous config saved to /var/cache/conftool/dbconfig/20230821-190324-ladsgroup.json
  • 19:00 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-ctrl1002.eqiad.wmnet with reason: New kernel, T344587
  • 19:00 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-ctrl1002.eqiad.wmnet with reason: New kernel, T344587
  • 18:59 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-ctrl1001.eqiad.wmnet with reason: New kernel, T344587
  • 18:59 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-ctrl1001.eqiad.wmnet with reason: New kernel, T344587
  • 18:57 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3070.esams.wmnet
  • 18:57 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3069.esams.wmnet
  • 18:56 urandom: Upgrading aq1010/cassandra-{a,b} (canary) to Cassandra 4.1.1 — T339299
  • 18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P50685 and previous config saved to /var/cache/conftool/dbconfig/20230821-185506-ladsgroup.json
  • 18:54 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-worker1002.eqiad.wmnet with reason: New kernel, T344587
  • 18:54 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-worker1002.eqiad.wmnet with reason: New kernel, T344587
  • 18:50 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqsin and A:cp
  • 18:49 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3069.esams.wmnet
  • 18:49 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3068.esams.wmnet
  • 18:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T344589)', diff saved to https://phabricator.wikimedia.org/P50684 and previous config saved to /var/cache/conftool/dbconfig/20230821-184818-ladsgroup.json
  • 18:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-worker1001.eqiad.wmnet with reason: New kernel, T344587
  • 18:46 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-worker1001.eqiad.wmnet with reason: New kernel, T344587
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T344589)', diff saved to https://phabricator.wikimedia.org/P50683 and previous config saved to /var/cache/conftool/dbconfig/20230821-184305-ladsgroup.json
  • 18:40 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3068.esams.wmnet
  • 18:40 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3067.esams.wmnet
  • 18:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P50682 and previous config saved to /var/cache/conftool/dbconfig/20230821-184000-ladsgroup.json
  • 18:31 fabfur: reboot cp[3067-3073].esams.wmnet for T344587
  • 18:31 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3067.esams.wmnet
  • 18:28 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3081.esams.wmnet
  • 18:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P50681 and previous config saved to /var/cache/conftool/dbconfig/20230821-182759-ladsgroup.json
  • 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T344589)', diff saved to https://phabricator.wikimedia.org/P50680 and previous config saved to /var/cache/conftool/dbconfig/20230821-182452-ladsgroup.json
  • 18:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host durum3004.esams.wmnet
  • 18:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host durum3003.esams.wmnet
  • 18:19 fabfur: reboot cp3081 for T344587
  • 18:19 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host durum3004.esams.wmnet
  • 18:19 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3081.esams.wmnet
  • 18:18 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host durum3003.esams.wmnet
  • 18:18 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cp3081.esams.wmnet
  • 18:17 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3081.esams.wmnet
  • 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T344589)', diff saved to https://phabricator.wikimedia.org/P50679 and previous config saved to /var/cache/conftool/dbconfig/20230821-181752-ladsgroup.json
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T344589)', diff saved to https://phabricator.wikimedia.org/P50678 and previous config saved to /var/cache/conftool/dbconfig/20230821-181629-ladsgroup.json
  • 18:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 18:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T344589)', diff saved to https://phabricator.wikimedia.org/P50677 and previous config saved to /var/cache/conftool/dbconfig/20230821-181615-ladsgroup.json
  • 18:14 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3066.esams.wmnet
  • 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P50676 and previous config saved to /var/cache/conftool/dbconfig/20230821-181252-ladsgroup.json
  • 18:06 herron@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka logging-eqiad cluster: Reboot kafka nodes
  • 18:05 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3066.esams.wmnet
  • 18:05 fabfur: reboot cp3066 for T344587
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P50675 and previous config saved to /var/cache/conftool/dbconfig/20230821-180108-ladsgroup.json
  • 17:59 herron@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka logging-codfw cluster: Reboot kafka nodes
  • 17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T344589)', diff saved to https://phabricator.wikimedia.org/P50674 and previous config saved to /var/cache/conftool/dbconfig/20230821-175746-ladsgroup.json
  • 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T344589)', diff saved to https://phabricator.wikimedia.org/P50673 and previous config saved to /var/cache/conftool/dbconfig/20230821-175240-ladsgroup.json
  • 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T344589)', diff saved to https://phabricator.wikimedia.org/P50672 and previous config saved to /var/cache/conftool/dbconfig/20230821-175115-ladsgroup.json
  • 17:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 17:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T344589)', diff saved to https://phabricator.wikimedia.org/P50671 and previous config saved to /var/cache/conftool/dbconfig/20230821-175050-ladsgroup.json
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P50670 and previous config saved to /var/cache/conftool/dbconfig/20230821-174602-ladsgroup.json
  • 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P50669 and previous config saved to /var/cache/conftool/dbconfig/20230821-173544-ladsgroup.json
  • 17:31 ladsgroup@deploy1002: Finished scap: Backport for Fix update for WRITE stage of extlinks migration (duration: 09m 50s)
  • 17:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T344589)', diff saved to https://phabricator.wikimedia.org/P50668 and previous config saved to /var/cache/conftool/dbconfig/20230821-173056-ladsgroup.json
  • 17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T344589)', diff saved to https://phabricator.wikimedia.org/P50667 and previous config saved to /var/cache/conftool/dbconfig/20230821-172456-ladsgroup.json
  • 17:24 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 17:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 17:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T344589)', diff saved to https://phabricator.wikimedia.org/P50666 and previous config saved to /var/cache/conftool/dbconfig/20230821-172432-ladsgroup.json
  • 17:22 ladsgroup@deploy1002: ladsgroup: Backport for Fix update for WRITE stage of extlinks migration synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 17:21 ladsgroup@deploy1002: Started scap: Backport for Fix update for WRITE stage of extlinks migration
  • 17:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P50665 and previous config saved to /var/cache/conftool/dbconfig/20230821-172037-ladsgroup.json
  • 17:18 ladsgroup@deploy1002: Sync cancelled.
  • 17:17 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to the old columns of extlinks everywhere except s1, s4 (T342683) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 17:16 ladsgroup@deploy1002: Started scap: Backport for Stop writing to the old columns of extlinks everywhere except s1, s4 (T342683)
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P50664 and previous config saved to /var/cache/conftool/dbconfig/20230821-170926-ladsgroup.json
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T344589)', diff saved to https://phabricator.wikimedia.org/P50663 and previous config saved to /var/cache/conftool/dbconfig/20230821-170531-ladsgroup.json
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T344589)', diff saved to https://phabricator.wikimedia.org/P50662 and previous config saved to /var/cache/conftool/dbconfig/20230821-170016-ladsgroup.json
  • 17:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T344589)', diff saved to https://phabricator.wikimedia.org/P50661 and previous config saved to /var/cache/conftool/dbconfig/20230821-165951-ladsgroup.json
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P50660 and previous config saved to /var/cache/conftool/dbconfig/20230821-165420-ladsgroup.json
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P50659 and previous config saved to /var/cache/conftool/dbconfig/20230821-164445-ladsgroup.json
  • 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T344589)', diff saved to https://phabricator.wikimedia.org/P50658 and previous config saved to /var/cache/conftool/dbconfig/20230821-163913-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P50657 and previous config saved to /var/cache/conftool/dbconfig/20230821-162939-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T344589)', diff saved to https://phabricator.wikimedia.org/P50656 and previous config saved to /var/cache/conftool/dbconfig/20230821-161432-ladsgroup.json
  • 16:14 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 11s)
  • 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T344589)', diff saved to https://phabricator.wikimedia.org/P50655 and previous config saved to /var/cache/conftool/dbconfig/20230821-161216-ladsgroup.json
  • 16:06 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 18s)
  • 16:00 herron@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka logging-codfw cluster: Reboot kafka nodes
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T344589)', diff saved to https://phabricator.wikimedia.org/P50654 and previous config saved to /var/cache/conftool/dbconfig/20230821-155950-ladsgroup.json
  • 15:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 15:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T344589)', diff saved to https://phabricator.wikimedia.org/P50653 and previous config saved to /var/cache/conftool/dbconfig/20230821-155927-ladsgroup.json
  • 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P50652 and previous config saved to /var/cache/conftool/dbconfig/20230821-155710-ladsgroup.json
  • 15:54 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New kernel, T344587
  • 15:54 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New kernel, T344587
  • 15:53 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 08m 18s)
  • 15:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: New kernel, T344587
  • 15:48 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: New kernel, T344587
  • 15:45 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 20s)
  • 15:45 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: jnl export
  • 15:45 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: jnl export
  • 15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P50651 and previous config saved to /var/cache/conftool/dbconfig/20230821-154420-ladsgroup.json
  • 15:43 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: New kernel, T344587
  • 15:43 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: New kernel, T344587
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P50650 and previous config saved to /var/cache/conftool/dbconfig/20230821-154203-ladsgroup.json
  • 15:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P50649 and previous config saved to /var/cache/conftool/dbconfig/20230821-153316-ladsgroup.json
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P50648 and previous config saved to /var/cache/conftool/dbconfig/20230821-152914-ladsgroup.json
  • 15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T344589)', diff saved to https://phabricator.wikimedia.org/P50647 and previous config saved to /var/cache/conftool/dbconfig/20230821-152657-ladsgroup.json
  • 15:26 vgutierrez: cleaning confd stale ncredir errors in config-master2001
  • 15:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 141081
  • 15:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 141081
  • 15:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45356
  • 15:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45356
  • 15:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26801
  • 15:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 26801
  • 15:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12897
  • 15:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12897
  • 15:22 ladsgroup@deploy1002: Finished scap: Backport for Enable url shortener in sidebar in testwiki (T267921) (duration: 09m 46s)
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T344589)', diff saved to https://phabricator.wikimedia.org/P50646 and previous config saved to /var/cache/conftool/dbconfig/20230821-151931-ladsgroup.json
  • 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P50645 and previous config saved to /var/cache/conftool/dbconfig/20230821-151810-ladsgroup.json
  • 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T344589)', diff saved to https://phabricator.wikimedia.org/P50644 and previous config saved to /var/cache/conftool/dbconfig/20230821-151805-ladsgroup.json
  • 15:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 15:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 15:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T344589)', diff saved to https://phabricator.wikimedia.org/P50643 and previous config saved to /var/cache/conftool/dbconfig/20230821-151740-ladsgroup.json
  • 15:14 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T344589)', diff saved to https://phabricator.wikimedia.org/P50642 and previous config saved to /var/cache/conftool/dbconfig/20230821-151408-ladsgroup.json
  • 15:14 ladsgroup@deploy1002: ladsgroup: Backport for Enable url shortener in sidebar in testwiki (T267921) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 15:13 btullis@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker
  • 15:12 ladsgroup@deploy1002: Started scap: Backport for Enable url shortener in sidebar in testwiki (T267921)
  • 15:08 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host graphite2004.codfw.wmnet
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T344589)', diff saved to https://phabricator.wikimedia.org/P50641 and previous config saved to /var/cache/conftool/dbconfig/20230821-150755-ladsgroup.json
  • 15:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 15:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 15:05 taavi@deploy1002: Finished scap: Backport for SpecialGlobalGroupMembership: Normalize usernames (T344495) (duration: 08m 47s)
  • 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P50640 and previous config saved to /var/cache/conftool/dbconfig/20230821-150304-ladsgroup.json
  • 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T344589)', diff saved to https://phabricator.wikimedia.org/P50639 and previous config saved to /var/cache/conftool/dbconfig/20230821-150244-ladsgroup.json
  • 15:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P50638 and previous config saved to /var/cache/conftool/dbconfig/20230821-150234-ladsgroup.json
  • 15:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp1001.eqiad.wmnet
  • 14:59 taavi@deploy1002: taavi and urbanecm: Continuing with sync
  • 14:58 taavi@deploy1002: taavi and urbanecm: Backport for SpecialGlobalGroupMembership: Normalize usernames (T344495) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:57 taavi@deploy1002: Started scap: Backport for SpecialGlobalGroupMembership: Normalize usernames (T344495)
  • 14:56 taavi@deploy1002: Finished scap: Backport for Revert "throttle: add rules for Wikimania 2023" (duration: 08m 11s)
  • 14:55 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2004.codfw.wmnet
  • 14:55 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host arclamp1001.eqiad.wmnet
  • 14:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3004.wikimedia.org
  • 14:49 taavi@deploy1002: taavi: Continuing with sync
  • 14:49 taavi@deploy1002: taavi: Backport for Revert "throttle: add rules for Wikimania 2023" synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:48 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2001.codfw.wmnet
  • 14:48 taavi@deploy1002: Started scap: Backport for Revert "throttle: add rules for Wikimania 2023"
  • 14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P50637 and previous config saved to /var/cache/conftool/dbconfig/20230821-144757-ladsgroup.json
  • 14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P50636 and previous config saved to /var/cache/conftool/dbconfig/20230821-144738-ladsgroup.json
  • 14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P50635 and previous config saved to /var/cache/conftool/dbconfig/20230821-144728-ladsgroup.json
  • 14:46 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns3004.wikimedia.org
  • 14:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3003.wikimedia.org
  • 14:45 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host vrts2001.codfw.wmnet
  • 14:39 urandom: Upgrading aq2001/cassandra-b (canary) to Cassandra 4.1.1 — T339299
  • 14:38 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns3003.wikimedia.org
  • 14:35 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir3004.esams.wmnet
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50634 and previous config saved to /var/cache/conftool/dbconfig/20230821-143449-ladsgroup.json
  • 14:34 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir3003.esams.wmnet
  • 14:32 urandom: Upgrading aq2001/cassandra-a (canary) to Cassandra 4.1.1 — T339299
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P50633 and previous config saved to /var/cache/conftool/dbconfig/20230821-143226-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T344589)', diff saved to https://phabricator.wikimedia.org/P50632 and previous config saved to /var/cache/conftool/dbconfig/20230821-143221-ladsgroup.json
  • 14:31 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir3004.esams.wmnet
  • 14:30 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir3003.esams.wmnet
  • 14:26 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host lvs3010.esams.wmnet
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T344589)', diff saved to https://phabricator.wikimedia.org/P50631 and previous config saved to /var/cache/conftool/dbconfig/20230821-142448-ladsgroup.json
  • 14:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 14:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T344589)', diff saved to https://phabricator.wikimedia.org/P50630 and previous config saved to /var/cache/conftool/dbconfig/20230821-142408-ladsgroup.json
  • 14:23 hnowlan@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes2056
  • 14:23 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host lvs3008.esams.wmnet
  • 14:22 hnowlan@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes2056
  • 14:21 hnowlan@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes2055
  • 14:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3009.esams.wmnet
  • 14:19 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:19 hnowlan@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes2055
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P50629 and previous config saved to /var/cache/conftool/dbconfig/20230821-141942-ladsgroup.json
  • 14:19 hnowlan@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1057
  • 14:19 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:18 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:18 hnowlan@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1057
  • 14:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1058
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T344589)', diff saved to https://phabricator.wikimedia.org/P50628 and previous config saved to /var/cache/conftool/dbconfig/20230821-141719-ladsgroup.json
  • 14:16 hnowlan@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1058
  • 14:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hnowlan: updating records for reuse of thumbor servers for k8s nodes T343996 T343993 - hnowlan@cumin1001"
  • 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T344589)', diff saved to https://phabricator.wikimedia.org/P50627 and previous config saved to /var/cache/conftool/dbconfig/20230821-141504-ladsgroup.json
  • 14:14 hnowlan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hnowlan: updating records for reuse of thumbor servers for k8s nodes T343996 T343993 - hnowlan@cumin1001"
  • 14:14 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs3010.esams.wmnet
  • 14:13 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs3009.esams.wmnet
  • 14:11 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs3008.esams.wmnet
  • 14:11 hnowlan@cumin1001: START - Cookbook sre.dns.netbox
  • 14:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs[3008-3009].esams.wmnet with reason: rebooting for microcode updates
  • 14:10 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs[3008-3009].esams.wmnet with reason: rebooting for microcode updates
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P50626 and previous config saved to /var/cache/conftool/dbconfig/20230821-140901-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P50625 and previous config saved to /var/cache/conftool/dbconfig/20230821-140433-ladsgroup.json
  • 14:02 Amir1: killed cebwiki's dump gen on snapshot1010
  • 14:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 14:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 14:01 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 14:01 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P50624 and previous config saved to /var/cache/conftool/dbconfig/20230821-135958-ladsgroup.json
  • 13:55 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 13:55 claime: Re-enforcing limits and requests for mw-misc - T342748
  • 13:55 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P50623 and previous config saved to /var/cache/conftool/dbconfig/20230821-135355-ladsgroup.json
  • 13:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2002.codfw.wmnet
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P50622 and previous config saved to /var/cache/conftool/dbconfig/20230821-134949-ladsgroup.json
  • 13:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 13:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50621 and previous config saved to /var/cache/conftool/dbconfig/20230821-134926-ladsgroup.json
  • 13:48 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 13:47 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 13:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch2002.codfw.wmnet
  • 13:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2001.codfw.wmnet
  • 13:46 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 13:46 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 13:45 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 13:45 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 13:45 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 13:45 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 13:45 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P50620 and previous config saved to /var/cache/conftool/dbconfig/20230821-134452-ladsgroup.json
  • 13:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 13:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 13:43 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 13:43 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 13:42 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 13:42 claime: Enabling memory limit autocompute for all mw-on-k8s deployments - T342748
  • 13:41 btullis@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
  • 13:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch2001.codfw.wmnet
  • 13:39 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1112.eqiad.wmnet with OS bullseye
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T344589)', diff saved to https://phabricator.wikimedia.org/P50618 and previous config saved to /var/cache/conftool/dbconfig/20230821-133849-ladsgroup.json
  • 13:35 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1111.eqiad.wmnet with OS bullseye
  • 13:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T344589)', diff saved to https://phabricator.wikimedia.org/P50617 and previous config saved to /var/cache/conftool/dbconfig/20230821-133113-ladsgroup.json
  • 13:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 13:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T344589)', diff saved to https://phabricator.wikimedia.org/P50616 and previous config saved to /var/cache/conftool/dbconfig/20230821-133048-ladsgroup.json
  • 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T344589)', diff saved to https://phabricator.wikimedia.org/P50615 and previous config saved to /var/cache/conftool/dbconfig/20230821-132945-ladsgroup.json
  • 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T344589)', diff saved to https://phabricator.wikimedia.org/P50614 and previous config saved to /var/cache/conftool/dbconfig/20230821-132341-ladsgroup.json
  • 13:23 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
  • 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T344589)', diff saved to https://phabricator.wikimedia.org/P50613 and previous config saved to /var/cache/conftool/dbconfig/20230821-132214-ladsgroup.json
  • 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 13:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T344589)', diff saved to https://phabricator.wikimedia.org/P50612 and previous config saved to /var/cache/conftool/dbconfig/20230821-132150-ladsgroup.json
  • 13:18 zabe@deploy1002: Finished scap: Backport for SiteMatrix config: Add actual (non-deprecated) language code for deprecated language codes (T172035 T111876) (duration: 07m 53s)
  • 13:17 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3075.esams.wmnet
  • 13:16 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
  • 13:16 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1112.eqiad.wmnet with reason: host reimage
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P50611 and previous config saved to /var/cache/conftool/dbconfig/20230821-131542-ladsgroup.json
  • 13:13 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1111.eqiad.wmnet with reason: host reimage
  • 13:12 zabe@deploy1002: zabe and wsung: Continuing with sync
  • 13:12 zabe@deploy1002: zabe and wsung: Backport for SiteMatrix config: Add actual (non-deprecated) language code for deprecated language codes (T172035 T111876) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:12 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3075.esams.wmnet
  • 13:11 zabe@deploy1002: Started scap: Backport for SiteMatrix config: Add actual (non-deprecated) language code for deprecated language codes (T172035 T111876)
  • 13:10 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1112.eqiad.wmnet with reason: host reimage
  • 13:10 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1111.eqiad.wmnet with reason: host reimage
  • 13:10 fabfur: disabling puppet and rebooting cp3075 to test network configuration (T344604)
  • 13:09 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2002.codfw.wmnet
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P50610 and previous config saved to /var/cache/conftool/dbconfig/20230821-130643-ladsgroup.json
  • 13:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2002.wikimedia.org
  • 13:03 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2002.codfw.wmnet
  • 13:02 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2001.codfw.wmnet
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50609 and previous config saved to /var/cache/conftool/dbconfig/20230821-130118-ladsgroup.json
  • 13:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 13:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 13:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P50608 and previous config saved to /var/cache/conftool/dbconfig/20230821-130036-ladsgroup.json
  • 12:58 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab2002.wikimedia.org
  • 12:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1002.eqiad.wmnet
  • 12:58 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2001.codfw.wmnet
  • 12:57 klausman@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-staging-worker
  • 12:56 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1112.eqiad.wmnet with OS bullseye
  • 12:55 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1111.eqiad.wmnet with OS bullseye
  • 12:53 topranks: setting Lumen transport esams eqiad to default OSPF cost of 800 (bring circuit into normal usage)
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P50607 and previous config saved to /var/cache/conftool/dbconfig/20230821-125137-ladsgroup.json
  • 12:51 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1005.eqiad.wmnet
  • 12:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1002.eqiad.wmnet
  • 12:51 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1001.eqiad.wmnet
  • 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T344589)', diff saved to https://phabricator.wikimedia.org/P50606 and previous config saved to /var/cache/conftool/dbconfig/20230821-124529-ladsgroup.json
  • 12:44 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host gitlab2002.wikimedia.org
  • 12:43 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1001.eqiad.wmnet
  • 12:42 topranks: enabling BGP over Lumen transport cr2-eqiad to cr1-esams
  • 12:41 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 12:41 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1005.eqiad.wmnet
  • 12:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1004.eqiad.wmnet
  • 12:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1015.eqiad.wmnet
  • 12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T344589)', diff saved to https://phabricator.wikimedia.org/P50605 and previous config saved to /var/cache/conftool/dbconfig/20230821-123906-ladsgroup.json
  • 12:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 12:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T344589)', diff saved to https://phabricator.wikimedia.org/P50604 and previous config saved to /var/cache/conftool/dbconfig/20230821-123631-ladsgroup.json
  • 12:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 12:35 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 12:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 12:35 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1015.eqiad.wmnet
  • 12:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1014.eqiad.wmnet
  • 12:31 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1004.eqiad.wmnet
  • 12:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1003.eqiad.wmnet
  • 12:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T344589)', diff saved to https://phabricator.wikimedia.org/P50603 and previous config saved to /var/cache/conftool/dbconfig/20230821-123123-ladsgroup.json
  • 12:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 12:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 12:29 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1014.eqiad.wmnet
  • 12:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1013.eqiad.wmnet
  • 12:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1013.eqiad.wmnet
  • 12:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1012.eqiad.wmnet
  • 12:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1003.eqiad.wmnet
  • 12:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1002.eqiad.wmnet
  • 12:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1012.eqiad.wmnet
  • 12:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1011.eqiad.wmnet
  • 12:13 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 12:13 klausman@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
  • 12:12 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1002.eqiad.wmnet
  • 12:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
  • 12:09 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1011.eqiad.wmnet
  • 12:09 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1010.eqiad.wmnet
  • 12:07 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 12:06 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 12:04 klausman@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:ml-cache-codfw
  • 12:03 zabe@deploy1002: Finished scap: Backport for Revert "Revert "add su namespace translations"" (duration: 08m 13s)
  • 12:03 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
  • 12:02 klausman@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Restart to pick up OpenJDK 11 security updates - klausman@cumin1001
  • 12:01 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1010.eqiad.wmnet
  • 12:00 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 11:57 zabe@deploy1002: zabe: Continuing with sync
  • 11:57 zabe@deploy1002: zabe: Backport for Revert "Revert "add su namespace translations"" synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 11:56 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.wikimedia.org on all recursors
  • 11:56 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache config-master.wikimedia.org on all recursors
  • 11:56 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab2002.wikimedia.org
  • 11:56 jbond@cumin1001: END (ERROR) - Cookbook sre.dns.wipe-cache (exit_code=97) puppetboard.wikimedia.org on all recursors
  • 11:56 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.wikimedia.org on all recursors
  • 11:55 zabe@deploy1002: Started scap: Backport for Revert "Revert "add su namespace translations""
  • 11:55 moritzm: installing openjdk-11 security updates
  • 11:55 jbond: switch config-master.w.o to config-master hosts
  • 11:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1002.eqiad.wmnet
  • 11:44 klausman@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Restart to pick up OpenJDK 11 security updates - klausman@cumin1001
  • 11:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-mariadb1002.eqiad.wmnet
  • 11:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor[2003-2006].codfw.wmnet
  • 11:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thumbor[2003-2006].codfw.wmnet decommissioned, removing all IPs except the asset tag one - hnowlan@cumin1001"
  • 11:34 klausman@cumin1001: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:ml-cache-codfw
  • 11:34 klausman@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:ml-cache-eqiad
  • 11:33 hnowlan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thumbor[2003-2006].codfw.wmnet decommissioned, removing all IPs except the asset tag one - hnowlan@cumin1001"
  • 11:32 urbanecm@deploy1002: Finished scap: Backport for Revert "Growth: Temporarily disable link-recommendation FE on arwiki" (T316079) (duration: 08m 42s)
  • 11:31 hnowlan@cumin1001: START - Cookbook sre.dns.netbox
  • 11:25 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 11:25 urbanecm@deploy1002: urbanecm: Backport for Revert "Growth: Temporarily disable link-recommendation FE on arwiki" (T316079) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 11:24 btullis@cumin1001: END (PASS) - Cookbook sre.presto.reboot-workers (exit_code=0) for Presto analytics cluster: Reboot Presto nodes
  • 11:23 urbanecm@deploy1002: Started scap: Backport for Revert "Growth: Temporarily disable link-recommendation FE on arwiki" (T316079)
  • 11:23 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 11:23 btullis@cumin1001: Added views for new wiki: blkwiktionary T343541
  • 11:12 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1110.eqiad.wmnet with OS bullseye
  • 11:07 klausman@cumin1001: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:ml-cache-eqiad
  • 11:04 klausman@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Reboot to activate microcode and security updates for T344587 - klausman@cumin1001
  • 11:04 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:04 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts thumbor[2003-2006].codfw.wmnet
  • 11:03 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:03 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor[1001-1002,1005-1006].eqiad.wmnet
  • 11:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thumbor[1001-1002,1005-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - hnowlan@cumin1001"
  • 11:02 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1001.eqiad.wmnet
  • 11:02 claime: Enabling memory limit autocompute for mw-api-int - T342748
  • 11:01 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
  • 11:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 11:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 11:00 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 11:00 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 11:00 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:00 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 11:00 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:00 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:00 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:00 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:00 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 11:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 11:00 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 11:00 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:00 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:59 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:59 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:59 claime: Deploying memory limit autocompute for mw-on-k8s - T342748
  • 10:59 hnowlan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thumbor[1001-1002,1005-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - hnowlan@cumin1001"
  • 10:57 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-mariadb1001.eqiad.wmnet
  • 10:57 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 10:56 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3073.esams.wmnet
  • 10:51 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1110.eqiad.wmnet with reason: host reimage
  • 10:48 claime: mw-on-k8s up to date with bare metal
  • 10:48 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1110.eqiad.wmnet with reason: host reimage
  • 10:48 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 10:48 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3073.esams.wmnet
  • 10:48 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3072.esams.wmnet
  • 10:47 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 10:47 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 10:47 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 10:47 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 10:46 klausman@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Reboot to activate microcode and security updates for T344587 - klausman@cumin1001
  • 10:46 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 10:46 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 10:45 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 10:45 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 10:45 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1001.eqiad.wmnet
  • 10:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 10:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 10:44 hnowlan@cumin1001: START - Cookbook sre.dns.netbox
  • 10:43 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:43 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1004.eqiad.wmnet
  • 10:43 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 10:42 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 10:42 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 10:42 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:42 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:41 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:41 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:41 claime: Redeploying mw-on-k8s
  • 10:39 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet
  • 10:38 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3072.esams.wmnet
  • 10:38 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp3071.esams.wmnet
  • 10:38 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 10:38 btullis@cumin1001: Added views for new wiki: suwikisource T343547
  • 10:38 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1109.eqiad.wmnet with OS bullseye
  • 10:36 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1004.eqiad.wmnet
  • 10:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1003.eqiad.wmnet
  • 10:33 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1110.eqiad.wmnet with OS bullseye
  • 10:29 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1003.eqiad.wmnet
  • 10:27 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts thumbor[1001-1002,1005-1006].eqiad.wmnet
  • 10:24 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3071.esams.wmnet
  • 10:24 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3070.esams.wmnet
  • 10:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
  • 10:15 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1109.eqiad.wmnet with reason: host reimage
  • 10:15 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3070.esams.wmnet
  • 10:15 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3069.esams.wmnet
  • 10:14 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
  • 10:13 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 10:12 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1109.eqiad.wmnet with reason: host reimage
  • 10:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
  • 10:06 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3069.esams.wmnet
  • 10:06 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp3068.esams.wmnet
  • 10:03 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
  • 10:01 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 10:01 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 10:01 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:00 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
  • 09:59 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2003.wikimedia.org
  • 09:58 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1109.eqiad.wmnet with OS bullseye
  • 09:52 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab2003.wikimedia.org
  • 09:52 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3068.esams.wmnet
  • 09:52 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3067.esams.wmnet
  • 09:51 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
  • 09:51 jayme: restarted prometheus@k8s on prometheus100[56] - T343529
  • 09:50 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 09:50 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 09:50 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 09:50 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:48 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:48 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:47 btullis@cumin1001: START - Cookbook sre.presto.reboot-workers for Presto analytics cluster: Reboot Presto nodes
  • 09:46 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:46 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:45 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
  • 09:45 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:44 cgoubert@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:44 cgoubert@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:42 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3067.esams.wmnet
  • 09:42 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:41 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:40 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:40 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:36 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet
  • 09:36 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 09:36 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 09:36 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 09:36 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 09:36 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:26 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3074.esams.wmnet
  • 09:25 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:25 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 09:25 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 09:25 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 09:25 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet
  • 09:24 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3066.esams.wmnet
  • 09:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:23 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:22 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:21 cgoubert@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:20 cgoubert@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:19 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:18 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:17 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 09:17 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:17 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3074.esams.wmnet
  • 09:15 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3066.esams.wmnet
  • 09:05 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1108.eqiad.wmnet with OS bullseye
  • 09:04 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 08:56 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:54 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:53 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:52 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:52 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:51 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:51 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:50 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 08:44 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1108.eqiad.wmnet with reason: host reimage
  • 08:43 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 08:43 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 08:43 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 08:43 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 08:42 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 08:42 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 08:41 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1108.eqiad.wmnet with reason: host reimage
  • 08:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 08:27 godog: restart prometheus@beta - T344582
  • 08:26 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1108.eqiad.wmnet with OS bullseye
  • 08:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15600
  • 08:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15600
  • 08:10 klausman: Draining ml-serve2006 for Kubelet partition resize
  • 08:02 klausman: Draining ml-serve2005 for Kubelet partition resize
  • 07:49 klausman: Draining ml-serve2004 for Kubelet partition resize
  • 07:03 godog: prometheus ops eqiad +300G on the filesystem
  • 07:01 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: ganeti3003.esams.wmnet
  • 07:01 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: ganeti3003.esams.wmnet
  • 07:01 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: ganeti3002.esams.wmnet
  • 07:01 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: ganeti3002.esams.wmnet
  • 07:01 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: ganeti3001.esams.wmnet
  • 07:01 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: ganeti3001.esams.wmnet
  • 06:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Update Homer wheels - ayounsi@cumin1001
  • 06:53 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Update Homer wheels - ayounsi@cumin1001
  • 06:38 zabe@deploy1002: Started scap: Backport for add su namespace translations (T344314)
  • 06:30 moritzm: installing Linux 5.10.191 kernel updates
  • 06:28 kart_: Update MinT to 2023-08-14-091403-production (T336683)
  • 06:27 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 06:22 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 06:19 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 06:13 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 06:12 zabe@deploy1002: Started scap: Backport for add su namespace translations (T344314)
  • 06:09 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 06:06 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 01:37 ryankemper: [WDQS] `ryankemper@wdqs1006:~$ sudo systemctl restart wdqs-blazegraph wdqs-categories` (free allocators decreasing rapidly -> solution is a simple restart of query service on host)

2023-08-19

  • 08:38 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 29 hosts with reason: Downtime esams hosts prior to migration week.
  • 08:38 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 29 hosts with reason: Downtime esams hosts prior to migration week.
  • 08:37 topranks: downtiming esams hosts ahead of core router (cr1-esams) reboot T344546
  • 08:26 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 16 hosts with reason: Downtime esams hosts prior to cr1-esams reboot
  • 08:26 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 16 hosts with reason: Downtime esams hosts prior to cr1-esams reboot

2023-08-18

  • 18:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs[3008-3009].esams.wmnet
  • 18:09 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs[3008-3009].esams.wmnet
  • 18:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3009.esams.wmnet
  • 18:02 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs3009.esams.wmnet
  • 18:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3010.esams.wmnet
  • 17:54 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs3010.esams.wmnet
  • 17:50 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1010']
  • 17:49 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1010']
  • 17:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs[3008-3009].esams.wmnet with reason: rebooting to flush broken IPv6 routes
  • 17:40 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs[3008-3009].esams.wmnet with reason: rebooting to flush broken IPv6 routes
  • 17:38 sukhe: reboot LVSes in esams to flush broken IPv6 routes
  • 17:37 topranks: bouncing OSPF on cr1-esams to attempt to resolve BFD/OSPF glitch
  • 17:25 inflatador: bking@ganeti1024 shutting off flink-zk1001 to check alerting T341792
  • 17:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for flink-zk[2001,2003].codfw.wmnet,flink-zk[1001-1003].eqiad.wmnet
  • 17:18 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for flink-zk[2001,2003].codfw.wmnet,flink-zk[1001-1003].eqiad.wmnet
  • 17:18 inflatador: bking@cumin1001 temporarily enabling alerts for flink-zk hosts to see if they work T341792
  • 17:13 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flink-zk2002.codfw.wmnet
  • 17:13 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:13 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 17:12 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 16:56 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 16:51 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts flink-zk2002.codfw.wmnet
  • 16:36 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1010.eqiad.wmnet with OS bullseye
  • 16:30 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1010.eqiad.wmnet with OS bullseye
  • 15:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2025.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding new host kubernetes2025 to CODFW - jhancock@cumin2002"
  • 15:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding new host kubernetes2025 to CODFW - jhancock@cumin2002"
  • 15:28 sukhe: ipvsadm -Dt IPs in 91.198.174.0/24 IPs from A:lvs and A:esams
  • 15:28 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:10 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 15:09 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
  • 14:29 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 14:24 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 11:31 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 11:30 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new bastion - jmm@cumin2002"
  • 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2002.wikimedia.org
  • 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new bastion - jmm@cumin2002"
  • 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2002.wikimedia.org
  • 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast3007.wikimedia.org
  • 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast3007.wikimedia.org with OS bookworm
  • 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast3007.wikimedia.org with reason: host reimage
  • 10:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast3007.wikimedia.org with reason: host reimage
  • 10:11 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast3007.wikimedia.org with OS bookworm
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast3007.wikimedia.org - jmm@cumin2002"
  • 10:07 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast3007.wikimedia.org - jmm@cumin2002"
  • 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast3007.wikimedia.org on all recursors
  • 10:06 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast3007.wikimedia.org on all recursors
  • 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3007.wikimedia.org - jmm@cumin2002"
  • 10:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3007.wikimedia.org - jmm@cumin2002"
  • 10:03 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:03 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast3007.wikimedia.org
  • 10:03 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:03 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add reverses for new link addressing esams - cmooney@cumin1001"
  • 10:01 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add reverses for new link addressing esams - cmooney@cumin1001"
  • 09:56 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:55 cmooney@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 09:55 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:54 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 09:53 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 09:53 moritzm: upgrade idp-test to OpenJDK 11.0.20
  • 09:52 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:52 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add reverses for new link addressing esams - cmooney@cumin1001"
  • 09:51 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 09:50 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 09:50 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add reverses for new link addressing esams - cmooney@cumin1001"
  • 09:44 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:44 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:44 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add reverses for new link addressing esams - cmooney@cumin1001"
  • 09:32 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add reverses for new link addressing esams - cmooney@cumin1001"
  • 09:31 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:20 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 09:20 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 09:20 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 09:20 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 09:19 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 09:13 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 09:13 Emperor: roll-restart thanos swift frontends to add user T342620
  • 08:36 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti-test2002.codfw.wmnet
  • 08:25 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 08:24 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:24 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:23 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 08:21 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:21 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1002.wikimedia.org
  • 08:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1002.wikimedia.org
  • 07:47 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:46 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 07:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 06:58 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:57 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 06:56 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 06:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 05:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host bast3007.wikimedia.org
  • 05:48 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 05:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 05:43 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast3007.wikimedia.org
  • 01:42 taavi@deploy1002: Finished scap: Backport for Set WRITE_BOTH for OAuth multiple devices to checkuserwiki (T242031) (duration: 07m 48s)
  • 01:36 taavi@deploy1002: taavi: Continuing with sync
  • 01:36 taavi@deploy1002: taavi: Backport for Set WRITE_BOTH for OAuth multiple devices to checkuserwiki (T242031) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 01:34 taavi@deploy1002: Started scap: Backport for Set WRITE_BOTH for OAuth multiple devices to checkuserwiki (T242031)

2023-08-17

  • 23:27 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:22 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:54 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:41 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:40 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 16s)
  • 21:40 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 21:34 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:34 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1011.eqiad.wmnet with OS bullseye
  • 21:21 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:21 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:11 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@1d60a29]: make wikibase ttl imports to hdfs world readable (duration: 00m 11s)
  • 21:11 ebernhardson@deploy1002: Started deploy [airflow-dags/search@1d60a29]: make wikibase ttl imports to hdfs world readable
  • 21:10 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:09 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:03 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:03 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:58 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: host reimage
  • 20:56 urandom: Rolling Cassandra restart eqiad/d (RESTBase cluster) — T339298
  • 20:55 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: host reimage
  • 20:50 thcipriani@deploy1002: Finished scap: Backport for Revert "Add newline to README for backport training" (duration: 10m 43s)
  • 20:49 urandom: Rolling Cassandra restart eqiad/b (RESTBase cluster) — T339298
  • 20:48 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 20:46 sukhe: restart pybal on lvs3008
  • 20:44 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1010.eqiad.wmnet with OS bullseye
  • 20:44 thcipriani@deploy1002: thcipriani: Continuing with sync
  • 20:41 sukhe: restart pybal on lvs3010
  • 20:41 thcipriani@deploy1002: thcipriani: Backport for Revert "Add newline to README for backport training" synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:40 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1011.eqiad.wmnet with OS bullseye
  • 20:40 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1010.eqiad.wmnet with OS bullseye
  • 20:40 thcipriani@deploy1002: Started scap: Backport for Revert "Add newline to README for backport training"
  • 20:34 thcipriani@deploy1002: Finished scap: Backport for Add newline to README for backport training (duration: 13m 29s)
  • 20:30 urandom: Rolling Cassandra restart eqiad/a (RESTBase cluster) — T339298
  • 20:28 thcipriani@deploy1002: thcipriani: Continuing with sync
  • 20:22 thcipriani@deploy1002: thcipriani: Backport for Add newline to README for backport training synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:21 thcipriani@deploy1002: Started scap: Backport for Add newline to README for backport training
  • 20:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:20 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverses for Lumen transport esams eqiad - cmooney@cumin1001"
  • 20:20 urandom: Rolling Cassandra restart codfw/d (RESTBase cluster) — T339298
  • 20:19 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverses for Lumen transport esams eqiad - cmooney@cumin1001"
  • 20:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum4001.ulsfo.wmnet with OS bookworm
  • 20:06 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 20:03 urandom: Rolling Cassandra restart codfw/c (RESTBase cluster) — T339298
  • 19:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir3004.esams.wmnet with OS bullseye
  • 19:25 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
  • 19:22 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
  • 19:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5002.eqsin.wmnet with OS bookworm
  • 19:20 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5001.eqsin.wmnet with OS bookworm
  • 19:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir3004.esams.wmnet with reason: host reimage
  • 19:07 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir3004.esams.wmnet with reason: host reimage
  • 19:02 brett@cumin2002: START - Cookbook sre.hosts.reimage for host durum4001.ulsfo.wmnet with OS bookworm
  • 19:01 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host durum4001.ulsfo.wmnet with OS bookworm
  • 18:54 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6002.drmrs.wmnet with OS bookworm
  • 18:54 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir3003.esams.wmnet with OS bullseye
  • 18:53 urandom: Rolling Cassandra restart codfw/b — T339298
  • 18:51 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir3004.esams.wmnet with OS bullseye
  • 18:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6001.drmrs.wmnet with OS bookworm
  • 18:41 sukhe: force agent run on A:acmechief for CR 950005
  • 18:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum1002.eqiad.wmnet with OS bookworm
  • 18:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5002.eqsin.wmnet with reason: host reimage
  • 18:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5001.eqsin.wmnet with reason: host reimage
  • 18:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5002.eqsin.wmnet with reason: host reimage
  • 18:29 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5001.eqsin.wmnet with reason: host reimage
  • 18:26 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum4002.ulsfo.wmnet with OS bookworm
  • 18:25 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum2001.codfw.wmnet with OS bookworm
  • 18:21 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum2002.codfw.wmnet with OS bookworm
  • 18:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir3003.esams.wmnet with reason: host reimage
  • 18:15 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 18:13 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum1001.eqiad.wmnet with OS bookworm
  • 18:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 18:12 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk2003.codfw.wmnet
  • 18:12 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk2003.codfw.wmnet with OS bookworm
  • 18:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir3003.esams.wmnet with reason: host reimage
  • 18:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4002.ulsfo.wmnet with reason: host reimage
  • 18:09 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.22 refs T343724
  • 18:07 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 18:07 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 18:07 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4002.ulsfo.wmnet with reason: host reimage
  • 18:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
  • 18:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum2001.codfw.wmnet with reason: host reimage
  • 17:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir3003.esams.wmnet with OS bullseye
  • 17:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
  • 17:56 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum2001.codfw.wmnet with reason: host reimage
  • 17:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
  • 17:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
  • 17:50 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
  • 17:47 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
  • 17:45 brett@cumin2002: START - Cookbook sre.hosts.reimage for host durum6002.drmrs.wmnet with OS bookworm
  • 17:45 brett@cumin2002: START - Cookbook sre.hosts.reimage for host durum6001.drmrs.wmnet with OS bookworm
  • 17:45 brett@cumin2002: START - Cookbook sre.hosts.reimage for host durum5002.eqsin.wmnet with OS bookworm
  • 17:45 brett@cumin2002: START - Cookbook sre.hosts.reimage for host durum5001.eqsin.wmnet with OS bookworm
  • 17:45 brett@cumin2002: START - Cookbook sre.hosts.reimage for host durum4002.ulsfo.wmnet with OS bookworm
  • 17:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host durum4001.ulsfo.wmnet with OS bookworm
  • 17:39 brett@cumin2002: START - Cookbook sre.hosts.reimage for host durum2002.codfw.wmnet with OS bookworm
  • 17:39 brett@cumin2002: START - Cookbook sre.hosts.reimage for host durum2001.codfw.wmnet with OS bookworm
  • 17:39 brett@cumin2002: START - Cookbook sre.hosts.reimage for host durum1002.eqiad.wmnet with OS bookworm
  • 17:37 brett@cumin2002: START - Cookbook sre.hosts.reimage for host durum1001.eqiad.wmnet with OS bookworm
  • 17:25 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 17:24 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 17:23 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:23 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:23 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 17:22 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 17:22 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 17:21 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 17:21 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 17:20 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 17:19 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 17:18 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 17:18 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:17 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:17 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 17:16 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 17:16 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 17:15 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 17:15 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 17:14 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 17:11 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 17:10 bd808@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 17:10 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:10 bd808@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:10 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 17:09 bd808@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 17:09 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 17:09 bd808@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 17:08 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 17:08 bd808@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 16:54 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2003.codfw.wmnet with OS bookworm
  • 16:54 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2003.codfw.wmnet - bking@cumin1001"
  • 16:53 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2003.codfw.wmnet - bking@cumin1001"
  • 16:53 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk2003.codfw.wmnet on all recursors
  • 16:53 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2003.codfw.wmnet on all recursors
  • 16:53 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:52 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2003.codfw.wmnet - bking@cumin1001"
  • 16:51 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2003.codfw.wmnet - bking@cumin1001"
  • 16:48 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 16:48 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2003.codfw.wmnet
  • 16:48 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts flink-zk2002.codfw.wmnet
  • 16:47 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts flink-zk2002.codfw.wmnet
  • 16:14 sukhe: force agent run on A:lvs and A:esams
  • 16:11 XioNoX: merging Puppet change 949934 - Remove all mentions of old-esams, replace with new esams
  • 16:02 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk2002.codfw.wmnet with OS bookworm
  • 16:00 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host doh3003.wikimedia.org
  • 16:00 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host doh3003.wikimedia.org with OS bullseye
  • 15:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:54 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:50 sukhe: sukhe@alert1001:~$ sudo systemctl reload icinga.service
  • 15:35 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@2d3a0b7] (releasing): (no justification provided) (duration: 00m 43s)
  • 15:34 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@2d3a0b7] (releasing): (no justification provided)
  • 15:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum3004.esams.wmnet
  • 15:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum3004.esams.wmnet with OS bullseye
  • 15:26 jnuche@deploy1002: Installation of scap version "4.58.0" completed for 596 hosts
  • 15:26 jnuche@deploy1002: Installing scap version "4.58.0" for 596 hosts
  • 15:22 jnuche@deploy1002: Installing scap version "4.58.0" for 597 hosts
  • 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3004.esams.wmnet with reason: host reimage
  • 15:17 urbanecm@deploy1002: Finished scap: Backport for suwikisource remove NamespaceAliases and ExtraNamespaces for Page and Index namespace (T344314), Growth: Temporarily disable link-recommendation FE on arwiki (T316079) (duration: 14m 56s)
  • 15:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3004.esams.wmnet with reason: host reimage
  • 15:10 urbanecm@deploy1002: urbanecm and anzx: Continuing with sync
  • 15:09 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host doh3003.wikimedia.org with OS bullseye
  • 15:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh3003.wikimedia.org - jmm@cumin2002"
  • 15:08 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh3003.wikimedia.org - jmm@cumin2002"
  • 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh3003.wikimedia.org on all recursors
  • 15:07 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache doh3003.wikimedia.org on all recursors
  • 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3003.wikimedia.org - jmm@cumin2002"
  • 15:07 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3003.wikimedia.org - jmm@cumin2002"
  • 15:04 urbanecm@deploy1002: urbanecm and anzx: Backport for suwikisource remove NamespaceAliases and ExtraNamespaces for Page and Index namespace (T344314), Growth: Temporarily disable link-recommendation FE on arwiki (T316079) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (ac
  • 15:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum3004.esams.wmnet with OS bullseye
  • 15:03 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 15:03 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host doh3003.wikimedia.org
  • 15:02 urbanecm@deploy1002: Started scap: Backport for suwikisource remove NamespaceAliases and ExtraNamespaces for Page and Index namespace (T344314), Growth: Temporarily disable link-recommendation FE on arwiki (T316079)
  • 15:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum3004.esams.wmnet - jmm@cumin2002"
  • 14:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum3004.esams.wmnet - jmm@cumin2002"
  • 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum3004.esams.wmnet on all recursors
  • 14:59 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum3004.esams.wmnet on all recursors
  • 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum3004.esams.wmnet - jmm@cumin2002"
  • 14:57 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 14:57 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 14:57 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 14:57 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 14:57 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:56 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:56 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 14:55 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 14:55 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum3003.esams.wmnet
  • 14:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum3003.esams.wmnet with OS bullseye
  • 14:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:52 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:52 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum3004.esams.wmnet - jmm@cumin2002"
  • 14:51 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:51 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:49 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:49 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:49 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:49 claime: Re-deploying mw-on-k8s T344438
  • 14:49 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:49 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum3004.esams.wmnet
  • 14:48 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:48 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:48 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:47 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2002.codfw.wmnet with OS bookworm
  • 14:46 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:46 cgoubert@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:45 cgoubert@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:45 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:44 claime: Rolling back 949583 for T344438
  • 14:44 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3003.esams.wmnet with reason: host reimage
  • 14:37 oblivian@deploy1002: Finished scap: (no justification provided) (duration: 26m 03s)
  • 14:37 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3003.esams.wmnet with reason: host reimage
  • 14:36 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ncredir3004.esams.wmnet
  • 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir3004.esams.wmnet with OS bullseye
  • 14:29 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:29 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:29 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:21 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs2007.codfw.wmnet with reason: canary for T342361
  • 14:21 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2007.codfw.wmnet with reason: canary for T342361
  • 14:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir3004.esams.wmnet with reason: host reimage
  • 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum3003.esams.wmnet with OS bullseye
  • 14:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir3004.esams.wmnet with reason: host reimage
  • 14:11 oblivian@deploy1002: Started scap: (no justification provided)
  • 14:06 oblivian@deploy1002: sync-world aborted: (no justification provided) (duration: 00m 46s)
  • 14:05 oblivian@deploy1002: Started scap: (no justification provided)
  • 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum3003.esams.wmnet - jmm@cumin2002"
  • 14:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum3003.esams.wmnet - jmm@cumin2002"
  • 14:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum3003.esams.wmnet on all recursors
  • 14:04 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum3003.esams.wmnet on all recursors
  • 14:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum3003.esams.wmnet - jmm@cumin2002"
  • 14:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum3003.esams.wmnet - jmm@cumin2002"
  • 14:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir3004.esams.wmnet with OS bullseye
  • 13:59 inflatador: bking@cumin1001 'disabling puppet on wcqs/wdqs to test 949503'
  • 13:59 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir3004.esams.wmnet - jmm@cumin2002"
  • 13:58 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir3004.esams.wmnet - jmm@cumin2002"
  • 13:58 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir3004.esams.wmnet on all recursors
  • 13:58 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ncredir3004.esams.wmnet on all recursors
  • 13:58 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:58 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir3004.esams.wmnet - jmm@cumin2002"
  • 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:57 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum3003.esams.wmnet
  • 13:50 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir3004.esams.wmnet - jmm@cumin2002"
  • 13:47 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ncredir3004.esams.wmnet
  • 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ncredir3003.esams.wmnet
  • 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir3003.esams.wmnet with OS bullseye
  • 13:45 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 13:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
  • 13:40 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 13:36 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 13:36 effie: pooling kartotherian (maps) on codfw - T344324
  • 13:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir3003.esams.wmnet with reason: host reimage
  • 13:31 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 13:28 urbanecm@deploy1002: Finished scap: Backport for cross-wiki userrights: Add SpecialUserRights::getDisplayUsername (T344391 T255309), revalidateLinkRecommendations: Make it possible to revalidate based on score (T316079), revalidateLinkRecommendations: Make it possible to revalidate based on score (T316079) (duration: 05m 46s)
  • 13:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir3003.esams.wmnet with reason: host reimage
  • 13:24 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 13:23 urbanecm@deploy1002: urbanecm: Backport for cross-wiki userrights: Add SpecialUserRights::getDisplayUsername (T344391 T255309), revalidateLinkRecommendations: Make it possible to revalidate based on score (T316079), revalidateLinkRecommendations: Make it possible to revalidate based on score (T316079) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.c
  • 13:23 urbanecm@deploy1002: Started scap: Backport for cross-wiki userrights: Add SpecialUserRights::getDisplayUsername (T344391 T255309), revalidateLinkRecommendations: Make it possible to revalidate based on score (T316079), revalidateLinkRecommendations: Make it possible to revalidate based on score (T316079)
  • 13:19 urbanecm@deploy1002: Finished scap: Backport for Enable VisualEditor in Project and Draft namespaces on shwiki (T344432) (duration: 15m 16s)
  • 13:15 urbanecm@deploy1002: urbanecm and aleksandar: Continuing with sync
  • 13:12 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir3003.esams.wmnet with OS bullseye
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir3003.esams.wmnet - jmm@cumin2002"
  • 13:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir3003.esams.wmnet - jmm@cumin2002"
  • 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir3003.esams.wmnet on all recursors
  • 13:10 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ncredir3003.esams.wmnet on all recursors
  • 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:09 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:09 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ncredir3003.esams.wmnet
  • 13:05 urbanecm@deploy1002: urbanecm and aleksandar: Backport for Enable VisualEditor in Project and Draft namespaces on shwiki (T344432) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:04 urbanecm@deploy1002: Started scap: Backport for Enable VisualEditor in Project and Draft namespaces on shwiki (T344432)
  • 12:55 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:46 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 12:27 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 12:26 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 12:26 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 12:26 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 12:25 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 12:25 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 12:04 jelto: restart jwt-authorizer service (docker-registry-ha-jwt.service) on registry nodes - T337474
  • 12:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ncredir3003.esams.wmnet
  • 12:03 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ncredir3003.esams.wmnet with OS bullseye
  • 11:35 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:34 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:33 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:32 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:32 cgoubert@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:31 cgoubert@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:31 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:29 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:21 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:16 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:16 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:12 cgoubert@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:12 cgoubert@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:11 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:11 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:10 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir3003.esams.wmnet with OS bullseye
  • 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir3003.esams.wmnet - jmm@cumin2002"
  • 11:08 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir3003.esams.wmnet - jmm@cumin2002"
  • 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir3003.esams.wmnet on all recursors
  • 11:07 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ncredir3003.esams.wmnet on all recursors
  • 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir3003.esams.wmnet - jmm@cumin2002"
  • 11:06 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir3003.esams.wmnet - jmm@cumin2002"
  • 11:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:04 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ncredir3003.esams.wmnet
  • 10:56 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:56 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: removing DNS names for ae1-103.cr2-esams and vrrp-gw-103 - sukhe@cumin2002"
  • 10:55 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: removing DNS names for ae1-103.cr2-esams and vrrp-gw-103 - sukhe@cumin2002"
  • 10:54 sukhe: run authdns-update for CR 949938
  • 10:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti3007.esams.wmnet to cluster esams01 and group BY27
  • 10:53 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3007.esams.wmnet to cluster esams01 and group BY27
  • 10:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1107.eqiad.wmnet with OS bullseye
  • 10:35 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 10:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1107.eqiad.wmnet with reason: host reimage
  • 10:23 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1107.eqiad.wmnet with reason: host reimage
  • 10:23 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 10:23 effie: depool kartotherian (maps) codfw - T344324
  • 10:23 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti3007.esams.wmnet to cluster esams01 and group BY27
  • 10:22 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3007.esams.wmnet to cluster esams01 and group BY27
  • 10:22 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 10:21 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 10:16 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 10:15 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 10:13 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 10:13 jiji@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 10:10 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1107.eqiad.wmnet with OS bullseye
  • 10:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1106.eqiad.wmnet with OS bullseye
  • 10:10 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:09 moritzm: installing ghostscript security updates
  • 10:09 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1107.eqiad.wmnet with OS bullseye
  • 10:08 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 09:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1106.eqiad.wmnet with reason: host reimage
  • 09:43 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1106.eqiad.wmnet with reason: host reimage
  • 09:39 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 09:36 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host prometheus3003.esams.wmnet
  • 09:36 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus3003.esams.wmnet with OS bullseye
  • 09:35 effie: temporarily pooling kartotherian on codfw
  • 09:31 btullis@deploy1002: Finished deploy [airflow-dags/analytics@ff0a21b]: (no justification provided) (duration: 00m 22s)
  • 09:30 btullis@deploy1002: Started deploy [airflow-dags/analytics@ff0a21b]: (no justification provided)
  • 09:29 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1107.eqiad.wmnet with OS bullseye
  • 09:28 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1106.eqiad.wmnet with OS bullseye
  • 09:27 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:27 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for ganeti01.svc.esams.wmnet - cmooney@cumin1001"
  • 09:26 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for ganeti01.svc.esams.wmnet - cmooney@cumin1001"
  • 09:24 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:23 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:23 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for ganeti01.svc.esams.wmnet - cmooney@cumin1001"
  • 09:22 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for ganeti01.svc.esams.wmnet - cmooney@cumin1001"
  • 09:22 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus3003.esams.wmnet with reason: host reimage
  • 09:20 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:19 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus3003.esams.wmnet with reason: host reimage
  • 09:03 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti3003.esams.wmnet
  • 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti3003.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 09:02 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti3003.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:51 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti3003.esams.wmnet
  • 08:51 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti3002.esams.wmnet
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:49 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:47 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:45 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host prometheus3003.esams.wmnet with OS bullseye
  • 08:45 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus3003.esams.wmnet - filippo@cumin1001"
  • 08:44 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus3003.esams.wmnet - filippo@cumin1001"
  • 08:44 filippo@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus3003.esams.wmnet on all recursors
  • 08:44 filippo@cumin1001: START - Cookbook sre.dns.wipe-cache prometheus3003.esams.wmnet on all recursors
  • 08:43 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:43 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus3003.esams.wmnet - filippo@cumin1001"
  • 08:43 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus3003.esams.wmnet - filippo@cumin1001"
  • 08:41 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti3002.esams.wmnet
  • 08:41 filippo@cumin1001: START - Cookbook sre.dns.netbox
  • 08:41 filippo@cumin1001: START - Cookbook sre.ganeti.makevm for new host prometheus3003.esams.wmnet
  • 08:41 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti3001.esams.wmnet
  • 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti3001.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:38 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti3001.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:34 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:31 kart_: Updated cxserver to 2023-08-14-091804-production (T336683, T343211)
  • 08:29 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts prometheus3002.esams.wmnet
  • 08:29 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:29 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti3001.esams.wmnet
  • 08:28 filippo@cumin1001: START - Cookbook sre.dns.netbox
  • 08:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ncredir3001.esams.wmnet
  • 08:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir3001.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:24 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 08:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir3001.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:23 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 08:23 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus3002.esams.wmnet
  • 08:21 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:17 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 08:17 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ncredir3001.esams.wmnet
  • 08:16 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 08:16 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ncredir3002.esams.wmnet
  • 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:15 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:14 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 08:14 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 08:13 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:08 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ncredir3002.esams.wmnet
  • 08:07 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 08:07 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 08:01 apergos: UTC morning backport and config window done
  • 07:59 ariel@deploy1002: Finished scap: Backport for zhwiki: Create abusefilter-helper group (T344398) (duration: 11m 18s)
  • 07:51 ariel@deploy1002: stang and ariel: Continuing with sync
  • 07:49 ariel@deploy1002: stang and ariel: Backport for zhwiki: Create abusefilter-helper group (T344398) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts install3002.wikimedia.org
  • 07:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install3002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:47 ariel@deploy1002: Started scap: Backport for zhwiki: Create abusefilter-helper group (T344398)
  • 07:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install3002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:44 taavi@deploy1002: Finished scap: Backport for Use UserIdentity::LOCAL in PreliminaryCheckService when appropriate (T344403), Use UserIdentity::LOCAL in PreliminaryCheckService when appropriate (T344403) (duration: 11m 25s)
  • 07:40 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:39 gehel@cumin1001: conftool action : set/pooled=yes; selector: name=cloudelastic1006.wikimedia.org
  • 07:37 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 07:36 taavi@deploy1002: dreamyjazz and taavi: Continuing with sync
  • 07:35 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 07:35 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install3002.wikimedia.org
  • 07:34 taavi@deploy1002: dreamyjazz and taavi: Backport for Use UserIdentity::LOCAL in PreliminaryCheckService when appropriate (T344403), Use UserIdentity::LOCAL in PreliminaryCheckService when appropriate (T344403) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible vi
  • 07:33 taavi@deploy1002: Started scap: Backport for Use UserIdentity::LOCAL in PreliminaryCheckService when appropriate (T344403), Use UserIdentity::LOCAL in PreliminaryCheckService when appropriate (T344403)
  • 07:32 gehel: restarting elasticsearch on cloudelastic1006 (high GC)
  • 07:32 gehel@cumin1001: conftool action : set/pooled=no; selector: name=cloudelastic1006.wikimedia.org
  • 07:31 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 07:28 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install3003.wikimedia.org
  • 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM install3003.wikimedia.org
  • 07:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on install3002.wikimedia.org with reason: decom in progress
  • 07:12 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on install3002.wikimedia.org with reason: decom in progress
  • 06:59 _joe_: updated vopsbot on the icinga hosts T344316
  • 01:59 fab@deploy1002: Finished deploy [airflow-dags/research@ff0a21b]: (no justification provided) (duration: 00m 22s)
  • 01:58 eileen: civicrm upgraded from 0f981bc4 to 4f7f1e68
  • 01:58 fab@deploy1002: Started deploy [airflow-dags/research@ff0a21b]: (no justification provided)
  • 01:55 fab@deploy1002: Finished deploy [airflow-dags/research@ff0a21b]: (no justification provided) (duration: 00m 19s)
  • 01:55 fab@deploy1002: Started deploy [airflow-dags/research@ff0a21b]: (no justification provided)
  • 01:54 fab@deploy1002: Finished deploy [airflow-dags/research@ff0a21b]: (no justification provided) (duration: 00m 20s)
  • 01:53 fab@deploy1002: Started deploy [airflow-dags/research@ff0a21b]: (no justification provided)
  • 00:10 eileen: civicrm upgraded from 5e631101 to 0f981bc4 - add email custom fields

2023-08-16

  • 21:52 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk2002.codfw.wmnet
  • 21:52 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk2002.codfw.wmnet with OS bookworm
  • 21:49 ryankemper: T343124 [WDQS] Pooled `wdqs1012` and `wdqs1013` (passing checks after reimage/data transfer)
  • 20:37 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2002.codfw.wmnet with OS bookworm
  • 20:36 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2002.codfw.wmnet - bking@cumin1001"
  • 20:35 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2002.codfw.wmnet - bking@cumin1001"
  • 20:35 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk2002.codfw.wmnet on all recursors
  • 20:35 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2002.codfw.wmnet on all recursors
  • 20:35 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:35 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2002.codfw.wmnet - bking@cumin1001"
  • 20:34 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2002.codfw.wmnet - bking@cumin1001"
  • 20:32 urbanecm@deploy1002: Finished scap: Backport for Some initial configurations for suwikisource (T344314) (duration: 12m 11s)
  • 20:32 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 20:32 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2002.codfw.wmnet
  • 20:25 urbanecm@deploy1002: urbanecm and anzx: Continuing with sync
  • 20:22 urbanecm@deploy1002: urbanecm and anzx: Backport for Some initial configurations for suwikisource (T344314) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:20 urbanecm@deploy1002: Started scap: Backport for Some initial configurations for suwikisource (T344314)
  • 20:14 urbanecm@deploy1002: Finished scap: Backport for Remove unusual VisualEditor config for Wikitech (T241961), Disable upcoming wgMFShowEditNotices in production (T312587), Explicitly set DiscussionToolsAutoTopicSubEditor to discussiontoolsapi (duration: 09m 27s)
  • 20:07 urbanecm@deploy1002: esanders and urbanecm and matmarex: Continuing with sync
  • 20:06 urbanecm@deploy1002: esanders and urbanecm and matmarex: Backport for Remove unusual VisualEditor config for Wikitech (T241961), Disable upcoming wgMFShowEditNotices in production (T312587), Explicitly set DiscussionToolsAutoTopicSubEditor to discussiontoolsapi synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwde
  • 20:04 urbanecm@deploy1002: Started scap: Backport for Remove unusual VisualEditor config for Wikitech (T241961), Disable upcoming wgMFShowEditNotices in production (T312587), Explicitly set DiscussionToolsAutoTopicSubEditor to discussiontoolsapi
  • 19:48 dancy@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.22 refs T343724 (duration: 07m 14s)
  • 19:40 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.22 refs T343724
  • 18:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs[3005-3007].esams.wmnet
  • 18:10 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:10 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs[3005-3007].esams.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.22 refs T343724
  • 18:09 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs[3005-3007].esams.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:06 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 17:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp[3062-3065].esams.wmnet
  • 17:58 brett@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:58 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[3062-3065].esams.wmnet decommissioned, removing all IPs except the asset tag one - brett@cumin2002"
  • 17:58 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs[3005-3007].esams.wmnet
  • 17:56 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[3062-3065].esams.wmnet decommissioned, removing all IPs except the asset tag one - brett@cumin2002"
  • 17:54 brett@cumin2002: START - Cookbook sre.dns.netbox
  • 17:52 fabfur: restart ntp on A:dns-rec and A:edges
  • 17:50 fabfur: run puppet-agent on A:dns-rec to restart ntp service
  • 17:45 brett@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp[3062-3065].esams.wmnet
  • 17:43 brett@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp[3058-3061].esams.wmnet
  • 17:43 brett@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:43 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[3058-3061].esams.wmnet decommissioned, removing all IPs except the asset tag one - brett@cumin2002"
  • 17:40 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[3058-3061].esams.wmnet decommissioned, removing all IPs except the asset tag one - brett@cumin2002"
  • 17:38 brett@cumin2002: START - Cookbook sre.dns.netbox
  • 17:26 brett@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp[3058-3061].esams.wmnet
  • 17:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp[3054-3057].esams.wmnet
  • 17:23 brett@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:23 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[3054-3057].esams.wmnet decommissioned, removing all IPs except the asset tag one - brett@cumin2002"
  • 17:22 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[3054-3057].esams.wmnet decommissioned, removing all IPs except the asset tag one - brett@cumin2002"
  • 17:20 brett@cumin2002: START - Cookbook sre.dns.netbox
  • 17:10 brett@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp[3054-3057].esams.wmnet
  • 17:07 brett@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp[3050-3053].esams.wmnet
  • 17:07 brett@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:07 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[3050-3053].esams.wmnet decommissioned, removing all IPs except the asset tag one - brett@cumin2002"
  • 17:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1105.eqiad.wmnet with OS bullseye
  • 17:06 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[3050-3053].esams.wmnet decommissioned, removing all IPs except the asset tag one - brett@cumin2002"
  • 17:06 btullis@deploy1002: Finished deploy [analytics/aqs/deploy@cf0e57d] (aqs): T342213 (duration: 09m 04s)
  • 17:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1104.eqiad.wmnet with OS bullseye
  • 17:02 brett@cumin2002: START - Cookbook sre.dns.netbox
  • 16:57 btullis@deploy1002: Started deploy [analytics/aqs/deploy@cf0e57d] (aqs): T342213
  • 16:57 btullis@deploy1002: Finished deploy [analytics/aqs/deploy@cf0e57d] (aqs): T342213 (duration: 04m 00s)
  • 16:53 btullis@deploy1002: Started deploy [analytics/aqs/deploy@cf0e57d] (aqs): T342213
  • 16:52 sukhe: restarting ntp service in core sites
  • 16:51 btullis@deploy1002: Started deploy [analytics/aqs/deploy@cf0e57d] (aqs): T342213
  • 16:47 btullis@deploy1002: Finished deploy [analytics/aqs/deploy@ec5d4cd] (aqs): T342213 (duration: 01m 48s)
  • 16:47 brett@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp[3050-3053].esams.wmnet
  • 16:45 btullis@deploy1002: Started deploy [analytics/aqs/deploy@ec5d4cd] (aqs): T342213
  • 16:44 btullis@deploy1002: deploy aborted: T342213 (duration: 00m 59s)
  • 16:43 btullis@deploy1002: Started deploy [analytics/aqs/deploy@ec5d4cd]: T342213
  • 16:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1105.eqiad.wmnet with reason: host reimage
  • 16:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1104.eqiad.wmnet with reason: host reimage
  • 16:41 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns3002.wikimedia.org
  • 16:41 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:41 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns3002.wikimedia.org decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 16:40 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns3002.wikimedia.org decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 16:40 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ncredir[3001-3002].esams.wmnet
  • 16:39 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts ncredir[3001-3002].esams.wmnet
  • 16:38 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1105.eqiad.wmnet with reason: host reimage
  • 16:38 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1104.eqiad.wmnet with reason: host reimage
  • 16:36 fabfur@cumin1001: START - Cookbook sre.dns.netbox
  • 16:34 jbond: mv /var/lib/puppet/volatile/squid /home/jbond on puppetmaster1001 as it appears unused
  • 16:31 fabfur@cumin1001: START - Cookbook sre.hosts.decommission for hosts dns3002.wikimedia.org
  • 16:31 jnuche: restarting CI Jenkins to update plugins
  • 16:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum[3001-3002].esams.wmnet
  • 16:31 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:31 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum[3001-3002].esams.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 16:30 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum[3001-3002].esams.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 16:28 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 16:25 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dns3001.wikimedia.org
  • 16:25 fabfur@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:24 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1105.eqiad.wmnet with OS bullseye
  • 16:24 fabfur@cumin1001: START - Cookbook sre.dns.netbox
  • 16:23 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1104.eqiad.wmnet with OS bullseye
  • 16:23 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum[3001-3002].esams.wmnet
  • 16:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh[3001-3002].wikimedia.org
  • 16:22 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:22 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh[3001-3002].wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 16:21 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh[3001-3002].wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 16:21 jbond: mv /var/lib/puppet/volatile/misc /home/jbond on puppetmaster1001 as it (legacy geoip data) appears unused
  • 16:19 fabfur@cumin1001: START - Cookbook sre.hosts.decommission for hosts dns3001.wikimedia.org
  • 16:17 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 16:12 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts doh[3001-3002].wikimedia.org
  • 16:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install3003.wikimedia.org
  • 16:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host install3003.wikimedia.org with OS bullseye
  • 16:00 fabfur: running puppet-agent on A:cumin A:dns-rec A:netbox to remove dns3001 and dns3002
  • 15:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install3003.wikimedia.org with reason: host reimage
  • 15:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install3003.wikimedia.org with reason: host reimage
  • 15:48 sukhe: restart pybal on new lvses in esams
  • 15:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netflow3002.esams.wmnet
  • 15:46 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netflow3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 15:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netflow3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 15:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1102.eqiad.wmnet with OS bullseye
  • 15:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 15:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1103.eqiad.wmnet with OS bullseye
  • 15:39 sukhe: homer "mr*" commit "add ntp_servers add dns300[34]"
  • 15:39 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts netflow3002.esams.wmnet
  • 15:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install3003.wikimedia.org with OS bullseye
  • 15:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install3003.wikimedia.org - jmm@cumin2002"
  • 15:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install3003.wikimedia.org - jmm@cumin2002"
  • 15:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install3003.wikimedia.org on all recursors
  • 15:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install3003.wikimedia.org on all recursors
  • 15:33 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3003.wikimedia.org - jmm@cumin2002"
  • 15:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3003.wikimedia.org - jmm@cumin2002"
  • 15:29 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 15:28 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install3003.wikimedia.org
  • 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping3003.esams.wmnet
  • 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping3003.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 15:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1102.eqiad.wmnet with reason: host reimage
  • 15:18 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping3003.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 15:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1103.eqiad.wmnet with reason: host reimage
  • 15:14 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1102.eqiad.wmnet with reason: host reimage
  • 15:14 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1103.eqiad.wmnet with reason: host reimage
  • 15:14 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netflow3003.esams.wmnet
  • 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow3003.esams.wmnet with OS bookworm
  • 15:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping3003.esams.wmnet
  • 15:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast3006.wikimedia.org
  • 15:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3006.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 15:08 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3006.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 15:04 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 15:00 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1103.eqiad.wmnet with OS bullseye
  • 15:00 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1102.eqiad.wmnet with OS bullseye
  • 15:00 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast3006.wikimedia.org
  • 14:58 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@155299c] (releasing): (no justification provided) (duration: 00m 41s)
  • 14:57 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@155299c] (releasing): (no justification provided)
  • 14:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow3003.esams.wmnet with reason: host reimage
  • 14:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow3003.esams.wmnet with reason: host reimage
  • 14:51 jelto: registry* - upgrade jwt-authorizer package on all 4 hosts to version 1.1.1-1 - T337474
  • 14:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1101.eqiad.wmnet with OS bullseye
  • 14:28 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host netflow3003.esams.wmnet with OS bookworm
  • 14:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow3003.esams.wmnet - jmm@cumin2002"
  • 14:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow3003.esams.wmnet - jmm@cumin2002"
  • 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow3003.esams.wmnet on all recursors
  • 14:25 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netflow3003.esams.wmnet on all recursors
  • 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:25 sukhe: ssh pcc-db1001.puppet-diffs.eqiad1.wikimedia.cloud sudo -u jenkins-deploy /usr/local/sbin/pcc_facts_processor
  • 14:24 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host netflow3003.esams.wmnet
  • 14:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1100.eqiad.wmnet with OS bullseye
  • 14:19 urbanecm@deploy1002: Finished scap: Backport for Revert "Growth: Temporarily disable link-recommendation frontend" (T344034) (duration: 09m 06s)
  • 14:18 sukhe: sukhe@puppetmaster1001:~$ sudo /usr/local/sbin/puppet-facts-upload --proxy http://webproxy.eqiad.wmnet:8080
  • 14:13 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 14:12 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=93) for new host netflow3003.esams.wmnet
  • 14:12 urbanecm@deploy1002: urbanecm: Backport for Revert "Growth: Temporarily disable link-recommendation frontend" (T344034) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:12 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:10 urbanecm@deploy1002: Started scap: Backport for Revert "Growth: Temporarily disable link-recommendation frontend" (T344034)
  • 14:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow3003.esams.wmnet on all recursors
  • 14:10 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netflow3003.esams.wmnet on all recursors
  • 14:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:10 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow3003.esams.wmnet - jmm@cumin2002"
  • 14:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow3003.esams.wmnet - jmm@cumin2002"
  • 14:06 urbanecm@deploy1002: Finished scap: Backport for jobqueue: Disallow cross-wiki JobQueueGroup calls that require JobClasses (T344223 T343291), Growth: Enable new Impact backend on large Wikipedias (T344143) (duration: 14m 13s)
  • 14:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:05 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host netflow3003.esams.wmnet
  • 14:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1101.eqiad.wmnet with reason: host reimage
  • 14:00 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1101.eqiad.wmnet with reason: host reimage
  • 14:00 urbanecm@deploy1002: urbanecm and d3r1ck01: Continuing with sync
  • 13:53 urbanecm@deploy1002: urbanecm and d3r1ck01: Backport for jobqueue: Disallow cross-wiki JobQueueGroup calls that require JobClasses (T344223 T343291), Growth: Enable new Impact backend on large Wikipedias (T344143) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessibl
  • 13:52 sukhe: restart pybal on lvs3008
  • 13:52 urbanecm@deploy1002: Started scap: Backport for jobqueue: Disallow cross-wiki JobQueueGroup calls that require JobClasses (T344223 T343291), Growth: Enable new Impact backend on large Wikipedias (T344143)
  • 13:51 sukhe: running authdns-update for CR 949113
  • 13:51 urbanecm@deploy1002: Finished scap: Backport for Add blkwiktionary logo (T344310), add suwikisource logo (T344314) (duration: 10m 37s)
  • 13:46 sukhe: running homer on asw1-b*27-esams* for CR 949100: T329219
  • 13:44 urbanecm@deploy1002: urbanecm and anzx: Continuing with sync
  • 13:43 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1101.eqiad.wmnet with OS bullseye
  • 13:42 urbanecm@deploy1002: urbanecm and anzx: Backport for Add blkwiktionary logo (T344310), add suwikisource logo (T344314) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:41 urbanecm@deploy1002: Started scap: Backport for Add blkwiktionary logo (T344310), add suwikisource logo (T344314)
  • 13:40 urbanecm@deploy1002: Finished scap: Backport for testwikidatawiki: always show MUL in Termbox (T343409) (duration: 09m 43s)
  • 13:33 urbanecm@deploy1002: migr and urbanecm: Continuing with sync
  • 13:32 urbanecm@deploy1002: migr and urbanecm: Backport for testwikidatawiki: always show MUL in Termbox (T343409) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:30 urbanecm@deploy1002: Started scap: Backport for testwikidatawiki: always show MUL in Termbox (T343409)
  • 13:29 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: enable add a link in 11th round of wikis (T308136) (duration: 11m 32s)
  • 13:29 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti3008.esams.wmnet to cluster esams02 and group BW27
  • 13:28 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3008.esams.wmnet to cluster esams02 and group BW27
  • 13:23 urbanecm@deploy1002: sgimeno and urbanecm: Continuing with sync
  • 13:19 urbanecm@deploy1002: sgimeno and urbanecm: Backport for GrowthExperiments: enable add a link in 11th round of wikis (T308136) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:18 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: enable add a link in 11th round of wikis (T308136)
  • 12:59 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti3008.esams.wmnet to cluster esams02 and group BW27
  • 12:59 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3008.esams.wmnet to cluster esams02 and group BW27
  • 12:58 urbanecm: mwscript extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=ruwiki --olderThan=1651960800 --verbose # T344034
  • 12:54 urbanecm@deploy1002: Finished scap: Backport for Growth: Temporarily disable link-recommendation frontend (T344034) (duration: 08m 04s)
  • 12:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti3008.esams.wmnet to cluster esams02 and group BW27
  • 12:53 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3008.esams.wmnet to cluster esams02 and group BW27
  • 12:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti3008.esams.wmnet to cluster esams02 and group BW27
  • 12:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3008.esams.wmnet to cluster esams02 and group BW27
  • 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3008.esams.wmnet
  • 12:46 urbanecm@deploy1002: Started scap: Backport for Growth: Temporarily disable link-recommendation frontend (T344034)
  • 12:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3008.esams.wmnet
  • 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti3008.esams.wmnet to cluster esams02 and group BW27
  • 12:39 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3008.esams.wmnet to cluster esams02 and group BW27
  • 12:37 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 12:37 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3008.mgmt.esams.wmnet with reboot policy GRACEFUL
  • 12:37 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3007.mgmt.esams.wmnet with reboot policy GRACEFUL
  • 12:36 jiji@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 12:36 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3006.mgmt.esams.wmnet with reboot policy GRACEFUL
  • 12:34 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3005.mgmt.esams.wmnet with reboot policy GRACEFUL
  • 12:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1100.eqiad.wmnet with reason: host reimage
  • 12:30 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1100.eqiad.wmnet with reason: host reimage
  • 12:29 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3008.mgmt.esams.wmnet with reboot policy GRACEFUL
  • 12:29 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3007.mgmt.esams.wmnet with reboot policy GRACEFUL
  • 12:28 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3006.mgmt.esams.wmnet with reboot policy GRACEFUL
  • 12:26 urbanecm@deploy1002: Finished scap: Backport for revalidateLinkRecommendations: Make it possible to revalidate tasks older than (T344034), revalidateLinkRecommendations: Make it possible to revalidate tasks older than (T344034) (duration: 08m 20s)
  • 12:26 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3005.mgmt.esams.wmnet with reboot policy GRACEFUL
  • 12:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3006.esams.wmnet
  • 12:20 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti3008.esams.wmnet to cluster esams02 and group BW27
  • 12:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3008.esams.wmnet to cluster esams02 and group BW27
  • 12:18 urbanecm@deploy1002: Started scap: Backport for revalidateLinkRecommendations: Make it possible to revalidate tasks older than (T344034), revalidateLinkRecommendations: Make it possible to revalidate tasks older than (T344034)
  • 12:16 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1100.eqiad.wmnet with OS bullseye
  • 12:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
  • 11:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti3006.esams.wmnet
  • 11:20 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3006.mgmt.esams.wmnet with reboot policy GRACEFUL
  • 11:13 moritzm: imported 0.9.0-3~wmf12u1 for bookworm-wikimedia and 0.9.0-3~wmf11u1 for bullseye-wikimedia T340045
  • 11:12 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3006.mgmt.esams.wmnet with reboot policy GRACEFUL
  • 10:51 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 10:45 effie: depooling maps on codfw - T344110
  • 10:39 fabfur: restarting haproxy service on all knams cp hosts to silence alerts
  • 10:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
  • 10:24 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:24 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Swap esams ganeti0[1|2] cluster IPs due to subnet/rack mis-allocation - cmooney@cumin1001"
  • 10:23 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Swap esams ganeti0[1|2] cluster IPs due to subnet/rack mis-allocation - cmooney@cumin1001"
  • 10:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:57 jiji@deploy1002: Finished deploy [kartotherian/deploy@3325683] (codfw): Cleanup stale config references (duration: 00m 23s)
  • 09:57 jiji@deploy1002: Started deploy [kartotherian/deploy@3325683] (codfw): Cleanup stale config references
  • 09:57 jiji@deploy1002: Finished deploy [kartotherian/deploy@3325683] (codfw): Cleanup stale config references (duration: 00m 23s)
  • 09:56 jiji@deploy1002: Started deploy [kartotherian/deploy@3325683] (codfw): Cleanup stale config references
  • 09:56 jiji@deploy1002: Finished deploy [kartotherian/deploy@3325683] (codfw): Cleanup stale config references (duration: 00m 23s)
  • 09:56 jiji@deploy1002: Started deploy [kartotherian/deploy@3325683] (codfw): Cleanup stale config references
  • 09:54 jiji@deploy1002: Finished deploy [kartotherian/deploy@3325683] (codfw): Cleanup stale config references (duration: 00m 23s)
  • 09:54 jiji@deploy1002: Started deploy [kartotherian/deploy@3325683] (codfw): Cleanup stale config references
  • 09:54 jiji@deploy1002: Finished deploy [kartotherian/deploy@3325683] (codfw): Cleanup stale config references (duration: 04m 53s)
  • 09:49 jiji@deploy1002: Started deploy [kartotherian/deploy@3325683] (codfw): Cleanup stale config references
  • 09:45 jiji@deploy1002: Finished deploy [kartotherian/deploy@3325683] (codfw): Cleanup stale config references (duration: 00m 23s)
  • 09:44 jiji@deploy1002: Started deploy [kartotherian/deploy@3325683] (codfw): Cleanup stale config references
  • 09:43 jiji@deploy1002: Finished deploy [kartotherian/deploy@3325683] (eqiad): Cleanup stale config references (duration: 01m 06s)
  • 09:42 jiji@deploy1002: Started deploy [kartotherian/deploy@3325683] (eqiad): Cleanup stale config references
  • 09:41 jiji@deploy1002: Finished deploy [kartotherian/deploy@3325683] (eqiad): Cleanup stale config references (duration: 00m 17s)
  • 09:41 jiji@deploy1002: Started deploy [kartotherian/deploy@3325683] (eqiad): Cleanup stale config references
  • 09:35 jiji@deploy1002: Finished deploy [kartotherian/deploy@3325683] (eqiad): Cleanup stale config references (duration: 00m 16s)
  • 09:35 jiji@deploy1002: Started deploy [kartotherian/deploy@3325683] (eqiad): Cleanup stale config references
  • 09:35 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:35 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Change reverse for IPs on cr2-esams that had old cr3-knams in dns names - cmooney@cumin1001"
  • 09:34 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Change reverse for IPs on cr2-esams that had old cr3-knams in dns names - cmooney@cumin1001"
  • 09:31 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1099.eqiad.wmnet with OS bullseye
  • 09:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1099.eqiad.wmnet with reason: host reimage
  • 09:04 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1099.eqiad.wmnet with reason: host reimage
  • 08:48 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1099.eqiad.wmnet with OS bullseye
  • 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3008.esams.wmnet
  • 08:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3008.esams.wmnet
  • 08:07 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti3006.esams.wmnet
  • 08:03 taavi: mwscript extensions/OATHAuth/maintenance/UpdateForMultipleDevicesSupport.php techconductwiki # T242031
  • 07:53 taavi@deploy1002: Finished scap: Backport for Added extended confirmed on nlwiki (T329642) (duration: 18m 15s)
  • 07:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
  • 07:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti3008.esams.wmnet with OS bullseye
  • 07:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 07:46 taavi@deploy1002: bmdehaan and taavi: Continuing with sync
  • 07:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 07:36 taavi@deploy1002: bmdehaan and taavi: Backport for Added extended confirmed on nlwiki (T329642) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:34 taavi@deploy1002: Started scap: Backport for Added extended confirmed on nlwiki (T329642)
  • 07:33 taavi@deploy1002: Finished scap: Backport for Some initial configurations for blkwiktionary (T344310) (duration: 09m 17s)
  • 07:26 taavi@deploy1002: taavi and zabe: Continuing with sync
  • 07:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti3008.esams.wmnet with reason: host reimage
  • 07:25 taavi@deploy1002: taavi and zabe: Backport for Some initial configurations for blkwiktionary (T344310) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:24 taavi@deploy1002: Started scap: Backport for Some initial configurations for blkwiktionary (T344310)
  • 07:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti3008.esams.wmnet with reason: host reimage
  • 07:21 taavi@deploy1002: Finished scap: Backport for clienthints: Collect client hints on group1 wikis except two wikis (T341110) (duration: 11m 38s)
  • 07:14 taavi@deploy1002: taavi and dreamyjazz: Continuing with sync
  • 07:11 taavi@deploy1002: taavi and dreamyjazz: Backport for clienthints: Collect client hints on group1 wikis except two wikis (T341110) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:09 taavi@deploy1002: Started scap: Backport for clienthints: Collect client hints on group1 wikis except two wikis (T341110)
  • 07:04 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3008.esams.wmnet with OS bullseye
  • 07:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti3006.esams.wmnet with OS bullseye
  • 07:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 06:47 jforrester@deploy1002: Finished scap: Backport for wikifunctions: add / to the route for wikifunctions (duration: 08m 24s)
  • 06:46 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 06:41 jforrester@deploy1002: oblivian and jforrester: Continuing with sync
  • 06:41 jforrester@deploy1002: oblivian and jforrester: Backport for wikifunctions: add / to the route for wikifunctions synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 06:39 jforrester@deploy1002: Started scap: Backport for wikifunctions: add / to the route for wikifunctions
  • 06:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti3006.esams.wmnet with reason: host reimage
  • 06:24 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti3006.esams.wmnet with reason: host reimage
  • 06:05 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3006.esams.wmnet with OS bullseye
  • 04:49 zabe@deploy1002: Finished scap: update interwiki cache (duration: 08m 12s)
  • 04:41 zabe@deploy1002: Started scap: update interwiki cache
  • 04:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 04:37 zabe@deploy1002: Finished scap: T343540 (duration: 09m 15s)
  • 04:31 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 04:31 zabe@deploy1002: zabe: Continuing with sync
  • 04:29 zabe@deploy1002: zabe: T343540 synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 04:29 zabe: create Wiktionary Pa'O # T343540
  • 04:28 zabe@deploy1002: Started scap: T343540
  • 04:19 zabe@deploy1002: Finished scap: update interwiki cache (duration: 08m 17s)
  • 04:11 zabe@deploy1002: Started scap: update interwiki cache
  • 04:07 zabe@deploy1002: Finished scap: T343539 (duration: 08m 49s)
  • 04:01 zabe@deploy1002: zabe: Continuing with sync
  • 04:00 zabe@deploy1002: zabe: T343539 synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 03:58 zabe@deploy1002: Started scap: T343539
  • 03:58 zabe: create Wikisource Sundanese # T343539
  • 03:53 taavi@deploy1002: Finished scap: Backport for Set WRITE_BOTH for OAuth multiple devices to techconductwiki (T242031) (duration: 09m 07s)
  • 03:47 taavi@deploy1002: taavi: Continuing with sync
  • 03:46 taavi@deploy1002: taavi: Backport for Set WRITE_BOTH for OAuth multiple devices to techconductwiki (T242031) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 03:44 taavi@deploy1002: Started scap: Backport for Set WRITE_BOTH for OAuth multiple devices to techconductwiki (T242031)
  • 03:40 taavi@deploy1002: Finished scap: Backport for Keep both tables up-to-date on WRITE_BOTH (T242031), Keep both tables up-to-date on WRITE_BOTH (T242031) (duration: 10m 58s)
  • 03:33 taavi@deploy1002: taavi: Continuing with sync
  • 03:31 taavi@deploy1002: taavi: Backport for Keep both tables up-to-date on WRITE_BOTH (T242031), Keep both tables up-to-date on WRITE_BOTH (T242031) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 03:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 03:29 taavi@deploy1002: Started scap: Backport for Keep both tables up-to-date on WRITE_BOTH (T242031), Keep both tables up-to-date on WRITE_BOTH (T242031)
  • 03:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 03:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 03:22 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:20 taavi: create oathauth_devices and oathauth_types tables on wikitech, private.dblist, fishbowl.dblist, centralauth T242031
  • 01:57 taavi@deploy1002: Finished scap: Backport for OAuthUserRepository: Ensure we don't end up with duplicate rows (T242031), OAuthUserRepository: Ensure we don't end up with duplicate rows (T242031) (duration: 10m 59s)
  • 01:54 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:51 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:50 taavi@deploy1002: taavi: Continuing with sync
  • 01:47 taavi@deploy1002: taavi: Backport for OAuthUserRepository: Ensure we don't end up with duplicate rows (T242031), OAuthUserRepository: Ensure we don't end up with duplicate rows (T242031) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD
  • 01:46 taavi@deploy1002: Started scap: Backport for OAuthUserRepository: Ensure we don't end up with duplicate rows (T242031), OAuthUserRepository: Ensure we don't end up with duplicate rows (T242031)
  • 01:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:35 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 01:34 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:22 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploy to freshly reimaged host (duration: 00m 10s)
  • 01:22 ryankemper@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploy to freshly reimaged host
  • 01:21 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploy to freshly reimaged host (duration: 00m 10s)
  • 01:21 ryankemper@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploy to freshly reimaged host