Server Admin Log/Archive 63

From Wikitech

2023-02-28

  • 23:58 zabe@deploy2002: Started scap: T198673
  • 23:45 ejegg: civicrm upgraded from ffc16d2d to d199694e
  • 23:43 zabe@deploy2002: Synchronized wmf-config/InitialiseSettings.php: T213295 (duration: 06m 56s)
  • 23:24 mutante: miscweb2002 rm -rf /srv/org/wikimedia/design/blog/ - this has moved to /srv/org/wikimedia/design-blog but was not deleted in codfw - bringing both to the same state before switching design.wikimedia.org over T330090
  • 23:20 zabe@deploy2002: Finished scap: Backport for Drop custom testcommonswiki groups (T213295) (duration: 07m 57s)
  • 23:14 zabe@deploy2002: zabe: Backport for Drop custom testcommonswiki groups (T213295) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 23:12 zabe@deploy2002: Started scap: Backport for Drop custom testcommonswiki groups (T213295)
  • 22:46 zabe@deploy2002: Synchronized dblists/: close testcommonswiki T213295 (duration: 07m 11s)
  • 22:31 zabe@deploy2002: Synchronized dblists/: close testcommonswiki T213295 (duration: 06m 40s)
  • 22:24 brennen@deploy2002: Finished deploy [phabricator/deployment@3f2dd1b]: debug deploy to aphlict2001 (duration: 00m 37s)
  • 22:23 brennen@deploy2002: Started deploy [phabricator/deployment@3f2dd1b]: debug deploy to aphlict2001
  • 22:01 apergos: started rsync from dumpsdata1001 to dumpsdata1004 of /data/otherdumps, running in ariel screen session, no bandwidth cap
  • 22:00 jclark@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:57 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 21:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:42 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:32 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@fc4e023]: Deploying section_image_recommendations DAG to platform_eng Airflow instance (duration: 00m 21s)
  • 21:32 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@fc4e023]: Deploying section_image_recommendations DAG to platform_eng Airflow instance
  • 21:26 urbanecm@deploy2002: Finished scap: Backport for GrowthExperiments: Enable Growth features by default on testwikis (T330748) (duration: 07m 43s)
  • 21:19 urbanecm@deploy2002: Started scap: Backport for GrowthExperiments: Enable Growth features by default on testwikis (T330748)
  • 21:14 samtar@deploy2002: Finished scap: Backport for Disable VectorPromoteAddTopic on production wikis initially (T267444) (duration: 10m 36s)
  • 21:05 samtar@deploy2002: esanders and samtar: Backport for Disable VectorPromoteAddTopic on production wikis initially (T267444) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:03 samtar@deploy2002: Started scap: Backport for Disable VectorPromoteAddTopic on production wikis initially (T267444)
  • 20:54 zabe@deploy2002: Finished scap: Backport for MessagesGuc: Remove trailing space from NS_TEMPLATE_TALK translation (T330746 T321881), MessagesGuc: Remove trailing space from NS_TEMPLATE_TALK translation (T330746 T321881) (duration: 10m 06s)
  • 20:45 zabe@deploy2002: zabe: Backport for MessagesGuc: Remove trailing space from NS_TEMPLATE_TALK translation (T330746 T321881), MessagesGuc: Remove trailing space from NS_TEMPLATE_TALK translation (T330746 T321881) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:43 zabe@deploy2002: Started scap: Backport for MessagesGuc: Remove trailing space from NS_TEMPLATE_TALK translation (T330746 T321881), MessagesGuc: Remove trailing space from NS_TEMPLATE_TALK translation (T330746 T321881)
  • 20:20 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:19 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:16 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:16 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:15 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:14 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:54 dancy@deploy2002: Started scap: testing
  • 19:51 dancy@deploy2002: Installing scap version "latest" for 550 hosts
  • 19:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2073']
  • 19:24 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2073']
  • 19:21 ryankemper: [WDQS] (Current time) T301167 Re-enabled icinga notifications for `wdqs20[09-12]`
  • 19:21 ryankemper: [WDQS] (The following was ~20 hours ago, forgot to press enter) T301167 Transferred `/srv/wdqs/categories.jnl` from `wdqs2001` (in-service host) to `wdqs20[09-12]` (new hosts being brought into service)
  • 18:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be2073']
  • 18:26 ladsgroup@deploy2002: Finished scap: Backport for Revert "mwscript: Switch to use run.php" (duration: 42m 21s)
  • 18:23 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1418.eqiad.wmnet
  • 18:23 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1417.eqiad.wmnet
  • 18:23 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1416.eqiad.wmnet
  • 18:23 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1415.eqiad.wmnet
  • 18:23 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1448.eqiad.wmnet
  • 18:23 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1447.eqiad.wmnet
  • 18:22 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1450.eqiad.wmnet
  • 18:22 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1414.eqiad.wmnet
  • 18:22 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1449.eqiad.wmnet
  • 18:21 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw1418.eqiad.wmnet
  • 18:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2073']
  • 18:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2072']
  • 18:13 ladsgroup@deploy2002: trainbranchbot and ladsgroup: Backport for Revert "mwscript: Switch to use run.php" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 18:12 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2072']
  • 18:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2071']
  • 18:04 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1418.eqiad.wmnet
  • 18:04 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1417.eqiad.wmnet
  • 18:04 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1416.eqiad.wmnet
  • 18:03 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1415.eqiad.wmnet
  • 18:03 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1448.eqiad.wmnet
  • 18:03 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1447.eqiad.wmnet
  • 18:03 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1450.eqiad.wmnet
  • 18:03 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1414.eqiad.wmnet
  • 18:03 ladsgroup@cumin1001: conftool action : set/pooled=no; selector: name=mw1449.eqiad.wmnet
  • 18:00 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2071']
  • 17:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:57 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:44 ladsgroup@deploy2002: Started scap: Backport for Revert "mwscript: Switch to use run.php"
  • 17:38 ladsgroup@deploy2002: scap failed: RuntimeError Scap failed!: 9/9 canaries failed their endpoint checks(https://en.wikipedia.org). WARNING: canaries have not been rolled back. (duration: 24m 05s)
  • 17:38 ladsgroup@deploy2002: Scap failed!: 9/9 canaries failed their endpoint checks(https://en.wikipedia.org). WARNING: canaries have not been rolled back.
  • 17:33 ladsgroup@deploy2002: ladsgroup: Backport for mwscript: Switch to use run.php (T326800) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 17:18 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2070']
  • 17:14 ladsgroup@deploy2002: Started scap: Backport for mwscript: Switch to use run.php (T326800)
  • 17:11 ladsgroup@deploy2002: Finished scap: Backport for Convert eval script to Maintenance class (duration: 07m 47s)
  • 17:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2070']
  • 17:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be2070']
  • 17:05 ladsgroup@deploy2002: ladsgroup: Backport for Convert eval script to Maintenance class synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 17:03 ladsgroup@deploy2002: Started scap: Backport for Convert eval script to Maintenance class
  • 17:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 17:00 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
  • 16:59 ejegg: payments-wiki upgraded from 871c4e5c to b9ea2130
  • 16:52 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
  • 16:45 claime: Traffic and Service switchovers to codfw finished - T330651 - T330650
  • 16:38 claime: stale discovery files wiped for netbox - T330651
  • 16:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1006.eqiad.wmnet with reason: host reimage
  • 16:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1006.eqiad.wmnet with reason: host reimage
  • 16:20 cgoubert@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99) depool netbox in codfw: T330651
  • 16:16 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 16:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet on all recursors
  • 16:16 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet on all recursors
  • 16:16 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool netbox in codfw: T330651
  • 16:15 cgoubert@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99) pool netbox in eqiad: T330651
  • 16:11 Lucas_WMDE: changed data source of https://grafana-rw.wikimedia.org/alerting/grafana/MF0FSjJ4z/view from “eqiad prometheus/k8s” to “thanos” to query both eqiad and codfw after dc switch
  • 16:11 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet on all recursors
  • 16:10 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet on all recursors
  • 16:10 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route pool netbox in eqiad: T330651
  • 16:09 claime: Switching netbox back to eqiad - T330651
  • 16:06 hnowlan@deploy2002: Finished deploy [restbase/deploy@5271b8f]: New wikis: gucwp, gurwp, vewikimedia T320899 T326237 T327843 (duration: 15m 38s)
  • 15:59 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:50 hnowlan@deploy2002: Started deploy [restbase/deploy@5271b8f]: New wikis: gucwp, gurwp, vewikimedia T320899 T326237 T327843
  • 15:49 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:47 cgoubert@deploy2002: Synchronized README: check the deployment server after switchover - T330651 (duration: 20m 56s)
  • 15:47 claime: Traffic: eqiad depooled - T330650
  • 15:45 claime: Running authdns-update - T330650
  • 15:45 claime: Traffic: depool eqiad from user traffic - T330650
  • 15:31 moritzm: installing tiff security updates
  • 15:26 claime: Testing scap deployment from deploy2002.codfw.wmnet - T330651
  • 15:25 claime: Removing scap lock on deploy2002.codfw.wmnet
  • 15:23 _joe_: oblivian@deploy2002:~ $ sudo chown imagecatalog:imagecatalog /srv/deployment/imagecatalog/catalog.sqlite
  • 15:21 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 28 days, 0:00:00 on an-airflow1005.eqiad.wmnet with reason: new OS but some puppet stuff doesn't work yet
  • 15:20 claime: Disregard running puppet on fleet-wide - T330651
  • 15:18 claime: Running puppet on fleet-wide - T330651
  • 15:16 claime: Running puppet on all deployment servers - T330651
  • 15:10 claime: Running authdns-update for deployment server switch - T330651
  • 15:05 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudcephosd1005.eqiad.wmnet
  • 15:05 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:05 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dcaro@cumin1001"
  • 15:04 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dcaro@cumin1001"
  • 15:04 zabe: zabe@mwmaint1002:~$ mwscript extensions/Flow/maintenance/FlowFixInconsistentBoards.php --wiki=zhwiki --namespaceName='USER_TALK' # T330761
  • 14:56 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 14:53 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 14:51 claime: Switch deployment server to deploy2002.codfw.wmnet - T330651
  • 14:48 dcaro@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1005.eqiad.wmnet
  • 14:44 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in eqiad: None - None
  • 14:44 oblivian@cumin1001: START - Cookbook sre.discovery.datacenter status all services in eqiad: None - None
  • 14:44 claime: Services switched over to codfw - T329193
  • 14:44 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all services in eqiad: Datacenter Switchover - T330651
  • 14:42 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter depool all services in eqiad: Datacenter Switchover - T330651
  • 14:42 cgoubert@cumin1001: END (ERROR) - Cookbook sre.discovery.datacenter (exit_code=93) depool all services in eqiad: Datacenter Switchover - T330651
  • 14:42 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 14:25 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:24 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 14:22 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:22 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 14:21 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter depool all services in eqiad: Datacenter Switchover - T330651
  • 14:21 claime: switching services over to codfw - T330651
  • 14:21 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:21 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:08 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 13:59 marostegui: Create dummy and empty enwiki.text table on db2186:3311 to test check_private_data T326596
  • 13:49 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 13:32 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=inference,name=codfw
  • 13:24 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 13:24 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:20 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:20 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 13:18 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:15 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:12 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:11 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:10 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:08 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:05 jnuche@deploy1002: Installation of scap version "latest" completed for 550 hosts
  • 13:05 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:04 jnuche@deploy1002: Installing scap version "latest" for 550 hosts
  • 13:04 claime: Locking scap deployments for service switchover - T330651
  • 13:04 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:04 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:02 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:01 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 13:00 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 12:58 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:56 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:56 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:55 root@cumin1001: END (PASS) - Cookbook sre.k8s.upgrade-cluster (exit_code=0) Upgrade K8s version: Upgrade to k8s 1.23
  • 12:54 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:53 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:53 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2001.codfw.wmnet with OS bullseye
  • 12:49 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:48 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:48 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:48 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:47 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:47 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:46 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:46 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:46 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:46 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:45 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:45 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:45 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:45 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:42 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:41 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:40 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2007.codfw.wmnet with OS bullseye
  • 12:39 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:39 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:39 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2004.codfw.wmnet with OS bullseye
  • 12:38 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:38 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2006.codfw.wmnet with OS bullseye
  • 12:38 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:38 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:37 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:37 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:37 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:36 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:36 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:36 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:35 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:35 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:35 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:35 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:34 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2005.codfw.wmnet with OS bullseye
  • 12:31 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2008.codfw.wmnet with OS bullseye
  • 12:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2003.codfw.wmnet with OS bullseye
  • 12:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2002.codfw.wmnet with OS bullseye
  • 12:24 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2001.codfw.wmnet with reason: host reimage
  • 12:22 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve2007.codfw.wmnet with reason: host reimage
  • 12:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2006.codfw.wmnet with reason: host reimage
  • 12:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2004.codfw.wmnet with reason: host reimage
  • 12:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2001.codfw.wmnet with reason: host reimage
  • 12:17 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2005.codfw.wmnet with reason: host reimage
  • 12:15 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2008.codfw.wmnet with reason: host reimage
  • 12:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2003.codfw.wmnet with reason: host reimage
  • 12:12 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2007.codfw.wmnet with reason: host reimage
  • 12:11 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2008.codfw.wmnet with reason: host reimage
  • 12:11 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2005.codfw.wmnet with reason: host reimage
  • 12:11 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2006.codfw.wmnet with reason: host reimage
  • 12:10 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: host reimage
  • 12:09 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2004.codfw.wmnet with reason: host reimage
  • 12:07 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2003.codfw.wmnet with reason: host reimage
  • 12:07 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: host reimage
  • 11:56 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2005.codfw.wmnet with OS bullseye
  • 11:55 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2006.codfw.wmnet with OS bullseye
  • 11:55 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2007.codfw.wmnet with OS bullseye
  • 11:55 klausman@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2008.codfw.wmnet with OS bullseye
  • 11:51 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2004.codfw.wmnet with OS bullseye
  • 11:50 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2003.codfw.wmnet with OS bullseye
  • 11:49 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2002.codfw.wmnet with OS bullseye
  • 11:48 root@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2001.codfw.wmnet with OS bullseye
  • 11:39 root@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-serve-ctrl2002.codfw.wmnet with OS bullseye
  • 11:24 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve-ctrl2002.codfw.wmnet with reason: host reimage
  • 11:21 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve-ctrl2002.codfw.wmnet with reason: host reimage
  • 11:21 marostegui: Install MariaDB 11.0.1 on db1106 T330643
  • 11:11 root@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-serve-ctrl2002.codfw.wmnet with OS bullseye
  • 11:11 root@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet with OS bullseye
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling restart_daemons on A:ncredir
  • 10:55 jmm@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling restart_daemons on A:ncredir
  • 10:54 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve-ctrl2001.codfw.wmnet with reason: host reimage
  • 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling restart_daemons on A:ncredir-ulsfo
  • 10:51 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve-ctrl2001.codfw.wmnet with reason: host reimage
  • 10:50 jmm@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling restart_daemons on A:ncredir-ulsfo
  • 10:49 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
  • 10:49 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
  • 10:38 root@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-serve-ctrl2001.codfw.wmnet with OS bullseye
  • 10:33 root@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade to k8s 1.23
  • 10:32 root@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade to k8s 1.23
  • 10:32 root@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade to k8s 1.23
  • 10:29 jnuche@deploy1002: Installation of scap version "latest" completed for 8 hosts
  • 10:29 jnuche@deploy1002: Installing scap version "latest" for 8 hosts
  • 10:26 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 10:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 23951
  • 10:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 23951
  • 10:21 moritzm: installing apr-util security updates on buster
  • 10:19 jnuche@deploy1002: Installation of scap version "latest" completed for 1 hosts
  • 10:19 jnuche@deploy1002: Installing scap version "latest" for 1 hosts
  • 10:15 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ml-etcd2002.codfw.wmnet with OS bullseye
  • 10:14 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ml-etcd2003.codfw.wmnet with OS bullseye
  • 10:13 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ml-etcd2001.codfw.wmnet with OS bullseye
  • 10:10 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-etcd2002.codfw.wmnet with reason: host reimage
  • 10:08 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-etcd2003.codfw.wmnet with reason: host reimage
  • 10:07 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-etcd2001.codfw.wmnet with reason: host reimage
  • 10:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd2003.codfw.wmnet with reason: host reimage
  • 10:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd2002.codfw.wmnet with reason: host reimage
  • 10:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd2001.codfw.wmnet with reason: host reimage
  • 10:00 klausman@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=inference,name=codfw
  • 09:55 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-etcd2003.codfw.wmnet with OS bullseye
  • 09:55 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-etcd2002.codfw.wmnet with OS bullseye
  • 09:54 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-etcd2001.codfw.wmnet with OS bullseye
  • 09:50 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on 16 hosts with reason: etcd cluster upgrade failed, waiting for k8s upgrade
  • 09:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on 16 hosts with reason: etcd cluster upgrade failed, waiting for k8s upgrade
  • 09:46 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php --wiki azwikimedia --bureaucrat Zabe REDACTED
  • 09:13 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
  • 09:13 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.25 refs T325588
  • 09:11 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 09:09 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
  • 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
  • 09:04 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
  • 08:52 moritzm: restarting r/w slapd to pick up openssl security updates
  • 08:43 vgutierrez: enable system hardening for haproxy in ulsfo - T323944
  • 08:11 moritzm: installing openssl security updates on buster
  • 07:53 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:52 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 07:52 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:52 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 07:51 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:41 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 07:40 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:40 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 07:39 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:39 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 07:04 marostegui: Stop mysql on db2094 T326596
  • 07:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1820
  • 07:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1820
  • 07:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56099
  • 07:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 56099
  • 07:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4621
  • 06:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4621
  • 06:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 18187
  • 06:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 18187
  • 06:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 23951
  • 06:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 23951
  • 06:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1267
  • 06:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1267
  • 06:51 marostegui@deploy1002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" (duration: 07m 46s)
  • 06:45 marostegui@deploy1002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 06:43 marostegui@deploy1002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master"
  • 06:41 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master (T330653) (duration: 07m 54s)
  • 06:35 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc1 master (T330653) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 06:33 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master (T330653)
  • 04:57 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.23 (duration: 02m 18s)
  • 04:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.25 refs T325588 (duration: 53m 02s)
  • 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.25 refs T325588

2023-02-27

  • 23:54 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:51 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:26 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 23:25 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:19 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 23:19 zabe@deploy1002: Finished scap: Backport for Add `guc` and `gur` to InterwikiSortOrders (duration: 07m 41s)
  • 23:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:13 zabe@deploy1002: jhsoby and zabe: Backport for Add `guc` and `gur` to InterwikiSortOrders synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 23:12 zabe@deploy1002: Started scap: Backport for Add `guc` and `gur` to InterwikiSortOrders
  • 23:09 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 23:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:01 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 23:00 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 23:00 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 22:42 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2070']
  • 22:36 mutante: switching https://annual.wikimedia.org from eqiad to codfw T330090
  • 22:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2073']
  • 22:31 ryankemper: [apifeatureusage] T329957 Restarted `logstash` on `apifeatureusage[1-2]001`
  • 22:19 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:19 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 22:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2073']
  • 22:18 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2072']
  • 22:16 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:16 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 22:15 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:15 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:09 zabe@deploy1002: Finished scap: Backport for Remove vewikimedia from deleted wikis (T320890) (duration: 07m 30s)
  • 22:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2072']
  • 22:04 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:03 zabe@deploy1002: zabe: Backport for Remove vewikimedia from deleted wikis (T320890) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 22:02 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be2071']
  • 22:01 zabe@deploy1002: Started scap: Backport for Remove vewikimedia from deleted wikis (T320890)
  • 21:58 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:53 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2071']
  • 21:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2070']
  • 21:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2070']
  • 21:25 TheresNoTime: close UTC late backport window
  • 21:21 samtar@deploy1002: Finished scap: Backport for [extwiki] Change wordmark and tagline (T330588) (duration: 08m 14s)
  • 21:15 samtar@deploy1002: samtar and superpes: Backport for [extwiki] Change wordmark and tagline (T330588) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 21:13 samtar@deploy1002: Started scap: Backport for [extwiki] Change wordmark and tagline (T330588)
  • 21:12 samtar@deploy1002: Finished scap: Backport for [eswiki] Create new 'templateeditor' usergroup and protection level (T330470) (duration: 08m 42s)
  • 21:05 samtar@deploy1002: superpes and samtar: Backport for [eswiki] Create new 'templateeditor' usergroup and protection level (T330470) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 21:04 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php --wiki vewikimedia --bureaucrat Zabe REDACTED
  • 21:03 samtar@deploy1002: Started scap: Backport for [eswiki] Create new 'templateeditor' usergroup and protection level (T330470)
  • 21:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2073.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:57 zabe@deploy1002: Finished scap: install Translate on vewikimedia and update interwiki cache T320890 (duration: 07m 26s)
  • 20:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2073.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2072.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:50 zabe@deploy1002: Started scap: install Translate on vewikimedia and update interwiki cache T320890
  • 20:50 zabe@deploy1002: sync-world aborted: install Translate on vewikimedia and update interwiki cache (duration: 00m 06s)
  • 20:50 zabe@deploy1002: Started scap: install Translate on vewikimedia and update interwiki cache
  • 20:49 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 20:48 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 20:48 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 20:48 zabe@deploy1002: Finished scap: create vewikimedia T320890 (duration: 07m 29s)
  • 20:42 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:42 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:40 zabe@deploy1002: Started scap: create vewikimedia T320890
  • 20:39 zabe: create Wikimedia Venezuela wiki # T320890
  • 20:38 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2072.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2071.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:30 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2071.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2071.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2071.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:17 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:17 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new ms-fe and thanos nodes - pt1979@cumin2002"
  • 20:16 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new ms-fe and thanos nodes - pt1979@cumin2002"
  • 20:11 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:10 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:07 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 20:06 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:05 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:01 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:55 urandom: power cycling restbase1026
  • 19:42 zabe@deploy1002: Finished scap: create gurwiki T327813 (duration: 07m 19s)
  • 19:38 samtar@deploy1002: Backport cancelled.
  • 19:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2070.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:35 zabe@deploy1002: Started scap: create gurwiki T327813
  • 19:34 zabe: create Wikipedia Farefare (Gurene) # T327813
  • 19:18 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:09 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2070.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:59 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:39 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new ms-be nodes - pt1979@cumin2002"
  • 18:37 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1004.eqiad.wmnet with OS bullseye
  • 18:37 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dcaro@cumin1001"
  • 18:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new ms-be nodes - pt1979@cumin2002"
  • 18:29 zabe: start running "foreachwikiindblist s3.dblist migrateRevisionCommentTemp.php --sleep 2" in screen # T275246
  • 18:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:24 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dcaro@cumin1001"
  • 18:09 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@c8dc6d5]: cirrus namespaces: Work arround missing domain_name in upstream (duration: 02m 29s)
  • 18:07 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@c8dc6d5]: cirrus namespaces: Work arround missing domain_name in upstream
  • 18:03 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 18:03 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 18:01 zabe@deploy1002: Synchronized wmf-config/interwiki.php: (no justification provided) (duration: 06m 54s)
  • 17:38 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye
  • 17:38 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:38 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 17:37 zabe@deploy1002: Finished scap: create gucwiki T321880 (duration: 11m 05s)
  • 17:37 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:37 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 17:37 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:36 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 17:36 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:36 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 17:36 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:36 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 17:35 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:35 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:35 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 17:33 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:29 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:29 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:28 zabe@deploy1002: zabe: create gucwiki T321880 synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 17:26 zabe@deploy1002: Started scap: create gucwiki T321880
  • 17:22 zabe: create Wikipedia Wayuu # T321880
  • 17:12 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1004.eqiad.wmnet with reason: host reimage
  • 17:09 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1004.eqiad.wmnet with reason: host reimage
  • 16:54 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1004.eqiad.wmnet with OS bullseye
  • 16:54 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1004.eqiad.wmnet with OS bullseye
  • 16:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye
  • 16:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2022.codfw.wmnet with OS bullseye
  • 16:46 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2015.codfw.wmnet with OS bullseye
  • 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:32 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1005.eqiad.wmnet with reason: host reimage
  • 16:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1005.eqiad.wmnet with reason: host reimage
  • 16:25 jgleeson: payments-wiki updated from c13b8d26 to 871c4e5c
  • 16:25 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 16:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2022.codfw.wmnet with reason: host reimage
  • 16:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2022.codfw.wmnet with reason: host reimage
  • 16:08 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1004.eqiad.wmnet with OS bullseye
  • 16:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1008.eqiad.wmnet with reason: host reimage
  • 16:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2015.codfw.wmnet with reason: host reimage
  • 16:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1008.eqiad.wmnet with reason: host reimage
  • 16:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2015.codfw.wmnet with reason: host reimage
  • 16:02 hashar@deploy1002: Finished deploy [integration/docroot@cd7c263]: build: Pin PHPUnit to 9.5.28 like in other repos (duration: 00m 12s)
  • 16:02 hashar@deploy1002: Started deploy [integration/docroot@cd7c263]: build: Pin PHPUnit to 9.5.28 like in other repos
  • 15:58 root@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudcephosd1004']
  • 15:56 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host ml-etcd2001.codfw.wmnet with OS bullseye
  • 15:52 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1004']
  • 15:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2022.codfw.wmnet with OS bullseye
  • 15:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ml-etcd2001.codfw.wmnet with reason: etcd cluster upgrade failed, waiting for k8s upgrade
  • 15:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ml-etcd2001.codfw.wmnet with reason: etcd cluster upgrade failed, waiting for k8s upgrade
  • 15:48 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:48 root@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1004']
  • 15:44 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye
  • 15:43 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye
  • 15:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2015.codfw.wmnet with OS bullseye
  • 15:41 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1004']
  • 15:41 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 15:41 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dcaro@cumin1001"
  • 15:40 urbanecm@deploy1002: Finished scap: Backport for cswiki: Grant changetags only to bots/sysops (T330383) (duration: 07m 39s)
  • 15:36 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dcaro@cumin1001"
  • 15:35 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye
  • 15:34 urbanecm@deploy1002: urbanecm: Backport for cswiki: Grant changetags only to bots/sysops (T330383) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44888 and previous config saved to /var/cache/conftool/dbconfig/20230227-153324-root.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44887 and previous config saved to /var/cache/conftool/dbconfig/20230227-153318-root.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44886 and previous config saved to /var/cache/conftool/dbconfig/20230227-153313-root.json
  • 15:32 urbanecm@deploy1002: Started scap: Backport for cswiki: Grant changetags only to bots/sysops (T330383)
  • 15:24 cgoubert@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:24 cgoubert@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:21 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1003.eqiad.wmnet with reason: host reimage
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44884 and previous config saved to /var/cache/conftool/dbconfig/20230227-151836-root.json
  • 15:18 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1003.eqiad.wmnet with reason: host reimage
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44883 and previous config saved to /var/cache/conftool/dbconfig/20230227-151826-root.json
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44882 and previous config saved to /var/cache/conftool/dbconfig/20230227-151819-root.json
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44881 and previous config saved to /var/cache/conftool/dbconfig/20230227-151813-root.json
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44880 and previous config saved to /var/cache/conftool/dbconfig/20230227-151808-root.json
  • 15:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44878 and previous config saved to /var/cache/conftool/dbconfig/20230227-151434-root.json
  • 15:13 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:13 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:12 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:11 inflatador: bking@deploy1002 applying https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/891577 on dse-k8s-cluster via helmfile
  • 15:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage
  • 15:08 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage
  • 15:06 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 15:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44877 and previous config saved to /var/cache/conftool/dbconfig/20230227-150535-root.json
  • 15:04 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44876 and previous config saved to /var/cache/conftool/dbconfig/20230227-150331-root.json
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44875 and previous config saved to /var/cache/conftool/dbconfig/20230227-150322-root.json
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44874 and previous config saved to /var/cache/conftool/dbconfig/20230227-150315-root.json
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44873 and previous config saved to /var/cache/conftool/dbconfig/20230227-150309-root.json
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44872 and previous config saved to /var/cache/conftool/dbconfig/20230227-150304-root.json
  • 15:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd2001.codfw.wmnet with reason: host reimage
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44871 and previous config saved to /var/cache/conftool/dbconfig/20230227-145929-root.json
  • 14:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd2001.codfw.wmnet with reason: host reimage
  • 14:54 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye
  • 14:52 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 14:52 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44870 and previous config saved to /var/cache/conftool/dbconfig/20230227-145030-root.json
  • 14:49 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44869 and previous config saved to /var/cache/conftool/dbconfig/20230227-144826-root.json
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44868 and previous config saved to /var/cache/conftool/dbconfig/20230227-144816-root.json
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44867 and previous config saved to /var/cache/conftool/dbconfig/20230227-144810-root.json
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44866 and previous config saved to /var/cache/conftool/dbconfig/20230227-144804-root.json
  • 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44865 and previous config saved to /var/cache/conftool/dbconfig/20230227-144759-root.json
  • 14:45 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-etcd2001.codfw.wmnet with OS bullseye
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44864 and previous config saved to /var/cache/conftool/dbconfig/20230227-144424-root.json
  • 14:35 claime: done live testing sre.switchdc.mediawiki.03-set-db-readonly and sre.switchdc.mediawiki.06-set-db-readwrite back to back - T330302
  • 14:35 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44863 and previous config saved to /var/cache/conftool/dbconfig/20230227-143525-root.json
  • 14:35 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 14:35 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 14:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 14:34 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 14:34 claime: live testing sre.switchdc.mediawiki.03-set-db-readonly and sre.switchdc.mediawiki.06-set-db-readwrite back to back - T330302
  • 14:33 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44862 and previous config saved to /var/cache/conftool/dbconfig/20230227-143321-root.json
  • 14:33 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on idm2001.wikimedia.org with reason: host still being configured - T320797
  • 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44861 and previous config saved to /var/cache/conftool/dbconfig/20230227-143311-root.json
  • 14:33 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on idm2001.wikimedia.org with reason: host still being configured - T320797
  • 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44860 and previous config saved to /var/cache/conftool/dbconfig/20230227-143305-root.json
  • 14:33 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on idm2001.wikimedia.org with reason: host still being configured - T320797
  • 14:33 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on idm2001.wikimedia.org with reason: host still being configured - T320797
  • 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44859 and previous config saved to /var/cache/conftool/dbconfig/20230227-143259-root.json
  • 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44858 and previous config saved to /var/cache/conftool/dbconfig/20230227-143254-root.json
  • 14:32 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1006.eqiad.wmnet with reason: host reimage
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44857 and previous config saved to /var/cache/conftool/dbconfig/20230227-142919-root.json
  • 14:28 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1006.eqiad.wmnet with reason: host reimage
  • 14:22 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44855 and previous config saved to /var/cache/conftool/dbconfig/20230227-142020-root.json
  • 14:18 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:18 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44854 and previous config saved to /var/cache/conftool/dbconfig/20230227-141815-root.json
  • 14:18 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:18 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44853 and previous config saved to /var/cache/conftool/dbconfig/20230227-141806-root.json
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44852 and previous config saved to /var/cache/conftool/dbconfig/20230227-141800-root.json
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44851 and previous config saved to /var/cache/conftool/dbconfig/20230227-141754-root.json
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44850 and previous config saved to /var/cache/conftool/dbconfig/20230227-141749-root.json
  • 14:17 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:17 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:14 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44849 and previous config saved to /var/cache/conftool/dbconfig/20230227-141415-root.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44848 and previous config saved to /var/cache/conftool/dbconfig/20230227-141130-root.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44847 and previous config saved to /var/cache/conftool/dbconfig/20230227-141120-root.json
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44846 and previous config saved to /var/cache/conftool/dbconfig/20230227-140811-root.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44845 and previous config saved to /var/cache/conftool/dbconfig/20230227-140707-root.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44844 and previous config saved to /var/cache/conftool/dbconfig/20230227-140527-root.json
  • 14:05 ladsgroup@deploy1002: Synchronized php-1.40.0-wmf.24/extensions/MobileFrontend/includes/MobileContext.php: Completely get rid of responsiveimages removal, part III (T326147) (duration: 07m 36s)
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44843 and previous config saved to /var/cache/conftool/dbconfig/20230227-140523-root.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44842 and previous config saved to /var/cache/conftool/dbconfig/20230227-140515-root.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44841 and previous config saved to /var/cache/conftool/dbconfig/20230227-140310-root.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44840 and previous config saved to /var/cache/conftool/dbconfig/20230227-140301-root.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44839 and previous config saved to /var/cache/conftool/dbconfig/20230227-140255-root.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44838 and previous config saved to /var/cache/conftool/dbconfig/20230227-140249-root.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44837 and previous config saved to /var/cache/conftool/dbconfig/20230227-140244-root.json
  • 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44836 and previous config saved to /var/cache/conftool/dbconfig/20230227-135910-root.json
  • 13:59 ladsgroup@deploy1002: ladsgroup: Completely get rid of responsiveimages removal, part III (T326147) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2178 db2146 db2180 T330653', diff saved to https://phabricator.wikimedia.org/P44835 and previous config saved to /var/cache/conftool/dbconfig/20230227-135856-root.json
  • 13:58 moritzm: restarting apache on mw canaries to pick up apr-util updates
  • 13:56 ladsgroup@deploy1002: Synchronized php-1.40.0-wmf.24/extensions/MobileFrontend/includes/MobileFrontendHooks.php: Completely get rid of responsiveimages removal, part II (T326147) (duration: 07m 24s)
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44834 and previous config saved to /var/cache/conftool/dbconfig/20230227-135625-root.json
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44833 and previous config saved to /var/cache/conftool/dbconfig/20230227-135615-root.json
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44832 and previous config saved to /var/cache/conftool/dbconfig/20230227-135306-root.json
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44831 and previous config saved to /var/cache/conftool/dbconfig/20230227-135225-root.json
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44830 and previous config saved to /var/cache/conftool/dbconfig/20230227-135202-root.json
  • 13:50 ladsgroup@deploy1002: ladsgroup: Completely get rid of responsiveimages removal, part II (T326147) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44828 and previous config saved to /var/cache/conftool/dbconfig/20230227-135023-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44827 and previous config saved to /var/cache/conftool/dbconfig/20230227-135018-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44826 and previous config saved to /var/cache/conftool/dbconfig/20230227-135011-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44825 and previous config saved to /var/cache/conftool/dbconfig/20230227-135010-root.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44824 and previous config saved to /var/cache/conftool/dbconfig/20230227-134756-root.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44823 and previous config saved to /var/cache/conftool/dbconfig/20230227-134753-root.json
  • 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44821 and previous config saved to /var/cache/conftool/dbconfig/20230227-134405-root.json
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44819 and previous config saved to /var/cache/conftool/dbconfig/20230227-134120-root.json
  • 13:41 ladsgroup@deploy1002: Synchronized php-1.40.0-wmf.24/extensions/MobileFrontend/extension.json: Completely get rid of responsiveimages removal, part I (T326147) (duration: 10m 48s)
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44818 and previous config saved to /var/cache/conftool/dbconfig/20230227-134110-root.json
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2122 T330653', diff saved to https://phabricator.wikimedia.org/P44817 and previous config saved to /var/cache/conftool/dbconfig/20230227-134018-root.json
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44815 and previous config saved to /var/cache/conftool/dbconfig/20230227-133801-root.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44814 and previous config saved to /var/cache/conftool/dbconfig/20230227-133720-root.json
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44813 and previous config saved to /var/cache/conftool/dbconfig/20230227-133657-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44811 and previous config saved to /var/cache/conftool/dbconfig/20230227-133518-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44810 and previous config saved to /var/cache/conftool/dbconfig/20230227-133513-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44809 and previous config saved to /var/cache/conftool/dbconfig/20230227-133506-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44808 and previous config saved to /var/cache/conftool/dbconfig/20230227-133506-root.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2175 T330653', diff saved to https://phabricator.wikimedia.org/P44805 and previous config saved to /var/cache/conftool/dbconfig/20230227-133231-root.json
  • 13:32 ladsgroup@deploy1002: ladsgroup: Completely get rid of responsiveimages removal, part I (T326147) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:30 ladsgroup@deploy1002: sync-file aborted: Completely get rid of responsiveimages removal, part I (T308932) (duration: 44m 38s)
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44800 and previous config saved to /var/cache/conftool/dbconfig/20230227-132615-root.json
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44799 and previous config saved to /var/cache/conftool/dbconfig/20230227-132605-root.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44798 and previous config saved to /var/cache/conftool/dbconfig/20230227-132257-root.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44797 and previous config saved to /var/cache/conftool/dbconfig/20230227-132215-root.json
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44796 and previous config saved to /var/cache/conftool/dbconfig/20230227-132151-root.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44795 and previous config saved to /var/cache/conftool/dbconfig/20230227-132013-root.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44794 and previous config saved to /var/cache/conftool/dbconfig/20230227-132008-root.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44793 and previous config saved to /var/cache/conftool/dbconfig/20230227-132002-root.json
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44791 and previous config saved to /var/cache/conftool/dbconfig/20230227-131100-root.json
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44790 and previous config saved to /var/cache/conftool/dbconfig/20230227-130752-root.json
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44789 and previous config saved to /var/cache/conftool/dbconfig/20230227-130711-root.json
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44788 and previous config saved to /var/cache/conftool/dbconfig/20230227-130646-root.json
  • 13:05 moritzm: installing openssl security updates on Buster
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44787 and previous config saved to /var/cache/conftool/dbconfig/20230227-130508-root.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44786 and previous config saved to /var/cache/conftool/dbconfig/20230227-130503-root.json
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44785 and previous config saved to /var/cache/conftool/dbconfig/20230227-130457-root.json
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44784 and previous config saved to /var/cache/conftool/dbconfig/20230227-125605-root.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44783 and previous config saved to /var/cache/conftool/dbconfig/20230227-125555-root.json
  • 12:55 ladsgroup@deploy1002: ladsgroup: Completely get rid of responsiveimages removal, part I (T308932) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44782 and previous config saved to /var/cache/conftool/dbconfig/20230227-125247-root.json
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44781 and previous config saved to /var/cache/conftool/dbconfig/20230227-125206-root.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44780 and previous config saved to /var/cache/conftool/dbconfig/20230227-125141-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44779 and previous config saved to /var/cache/conftool/dbconfig/20230227-125003-root.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44778 and previous config saved to /var/cache/conftool/dbconfig/20230227-124959-root.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44777 and previous config saved to /var/cache/conftool/dbconfig/20230227-124952-root.json
  • 12:43 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44776 and previous config saved to /var/cache/conftool/dbconfig/20230227-124100-root.json
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44775 and previous config saved to /var/cache/conftool/dbconfig/20230227-124050-root.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 es2022 T330653', diff saved to https://phabricator.wikimedia.org/P44774 and previous config saved to /var/cache/conftool/dbconfig/20230227-123814-root.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44773 and previous config saved to /var/cache/conftool/dbconfig/20230227-123742-root.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44772 and previous config saved to /var/cache/conftool/dbconfig/20230227-123701-root.json
  • 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P44771 and previous config saved to /var/cache/conftool/dbconfig/20230227-123636-root.json
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 T330653', diff saved to https://phabricator.wikimedia.org/P44770 and previous config saved to /var/cache/conftool/dbconfig/20230227-123514-root.json
  • 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44769 and previous config saved to /var/cache/conftool/dbconfig/20230227-123459-root.json
  • 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44768 and previous config saved to /var/cache/conftool/dbconfig/20230227-123454-root.json
  • 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44767 and previous config saved to /var/cache/conftool/dbconfig/20230227-123447-root.json
  • 12:34 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 12:31 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1200 db1111 db1168 db1143 T330653', diff saved to https://phabricator.wikimedia.org/P44766 and previous config saved to /var/cache/conftool/dbconfig/20230227-122804-root.json
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44765 and previous config saved to /var/cache/conftool/dbconfig/20230227-122131-root.json
  • 12:21 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P44764 and previous config saved to /var/cache/conftool/dbconfig/20230227-121846-root.json
  • 12:12 moritzm: installing apr-util security updates on buster
  • 12:10 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-airflow1005.eqiad.wmnet with reason: host still being configured - T327970
  • 12:10 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-airflow1005.eqiad.wmnet with reason: host still being configured - T327970
  • 12:04 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P44763 and previous config saved to /var/cache/conftool/dbconfig/20230227-120002-root.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44762 and previous config saved to /var/cache/conftool/dbconfig/20230227-115947-root.json
  • 11:53 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 11:51 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 11:49 vgutierrez: set "X-Content-Type-Options: nosniff" on upload.wm.o requests - T309787
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P44761 and previous config saved to /var/cache/conftool/dbconfig/20230227-114442-root.json
  • 11:43 hnowlan@deploy1002: Finished deploy [restbase/deploy@bcb0a69]: Add azwikimedia T317120 (duration: 01m 25s)
  • 11:42 hnowlan@deploy1002: Started deploy [restbase/deploy@bcb0a69]: Add azwikimedia T317120
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P44760 and previous config saved to /var/cache/conftool/dbconfig/20230227-113130-root.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44759 and previous config saved to /var/cache/conftool/dbconfig/20230227-112937-root.json
  • 11:29 apergos: rsync public (huge!) xmldatadumps dir from dumpsdata1003 to dumpsdata1004; running from ariel screen session on dumpsdata1003, no bandwidth cap
  • 11:28 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-fe2013.codfw.wmnet with reason: testing redfish T326848
  • 11:28 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on ms-fe2013.codfw.wmnet with reason: testing redfish T326848
  • 11:20 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 11:10 apergos: rsync private xmldatadumps dir from dumpsdata1003 to dumpsdata1004; running from ariel screen session on dumpsdata1003, no bandwidth cap
  • 11:08 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 11:07 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 11:05 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 11:04 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1003']
  • 10:59 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudcephosd1003']
  • 10:54 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1003']
  • 10:48 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:48 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:43 root@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1003']
  • 10:39 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1003']
  • 10:32 marostegui: Restart eqiad sanitarium hosts T330502
  • 10:31 root@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1003']
  • 10:26 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 10:26 marostegui: Restart codfw sanitarium hosts T330502
  • 10:23 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1003']
  • 10:23 dcaro@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 10:22 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches (exit_code=0)
  • 10:19 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches
  • 10:19 claime: live testing cache warmup cookbook
  • 10:18 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: Running failover to gitlab2002 - T329931
  • 10:17 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: Running failover to gitlab2002 - T329931
  • 10:08 marostegui: Enable replication codfw -> eqiad on s4 T330619
  • 09:47 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 09:46 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 09:46 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 09:44 marostegui: Enable replication codfw -> eqiad on s1 T330619
  • 09:39 marostegui: Enable replication codfw -> eqiad on s5 T330619
  • 09:36 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 09:36 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 09:34 marostegui: Enable replication codfw -> eqiad on s6 T330619
  • 09:32 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 09:27 marostegui: Enable replication codfw -> eqiad on s7 T330619
  • 09:26 marostegui: Enable replication codfw -> eqiad on s8 T330619
  • 09:20 hashar: Restarting CI Jenkins T330045
  • 09:19 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 09:16 hashar@deploy1002: Finished deploy [releng/jenkins-deploy@0e465ac] (releasing): (no justification provided) (duration: 00m 46s)
  • 09:15 hashar@deploy1002: Started deploy [releng/jenkins-deploy@0e465ac] (releasing): (no justification provided)
  • 09:12 marostegui: Enable replication codfw -> eqiad on s3 T330619
  • 08:56 moritzm: updating mw/codfw to PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 T330270
  • 08:54 vgutierrez: test haproxy hardening in cp4045 - T323944
  • 08:51 marostegui: Disable GTID on es% x1 and s% on codfw masters T330619
  • 08:51 marostegui: Enable replication codfw -> eqiad on s2 T330619
  • 08:32 marostegui: Enable replication codfw -> eqiad on es4 and es5 T330619
  • 07:57 marostegui: Enable replication codfw -> eqiad on x1 T330619
  • 07:42 marostegui: Enable replication codfw -> eqiad on pcX T330619

2023-02-26

  • 02:07 Amir1: foreachwikiindblist s5 maintenance/migrateExternallinks.php --batch-size=100 --sleep 1 (T326314)

2023-02-25

  • 15:30 apergos: resized lvm and filesystem for /data on dumpsdata1004,5,7; was <100G, now is 38T usable (left some room for growth later)
  • 11:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on dse-k8s-worker[1001-1004,1007].eqiad.wmnet with reason: Downtime DSE workers for cluster upgrade
  • 11:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on dse-k8s-worker[1001-1004,1007].eqiad.wmnet with reason: Downtime DSE workers for cluster upgrade
  • 09:38 elukey: delete knative pods on ml-serve-codfw to clear latency alerts

2023-02-24

  • 23:15 mutante: people2002 - for each user who has a public_html dir that is not empty (for pubdir in $(find . -name public_html -type d -not -empty); ..); rsync it from people1003 with --delete (rsync -avp rsync://people1003.eqiad.wmnet/people-home/${pubdiruser}/public_html/ /home/${pubdiruser}/public_html/); T330091
  • 22:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2013.codfw.wmnet with OS bullseye
  • 22:49 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2013.codfw.wmnet with reason: host reimage
  • 22:18 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2013.codfw.wmnet with reason: host reimage
  • 21:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2013.codfw.wmnet with OS bullseye
  • 21:26 mutante: people2002 - performing the usual dance when device names changed after editing virtual hardware (s/ens13/ens14 in /etc/network/interfaces ... reboot)
  • 21:19 mutante: rebooting people2002
  • 21:17 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns3002.wikimedia.org with OS bullseye
  • 21:06 mutante: ganeti2021 - adding a virtual 20G disk to people2002 - to temporarily get some space for backups and syncing T330091
  • 20:59 fab@deploy1002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 10s)
  • 20:59 fab@deploy1002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 20:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns3002.wikimedia.org with reason: host reimage
  • 20:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns3002.wikimedia.org with reason: host reimage
  • 20:46 fab@deploy1002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 19s)
  • 20:45 fab@deploy1002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 20:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-fe2014']
  • 20:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['thanos-fe2004']
  • 20:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-fe2013']
  • 20:32 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns3002.wikimedia.org with OS bullseye
  • 20:11 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns3002.wikimedia.org with OS bullseye
  • 20:06 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:36 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2004']
  • 19:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe2014']
  • 19:33 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe2013']
  • 19:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['thanos-fe2004']
  • 19:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-fe2014']
  • 19:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-fe2013']
  • 19:21 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2004']
  • 19:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe2014']
  • 19:18 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns3002.wikimedia.org with OS bullseye
  • 19:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe2013']
  • 19:14 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-fe2013']
  • 19:14 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe2013']
  • 19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2014.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2013.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host thanos-fe2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:02 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2014.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:00 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns6001.wikimedia.org with OS bullseye
  • 18:56 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2013.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new ms-fe and thanos nodes - pt1979@cumin2002"
  • 18:54 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new ms-fe and thanos nodes - pt1979@cumin2002"
  • 18:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:38 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2022.codfw.wmnet with OS bullseye
  • 18:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns6001.wikimedia.org with reason: host reimage
  • 18:34 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns6001.wikimedia.org with reason: host reimage
  • 18:32 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2022.codfw.wmnet with OS bullseye
  • 18:21 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:19 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 18:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns6001.wikimedia.org with OS bullseye
  • 18:13 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:10 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 18:09 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:09 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:00 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1004.eqiad.wmnet with OS bullseye
  • 17:12 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1004.eqiad.wmnet with OS bullseye
  • 17:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 17:09 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 17:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 17:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 17:03 fab@deploy1002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 10s)
  • 17:03 fab@deploy1002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 16:57 fab@deploy1002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 26s)
  • 16:57 fab@deploy1002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 16:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS buster
  • 15:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2020.codfw.wmnet with OS bullseye
  • 15:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new url downloaders - jmm@cumin2002 - T329945"
  • 15:11 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new url downloaders - jmm@cumin2002 - T329945"
  • 15:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2020.codfw.wmnet with reason: host reimage
  • 14:52 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2020.codfw.wmnet with reason: host reimage
  • 14:50 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.upgrade-cluster (exit_code=0) Upgrade K8s version: Upgrade to k8s 1.23
  • 14:50 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye
  • 14:50 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: Cluster half broken, in the middle of upgrading
  • 14:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: Cluster half broken, in the middle of upgrading
  • 14:50 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['wdqs2013']
  • 14:49 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2013']
  • 14:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage
  • 14:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye
  • 14:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage
  • 14:31 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2020.codfw.wmnet with OS bullseye
  • 14:31 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 14:23 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye
  • 14:23 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye
  • 14:22 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye
  • 14:17 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 14:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye
  • 12:26 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 11:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host urldownloader2004.wikimedia.org with OS bullseye
  • 11:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on urldownloader2004.wikimedia.org with reason: host reimage
  • 11:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on urldownloader2004.wikimedia.org with reason: host reimage
  • 11:37 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 11:13 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host urldownloader2004.wikimedia.org with OS bullseye
  • 11:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye
  • 10:59 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye
  • 10:59 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host urldownloader2003.wikimedia.org with OS bullseye
  • 10:52 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye
  • 10:46 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:46 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:45 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:45 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on urldownloader2003.wikimedia.org with reason: host reimage
  • 10:44 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:44 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on urldownloader2003.wikimedia.org with reason: host reimage
  • 10:40 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:40 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:35 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:35 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:35 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:35 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:35 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:35 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:32 moritzm: installing emacs security updates on bullseye
  • 10:32 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:32 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:31 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:31 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:31 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:31 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:29 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye
  • 10:13 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host urldownloader2003.wikimedia.org with OS bullseye
  • 10:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1004.eqiad.wmnet with OS bullseye
  • 10:10 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1002.eqiad.wmnet with OS bullseye
  • 10:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1003.eqiad.wmnet with OS bullseye
  • 10:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1001.eqiad.wmnet with OS bullseye
  • 09:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1004.eqiad.wmnet with reason: host reimage
  • 09:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: host reimage
  • 09:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1004.eqiad.wmnet with reason: host reimage
  • 09:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1003.eqiad.wmnet with reason: host reimage
  • 09:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on dse-k8s-worker1001.eqiad.wmnet with reason: host reimage
  • 09:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: host reimage
  • 09:34 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage
  • 09:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1003.eqiad.wmnet with reason: host reimage
  • 09:32 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1001.eqiad.wmnet with reason: host reimage
  • 09:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1006.eqiad.wmnet with reason: host reimage
  • 09:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1005.eqiad.wmnet with reason: host reimage
  • 09:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1007.eqiad.wmnet with reason: host reimage
  • 09:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye
  • 09:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1006.eqiad.wmnet with reason: host reimage
  • 09:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1005.eqiad.wmnet with reason: host reimage
  • 09:13 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1008.eqiad.wmnet with OS bullseye
  • 09:13 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1007.eqiad.wmnet with OS bullseye
  • 09:13 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1006.eqiad.wmnet with OS bullseye
  • 09:12 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1005.eqiad.wmnet with OS bullseye
  • 09:11 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1004.eqiad.wmnet with OS bullseye
  • 09:11 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1003.eqiad.wmnet with OS bullseye
  • 09:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1002.eqiad.wmnet with OS bullseye
  • 09:09 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1001.eqiad.wmnet with OS bullseye
  • 09:08 elukey: rm /var/log/{syslog,messages,user.log}.1 on kubetcd1005 to free up space - T329717
  • 09:08 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host dse-k8s-ctrl1002.eqiad.wmnet with OS bullseye
  • 08:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
  • 08:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
  • 08:40 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host dse-k8s-ctrl1002.eqiad.wmnet with OS bullseye
  • 08:37 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host dse-k8s-ctrl1001.eqiad.wmnet with OS bullseye
  • 08:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-ctrl1001.eqiad.wmnet with reason: host reimage
  • 08:21 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-ctrl1001.eqiad.wmnet with reason: host reimage
  • 08:10 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host dse-k8s-ctrl1001.eqiad.wmnet with OS bullseye
  • 08:06 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade to k8s 1.23
  • 08:00 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on 8 hosts with reason: Downtime DSE workers for cluster upgrade
  • 07:59 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on 8 hosts with reason: Downtime DSE workers for cluster upgrade
  • 07:52 elukey: rm /var/log/{syslog,messages,user.log}.1 on kubetcd1006 to free up space - T329717
  • 03:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2021.codfw.wmnet with OS bullseye
  • 03:49 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:47 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2020.codfw.wmnet with OS bullseye
  • 03:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
  • 03:28 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
  • 03:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2019.codfw.wmnet with OS bullseye
  • 03:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:12 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2018.codfw.wmnet with OS bullseye
  • 03:12 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS bullseye
  • 03:04 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:58 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2020.codfw.wmnet with OS bullseye
  • 02:58 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2020.codfw.wmnet with OS bullseye
  • 02:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2019.codfw.wmnet with reason: host reimage
  • 02:53 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2019.codfw.wmnet with reason: host reimage
  • 02:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2018.codfw.wmnet with reason: host reimage
  • 02:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2018.codfw.wmnet with reason: host reimage
  • 02:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2020.codfw.wmnet with OS bullseye
  • 02:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2018.codfw.wmnet with OS bullseye
  • 02:18 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2018.codfw.wmnet with OS bullseye
  • 02:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2019.codfw.wmnet with OS bullseye
  • 02:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2015.codfw.wmnet with OS bullseye
  • 02:02 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2018.codfw.wmnet with OS bullseye
  • 01:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2017.codfw.wmnet with OS bullseye
  • 01:56 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2014.codfw.wmnet with OS bullseye
  • 01:53 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2016.codfw.wmnet with OS bullseye
  • 01:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:53 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:51 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2015
  • 01:51 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2015
  • 01:51 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2015
  • 01:50 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2015
  • 01:49 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:49 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2015
  • 01:49 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2015
  • 01:48 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2015
  • 01:48 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2015
  • 01:43 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2017.codfw.wmnet with reason: host reimage
  • 01:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2017.codfw.wmnet with reason: host reimage
  • 01:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2016.codfw.wmnet with reason: host reimage
  • 01:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2016.codfw.wmnet with reason: host reimage
  • 01:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2014.codfw.wmnet with reason: host reimage
  • 01:24 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2014.codfw.wmnet with reason: host reimage
  • 01:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2017.codfw.wmnet with OS bullseye
  • 01:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2016.codfw.wmnet with OS bullseye
  • 01:06 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2015.codfw.wmnet with OS bullseye
  • 01:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2014.codfw.wmnet with OS bullseye
  • 00:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2013.codfw.wmnet with OS bullseye
  • 00:47 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:13 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply JRE updates - bking@cumin1001 - T329957
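
Many entries above follow a regular shape: `HH:MM user@host: END (STATUS) - Cookbook name (exit_code=N) ...`. As a rough aid for reading a day like this one, the following is a minimal sketch that tallies cookbook outcomes from such lines. The regular expression and the function name `tally_cookbook_results` are assumptions made for illustration and are not part of any Wikimedia tooling. The sample lines are copied from the entries above.

```python
import re
from collections import Counter

# Assumed pattern for "END" cookbook entries as they appear in this log, e.g.
#   "20:11 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) ..."
END_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<user>[^@\s]+)@(?P<host>[^:\s]+): "
    r"END \((?P<status>[A-Z]+)\) - Cookbook (?P<cookbook>\S+) "
    r"\(exit_code=(?P<code>\d+)\)"
)


def tally_cookbook_results(lines):
    """Count END entries per (cookbook, status) pair; other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = END_RE.search(line)
        if match:
            counts[(match["cookbook"], match["status"])] += 1
    return counts


sample = [
    "• 22:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2013.codfw.wmnet with OS bullseye",
    "• 20:11 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns3002.wikimedia.org with OS bullseye",
    "• 19:14 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-fe2013']",
    "• 21:19 mutante: rebooting people2002",
]
counts = tally_cookbook_results(sample)
for (cookbook, status), n in sorted(counts.items()):
    print(f"{cookbook} {status} {n}")
```

Free-form entries (such as the `mutante` line in the sample) and START entries do not match the pattern and are skipped, so the tally reflects only completed cookbook runs.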

2023-02-23

  • 23:27 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:26 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:25 zabe: mwscript namespaceDupes.php shnwiktionary --fix # T330456
  • 23:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns6002.wikimedia.org with OS bullseye
  • 22:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns6002.wikimedia.org with reason: host reimage
  • 22:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns6002.wikimedia.org with reason: host reimage
  • 22:34 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply JRE updates - bking@cumin1001 - T329957
  • 22:32 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns6002.wikimedia.org with OS bullseye
  • 22:29 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:17 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS bullseye
  • 22:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2013.codfw.wmnet with reason: host reimage
  • 22:11 zabe@deploy1002: Synchronized wmf-config/interwiki.php: gerrit:891395 (duration: 07m 11s)
  • 22:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2013.codfw.wmnet with reason: host reimage
  • 21:56 zabe@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T306015 (duration: 06m 49s)
  • 21:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 21:46 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:46 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:45 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:45 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 21:45 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:44 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:44 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:43 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2013.codfw.wmnet with OS bullseye
  • 21:38 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:37 zabe@deploy1002: Finished scap: create azwikimedia T306015 (duration: 07m 54s)
  • 21:36 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:32 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply JRE updates - bking@cumin1001 - T329957
  • 21:31 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:29 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript maintenance/emptyUserGroup.php --wiki newiki reviewer` for T327114
  • 21:29 zabe@deploy1002: Started scap: create azwikimedia T306015
  • 21:25 zabe: create Azerbaijani Wikimedians User Group wiki # T306015
  • 21:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['wdqs2013']
  • 21:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2013']
  • 21:12 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS bullseye
  • 21:07 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 20:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2022']
  • 20:49 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2022']
  • 20:45 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 20:35 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:26 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 20:25 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 20:21 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:18 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 20:16 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host relforge1004.eqiad.wmnet
  • 20:10 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host relforge1004.eqiad.wmnet
  • 20:09 brennen@deploy1002: Finished deploy [phabricator/deployment@3f2dd1b]: test deploy to aphlict2001, take 3 (duration: 03m 13s)
  • 20:08 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 20:06 brennen@deploy1002: Started deploy [phabricator/deployment@3f2dd1b]: test deploy to aphlict2001, take 3
  • 20:05 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 20:05 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 19:58 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host relforge1003.eqiad.wmnet
  • 19:55 brennen@deploy1002: Finished deploy [phabricator/deployment@3f2dd1b]: test deploy to aphlict2001, take 2 (duration: 01m 04s)
  • 19:54 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply JRE updates - bking@cumin1001 - T329957
  • 19:53 brennen@deploy1002: Started deploy [phabricator/deployment@3f2dd1b]: test deploy to aphlict2001, take 2
  • 19:51 brennen@deploy1002: Finished deploy [phabricator/deployment@3f2dd1b]: test deploy to aphlict2001 (duration: 01m 10s)
  • 19:51 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host relforge1003.eqiad.wmnet
  • 19:50 brennen@deploy1002: Started deploy [phabricator/deployment@3f2dd1b]: test deploy to aphlict2001
  • 19:50 mutante: aphlict2001 - manually created /etc/phabricator/config.yaml - empty file owned by root:phab-deploy to debug for T330393 T322369
  • 19:46 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 19:45 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 19:45 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 19:45 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 19:45 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 19:45 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 19:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5003.wikimedia.org with OS bullseye
  • 18:53 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:53 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 18:52 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5003.wikimedia.org with reason: host reimage
  • 18:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:49 sukhe: run puppet agent on puppetdb2003
  • 18:48 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5003.wikimedia.org with reason: host reimage
  • 18:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2022']
  • 18:38 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2022']
  • 18:38 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 18:36 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 18:36 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 18:36 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:35 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 18:35 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 18:34 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 18:34 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 18:24 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2021']
  • 18:15 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2021']
  • 18:14 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5003.wikimedia.org with OS bullseye
  • 18:14 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 18:08 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 17:56 fab@deploy1002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 10s)
  • 17:55 fab@deploy1002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 17:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4003.wikimedia.org with OS bullseye
  • 17:46 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1003.eqiad.wmnet with OS bullseye
  • 17:45 fab@deploy1002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 27s)
  • 17:44 fab@deploy1002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 17:44 fab@deploy1002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 03m 32s)
  • 17:43 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:43 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved cloudcephosd1003/1004 to new racks - dcaro@cumin1001"
  • 17:41 fab@deploy1002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 17:36 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved cloudcephosd1003/1004 to new racks - dcaro@cumin1001"
  • 17:36 cdanis@cumin1001: dbctl commit (dc=all): 'so hot right now', diff saved to https://phabricator.wikimedia.org/P44753 and previous config saved to /var/cache/conftool/dbconfig/20230223-173608-cdanis.json
  • 17:31 cdanis@cumin1001: dbctl commit (dc=all): 'db1127 running very hot', diff saved to https://phabricator.wikimedia.org/P44752 and previous config saved to /var/cache/conftool/dbconfig/20230223-173127-cdanis.json
  • 17:31 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 17:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 17:25 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 17:24 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:23 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:19 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:18 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:18 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:17 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 17:07 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 17:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
  • 16:58 papaul: replacing fpc2 to fpc1 DAC cable complete
  • 16:52 papaul: replacing fpc2 to fpc1 DAC cable
  • 16:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 16:42 hnowlan@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 16:41 hnowlan: eqiad: roll-restarting swift frontends and thumbor hosts for key rotation
  • 16:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2021']
  • 16:29 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2021']
  • 16:26 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 16:18 hnowlan@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 16:15 hnowlan@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 16:14 hnowlan: codfw: roll-restarting swift frontends and thumbor hosts for key rotation
  • 16:13 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 16:13 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 16:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2021']
  • 16:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2021']
  • 16:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2021']
  • 16:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2021']
  • 16:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2020']
  • 16:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2020']
  • 16:04 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:04 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved cloudcephosd1003/1004 to new racks - dcaro@cumin1001"
  • 16:03 moritzm: installing c-ares security updates
  • 16:03 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1002
  • 16:03 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1002
  • 16:03 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1001
  • 16:03 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1001
  • 16:03 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved cloudcephosd1003/1004 to new racks - dcaro@cumin1001"
  • 16:00 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 16:00 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 15:57 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:57 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved cloudcephosd1003/1004 to new racks - dcaro@cumin1001"
  • 15:51 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved cloudcephosd1003/1004 to new racks - dcaro@cumin1001"
  • 15:45 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 15:42 klausman@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 15:38 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:36 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 15:27 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move more ns-related config out of InitialiseSettings, part II (T308932) (duration: 06m 35s)
  • 15:22 klausman@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 15:15 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
  • 15:15 sukhe: running authdns-update for CR 891569
  • 15:15 ladsgroup@deploy1002: Synchronized wmf-config/core-Namespaces.php: Move more ns-related config out of InitialiseSettings, part I (T308932) (duration: 07m 01s)
  • 14:56 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
  • 14:55 ladsgroup@deploy1002: Finished scap: Backport for [shnwiktionary] Create 8 new namespaces (T330376) (duration: 09m 21s)
  • 14:48 ladsgroup@deploy1002: ladsgroup and superpes: Backport for [shnwiktionary] Create 8 new namespaces (T330376) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 14:45 ladsgroup@deploy1002: Started scap: Backport for [shnwiktionary] Create 8 new namespaces (T330376)
  • 14:45 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 14:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 14:43 ladsgroup@deploy1002: Finished scap: Backport for [sysop_itwiki] Change the logo, the favicon, and add a wordmark (T330279) (duration: 09m 58s)
  • 14:43 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 14:42 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 14:41 volans: installed spicerack 6.2.2 to cumin hosts
  • 14:38 volans: uploaded spicerack_6.2.2 to apt.wikimedia.org bullseye-wikimedia
  • 14:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2020']
  • 14:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2019']
  • 14:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2018']
  • 14:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2017']
  • 14:35 ladsgroup@deploy1002: ladsgroup and superpes: Backport for [sysop_itwiki] Change the logo, the favicon, and add a wordmark (T330279) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:33 ladsgroup@deploy1002: Started scap: Backport for [sysop_itwiki] Change the logo, the favicon, and add a wordmark (T330279)
  • 14:30 ladsgroup@deploy1002: Finished scap: Backport for trwiki: Restrict ContentTranslation to autoreview/patroller/sysop (T330363) (duration: 11m 10s)
  • 14:29 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:29 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 14:29 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2020']
  • 14:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2016']
  • 14:28 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2019']
  • 14:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2015']
  • 14:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2018']
  • 14:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2017']
  • 14:26 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2014']
  • 14:25 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2013']
  • 14:21 ladsgroup@deploy1002: stang and ladsgroup: Backport for trwiki: Restrict ContentTranslation to autoreview/patroller/sysop (T330363) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2016']
  • 14:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2015']
  • 14:19 ladsgroup@deploy1002: Started scap: Backport for trwiki: Restrict ContentTranslation to autoreview/patroller/sysop (T330363)
  • 14:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2014']
  • 14:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2013']
  • 14:13 akosiaris: upgrade istio in wikikube codfw, staging-eqiad, staging-codfw to 1.15.3-2 to re-enable istio metrics
  • 12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P44749 and previous config saved to /var/cache/conftool/dbconfig/20230223-120449-root.json
  • 11:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P44747 and previous config saved to /var/cache/conftool/dbconfig/20230223-114944-root.json
  • 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P44746 and previous config saved to /var/cache/conftool/dbconfig/20230223-113440-root.json
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P44744 and previous config saved to /var/cache/conftool/dbconfig/20230223-111935-root.json
  • 10:59 moritzm: updating mw canaries to PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 T330270
  • 09:51 moritzm: uploaded php7.4 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 to component/php74 T330270
  • 09:51 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1004.eqiad.wmnet
  • 09:51 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:51 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dcaro@cumin1001"
  • 09:50 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dcaro@cumin1001"
  • 09:47 moritzm: uploaded php7.4 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 to component/php74 T323358
  • 09:37 elukey: powercycle thumbor1005 - OEM event for DIMM B1 detected in `getsel`, no tty available via mgmt console
  • 09:32 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 09:23 dcaro@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1004.eqiad.wmnet
  • 09:23 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1003.eqiad.wmnet
  • 09:23 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:23 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dcaro@cumin1001"
  • 09:12 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dcaro@cumin1001"
  • 09:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.24 refs T325587
  • 09:09 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 08:45 apergos: UTC morning backport and config training done
  • 08:42 kartik@deploy1002: Finished scap: Backport for Fix contribution menu entrypoint in vector-2022 skin (T329893) (duration: 10m 27s)
  • 08:36 dcaro@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1003.eqiad.wmnet
  • 08:33 kartik@deploy1002: kartik: Backport for Fix contribution menu entrypoint in vector-2022 skin (T329893) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:31 kartik@deploy1002: Started scap: Backport for Fix contribution menu entrypoint in vector-2022 skin (T329893)
  • 08:30 kartik@deploy1002: Finished scap: Backport for Fix contribution menu entrypoint in vector-2022 skin (T329893) (duration: 13m 08s)
  • 08:19 kartik@deploy1002: kartik: Backport for Fix contribution menu entrypoint in vector-2022 skin (T329893) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:17 kartik@deploy1002: Started scap: Backport for Fix contribution menu entrypoint in vector-2022 skin (T329893)
  • 07:49 hashar: operations/mediawiki-config will now run `tox` to verify logos | T329231 | https://gerrit.wikimedia.org/r/c/integration/config/+/891317
  • 02:57 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:57 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS ms-be2070 - pt1979@cumin2002"
  • 02:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS ms-be2070 - pt1979@cumin2002"
  • 02:49 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 02:48 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2070
  • 02:46 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2070
  • 02:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2022.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2022.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2021.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2020.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2019.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2021.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2017.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2018.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2020.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2019.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2018.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2017.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2016.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2015.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2014.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2013.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:39 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2016.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:37 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2015.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:35 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2014.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:34 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2013.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:32 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:32 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new wdqs nodes - pt1979@cumin2002"
  • 01:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new wdqs nodes - pt1979@cumin2002"
  • 01:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:47 zabe@deploy1002: Finished scap: Backport for Fix interwiki prefix for generic wikimaniawiki (T327575) (duration: 08m 33s)
  • 00:41 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:40 zabe@deploy1002: aklapper and zabe: Backport for Fix interwiki prefix for generic wikimaniawiki (T327575) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 00:39 zabe@deploy1002: Started scap: Backport for Fix interwiki prefix for generic wikimaniawiki (T327575)
  • 00:36 zabe@deploy1002: Finished scap: Backport for throttle: Remove expired rules (duration: 08m 36s)
  • 00:29 zabe@deploy1002: zabe: Backport for throttle: Remove expired rules synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 00:27 zabe@deploy1002: Started scap: Backport for throttle: Remove expired rules

2023-02-22

  • 23:10 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:08 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 23:07 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 23:07 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:56 zabe@deploy1002: Synchronized wmf-config/interwiki.php: T230382 (duration: 07m 06s)
  • 21:40 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:40 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:39 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:36 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:31 urbanecm@deploy1002: Finished scap: Backport for Build backend for PersonalizedPraise (T322444) (duration: 07m 22s)
  • 21:24 urbanecm@deploy1002: Started scap: Backport for Build backend for PersonalizedPraise (T322444)
  • 21:10 urbanecm@deploy1002: Finished scap: Backport for Growth: Set GEPersonalizedPraiseBackendEnabled to true on pilot wikis (T322444) (duration: 07m 33s)
  • 21:02 urbanecm@deploy1002: Started scap: Backport for Growth: Set GEPersonalizedPraiseBackendEnabled to true on pilot wikis (T322444)
  • 20:37 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[26-27].eqiad.wmnet: Replace expiring keys/certs - eevans@cumin1001
  • 20:17 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[26-27].eqiad.wmnet: Replace expiring keys/certs - eevans@cumin1001
  • 20:14 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase10[19-27].eqiad.wmnet: Replace expiring keys/certs - eevans@cumin1001
  • 20:08 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:08 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:07 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:06 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P44743 and previous config saved to /var/cache/conftool/dbconfig/20230222-193422-root.json
  • 19:34 mforns: restarted the following an-launcher1002 timers, which seemed stuck (next run = n/a): gobblin-webrequest.timer, reportupdater-browser.timer, reportupdater-reference-previews.timer, refine_event.timer, refine_eventlogging_legacy.timer
  • 19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P44742 and previous config saved to /var/cache/conftool/dbconfig/20230222-191918-root.json
  • 19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P44741 and previous config saved to /var/cache/conftool/dbconfig/20230222-190413-root.json
  • 18:58 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[19-27].eqiad.wmnet: Replace expiring keys/certs - eevans@cumin1001
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P44740 and previous config saved to /var/cache/conftool/dbconfig/20230222-184908-root.json
  • 18:33 mutante: planet* - stopping and restarting all the timers for the various languages, commands from https://phabricator.wikimedia.org/P44739
  • 18:24 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:24 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 18:23 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'depool db1112', diff saved to https://phabricator.wikimedia.org/P44738 and previous config saved to /var/cache/conftool/dbconfig/20230222-181046-ladsgroup.json
  • 18:08 jbond: stop all failed timer services and restart the corresponding timer unit
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: T329864', diff saved to https://phabricator.wikimedia.org/P44737 and previous config saved to /var/cache/conftool/dbconfig/20230222-175434-root.json
  • 17:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:41 sukhe: force puppet run on stat1007
  • 17:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 75%: T329864', diff saved to https://phabricator.wikimedia.org/P44736 and previous config saved to /var/cache/conftool/dbconfig/20230222-173929-root.json
  • 17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 25%: T329864', diff saved to https://phabricator.wikimedia.org/P44734 and previous config saved to /var/cache/conftool/dbconfig/20230222-172424-root.json
  • 17:24 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns4004.wikimedia.org with OS bullseye
  • 17:17 sukhe: running authdns-update for T152882 / CR 890908
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 10%: T329864', diff saved to https://phabricator.wikimedia.org/P44733 and previous config saved to /var/cache/conftool/dbconfig/20230222-170920-root.json
  • 16:23 hashar@deploy1002: Finished deploy [integration/docroot@b32e023]: doc: Add GrowthExperiments to MediaWiki components - T329034 (duration: 00m 07s)
  • 16:23 hashar@deploy1002: Started deploy [integration/docroot@b32e023]: doc: Add GrowthExperiments to MediaWiki components - T329034
  • 16:21 hashar@deploy1002: Finished deploy [integration/docroot@956dd11]: zuul: Link to report_url if available (duration: 00m 15s)
  • 16:21 hashar@deploy1002: Started deploy [integration/docroot@956dd11]: zuul: Link to report_url if available
  • 16:17 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 16:09 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:09 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 16:08 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:58 urbanecm@deploy1002: Finished scap: Backport for [tox] Make running `tox` work (T329231) (duration: 07m 54s)
  • 15:52 urbanecm@deploy1002: urbanecm: Backport for [tox] Make running `tox` work (T329231) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 15:50 urbanecm@deploy1002: Started scap: Backport for [tox] Make running `tox` work (T329231)
  • 15:47 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:45 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:45 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:30 moritzm: update mwdebug2002 to PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 T323358
  • 15:10 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: sync
  • 15:09 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: sync
  • 15:08 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: sync
  • 15:08 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: sync
  • 15:08 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: sync
  • 15:08 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: sync
  • 15:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 14:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 14:48 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:47 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bullseye
  • 14:30 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move all of userrights config out of IS.php to a dedicated file, part III (T308932) (duration: 06m 16s)
  • 14:27 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@1a041e2] (releasing): (no justification provided) (duration: 00m 49s)
  • 14:26 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@1a041e2] (releasing): (no justification provided)
  • 14:23 ladsgroup@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Move all of userrights config out of IS.php to a dedicated file, part II (T308932) (duration: 07m 01s)
  • 14:15 ladsgroup@deploy1002: Synchronized wmf-config/core-Permissions.php: Move all of userrights config out of IS.php to a dedicated file, part I (T308932) (duration: 68m 38s)
  • 14:11 akosiaris: uncordon kubernetes20{17,18,19,21} T330048
  • 14:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: sync
  • 14:10 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: sync
  • 14:09 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:09 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:08 akosiaris: test network connectivity of kubernetes20{17,18,19,21}
  • 14:06 TheresNoTime: UTC afternoon backport window not done due to in-progress deployment
  • 14:06 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 12:40 moritzm: rolling restart of FPM on mw canaries
  • 12:32 moritzm: installing openssl security updates on buster
  • 12:22 akosiaris: test
  • 12:18 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 12:18 cgoubert@cumin1001: MediaWiki read-only period ends at: 2023-02-22 12:18:45.829060
  • 12:14 cgoubert@cumin1001: conftool action : set/val=false; selector: name=ReadOnly,scope=codfw
  • 12:13 cgoubert@cumin1001: conftool action : set/val=no; selector: name=ReadOnly,scope=codfw
  • 12:13 cgoubert@cumin1001: conftool action : set/val=False; selector: name=ReadOnly,scope=codfw
  • 12:02 moritzm: installing NSS security updates
  • 11:57 cgoubert@cumin1001: conftool action : set/val=false; selector: name=ReadOnly,scope=codfw
  • 11:49 jynus@cumin1001: dbctl commit (dc=all): 'Repool db1166, errors not fixed', diff saved to https://phabricator.wikimedia.org/P44727 and previous config saved to /var/cache/conftool/dbconfig/20230222-114940-jynus.json
  • 11:45 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1166, seen mw errors', diff saved to https://phabricator.wikimedia.org/P44726 and previous config saved to /var/cache/conftool/dbconfig/20230222-114515-jynus.json
  • 11:26 moritzm: installing git security updates
  • 11:24 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
  • 11:18 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
  • 11:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0)
  • 11:16 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl
  • 11:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 11:15 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 11:14 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
  • 11:14 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
  • 11:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 11:13 cgoubert@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2023-02-22 11:13:51.466468
  • 11:13 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 11:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 11:13 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 11:13 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab2002.wikimedia.org with reason: Running failover to gitlab1003 - T329930
  • 11:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 11:13 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 11:13 eoghan@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab2002.wikimedia.org with reason: Running failover to gitlab1003 - T329930
  • 11:04 cgoubert@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=99)
  • 11:03 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 11:03 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 11:03 cgoubert@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2023-02-22 11:03:19.149671
  • 11:03 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 11:02 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 11:02 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 11:01 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0)
  • 11:01 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks
  • 11:01 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 11:01 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 10:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2019.codfw.wmnet with OS bullseye
  • 10:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2021.codfw.wmnet with OS bullseye
  • 10:40 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2020.codfw.wmnet with OS bullseye
  • 10:39 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2018.codfw.wmnet with OS bullseye
  • 10:35 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 10:35 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 10:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2019.codfw.wmnet with reason: host reimage
  • 10:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2021.codfw.wmnet with reason: host reimage
  • 10:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2020.codfw.wmnet with reason: host reimage
  • 10:22 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1005.eqiad.wmnet with OS bullseye
  • 10:21 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2018.codfw.wmnet with reason: host reimage
  • 10:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2017.codfw.wmnet with OS bullseye
  • 10:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2021.codfw.wmnet with reason: host reimage
  • 10:19 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2020.codfw.wmnet with reason: host reimage
  • 10:18 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2019.codfw.wmnet with reason: host reimage
  • 10:18 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2018.codfw.wmnet with reason: host reimage
  • 10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 10:08 claime: Starting sre.switchdc.mediawiki live test preparation steps
  • 10:07 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 10:05 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2021.codfw.wmnet with OS bullseye
  • 10:04 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2020.codfw.wmnet with OS bullseye
  • 10:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2017.codfw.wmnet with reason: host reimage
  • 10:04 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2019.codfw.wmnet with OS bullseye
  • 10:04 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2018.codfw.wmnet with OS bullseye
  • 10:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2017.codfw.wmnet with reason: host reimage
  • 09:59 hashar@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.24 refs T325587 (duration: 06m 33s)
  • 09:52 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.24 refs T325587
  • 09:51 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1005.eqiad.wmnet with reason: host reimage
  • 09:48 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1005.eqiad.wmnet with reason: host reimage
  • 09:46 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2017.codfw.wmnet with OS bullseye
  • 09:30 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.40.0-wmf.23" - T325587
  • 09:14 hashar@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.24 refs T325587 (duration: 06m 38s)
  • 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.24 refs T325587
  • 09:03 nfraison@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1004.eqiad.wmnet with OS bullseye
  • 08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:50 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 08:49 vgutierrez: rolling upgrade to HAProxy 2.6.9 in codfw, eqsin, drmrs, esams and eqiad
  • 08:47 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1004.eqiad.wmnet with reason: host reimage
  • 08:43 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1004.eqiad.wmnet with reason: host reimage
  • 08:36 ryankemper: [WDQS] Repooled `wdqs20[05,07,10]`
  • 08:22 kartik@deploy1002: Finished scap: Backport for Content Translation: Set MT threshold to 45% for Kurdish WP (T324941) (duration: 10m 41s)
  • 08:17 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1004.eqiad.wmnet with OS bullseye
  • 08:14 kartik@deploy1002: kartik: Backport for Content Translation: Set MT threshold to 45% for Kurdish WP (T324941) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 08:12 kartik@deploy1002: Started scap: Backport for Content Translation: Set MT threshold to 45% for Kurdish WP (T324941)
  • 08:00 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1128', diff saved to https://phabricator.wikimedia.org/P44724 and previous config saved to /var/cache/conftool/dbconfig/20230222-080050-jynus.json
  • 01:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2186.codfw.wmnet with OS bullseye
  • 01:52 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2186.codfw.wmnet with reason: host reimage
  • 01:33 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2186.codfw.wmnet with reason: host reimage
  • 01:31 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2186.codfw.wmnet with OS bullseye
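
The cookbook entries above share a regular shape: a time, `user@host`, `END` with a `PASS`, `FAIL` or `ERROR` status, the cookbook name, and an exit code. A minimal sketch, assuming that shape holds, of how the outcomes could be tallied per cookbook. The regex and the function name `tally_cookbook_results` are illustrative assumptions inferred from the entries, not part of any Wikimedia tooling or an official log format.

```python
import re
from collections import Counter

# Hypothetical pattern inferred from the log entries; not an official format.
# It matches only END lines, since START lines carry no status or exit code.
ENTRY = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<user>[\w-]+)@(?P<host>[\w-]+): "
    r"END \((?P<status>PASS|FAIL|ERROR)\) - Cookbook (?P<cookbook>[\w.-]+) "
    r"\(exit_code=(?P<code>\d+)\)"
)


def tally_cookbook_results(lines):
    """Count END outcomes per (cookbook, status) pair."""
    counts = Counter()
    for line in lines:
        match = ENTRY.search(line)
        if match:
            counts[(match["cookbook"], match["status"])] += 1
    return counts


# Sample lines copied from the 2023-02-22 entries.
sample = [
    "• 20:08 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer",
    "• 20:08 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)",
    "• 20:06 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)",
]

for (cookbook, status), count in sorted(tally_cookbook_results(sample).items()):
    print(f"{cookbook} {status}: {count}")
```

Lines that do not match the pattern (manual entries, `START` lines, scap or helmfile messages) are skipped rather than treated as errors, so the tally covers cookbook completions only.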

2023-02-21

  • 23:55 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T328255)', diff saved to https://phabricator.wikimedia.org/P44723 and previous config saved to /var/cache/conftool/dbconfig/20230221-235012-ladsgroup.json
  • 23:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P44722 and previous config saved to /var/cache/conftool/dbconfig/20230221-233506-ladsgroup.json
  • 23:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P44721 and previous config saved to /var/cache/conftool/dbconfig/20230221-232000-ladsgroup.json
  • 23:09 tzatziki: removing 5 files for legal compliance
  • 23:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T328255)', diff saved to https://phabricator.wikimedia.org/P44720 and previous config saved to /var/cache/conftool/dbconfig/20230221-230454-ladsgroup.json
  • 23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T328255)', diff saved to https://phabricator.wikimedia.org/P44719 and previous config saved to /var/cache/conftool/dbconfig/20230221-230109-ladsgroup.json
  • 23:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 23:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T328255)', diff saved to https://phabricator.wikimedia.org/P44718 and previous config saved to /var/cache/conftool/dbconfig/20230221-230048-ladsgroup.json
  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P44717 and previous config saved to /var/cache/conftool/dbconfig/20230221-224542-ladsgroup.json
  • 22:44 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1003.eqiad.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:43 tzatziki: removing 15 files for legal compliance
  • 22:37 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1003.eqiad.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:37 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1002.eqiad.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:31 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1002.eqiad.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P44716 and previous config saved to /var/cache/conftool/dbconfig/20230221-223036-ladsgroup.json
  • 22:30 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1001.eqiad.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:24 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1001.eqiad.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:22 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2003.codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:16 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2003.codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:16 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:16 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2002.codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T328255)', diff saved to https://phabricator.wikimedia.org/P44715 and previous config saved to /var/cache/conftool/dbconfig/20230221-221529-ladsgroup.json
  • 22:14 catrope@deploy1002: Finished scap: Backport for Add VueTest to extension-list, add config var (T315621) (duration: 37m 07s)
  • 22:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T328255)', diff saved to https://phabricator.wikimedia.org/P44714 and previous config saved to /var/cache/conftool/dbconfig/20230221-221042-ladsgroup.json
  • 22:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 22:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 22:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T328255)', diff saved to https://phabricator.wikimedia.org/P44713 and previous config saved to /var/cache/conftool/dbconfig/20230221-221021-ladsgroup.json
  • 22:10 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2002.codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:09 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2001.codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:02 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2001.codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:01 catrope@deploy1002: catrope: Backport for Add VueTest to extension-list, add config var (T315621) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P44712 and previous config saved to /var/cache/conftool/dbconfig/20230221-215515-ladsgroup.json
  • 21:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P44711 and previous config saved to /var/cache/conftool/dbconfig/20230221-214009-ladsgroup.json
  • 21:37 catrope@deploy1002: Started scap: Backport for Add VueTest to extension-list, add config var (T315621)
  • 21:35 catrope@deploy1002: Finished scap: Backport for Remove wgLinterSubmitterWhitelist (T329992) (duration: 10m 36s)
  • 21:26 catrope@deploy1002: arlolra and catrope: Backport for Remove wgLinterSubmitterWhitelist (T329992) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T328255)', diff saved to https://phabricator.wikimedia.org/P44710 and previous config saved to /var/cache/conftool/dbconfig/20230221-212503-ladsgroup.json
  • 21:24 catrope@deploy1002: Started scap: Backport for Remove wgLinterSubmitterWhitelist (T329992)
  • 21:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T328255)', diff saved to https://phabricator.wikimedia.org/P44709 and previous config saved to /var/cache/conftool/dbconfig/20230221-212123-ladsgroup.json
  • 21:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 21:19 catrope@deploy1002: Finished scap: Backport for Add static "Cleopatra" page to facilitate synthetic testing of 885362 (T326147 T293303) (duration: 10m 28s)
  • 21:11 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2186.codfw.wmnet with OS bullseye
  • 21:10 catrope@deploy1002: catrope and nray: Backport for Add static "Cleopatra" page to facilitate synthetic testing of 885362 (T326147 T293303) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:08 catrope@deploy1002: Started scap: Backport for Add static "Cleopatra" page to facilitate synthetic testing of 885362 (T326147 T293303)
  • 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P44708 and previous config saved to /var/cache/conftool/dbconfig/20230221-205822-root.json
  • 20:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2186.codfw.wmnet with reason: host reimage
  • 20:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2186.codfw.wmnet with reason: host reimage
  • 20:45 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@5edcd7b]: Test deployment of search airflow dags (duration: 01m 08s)
  • 20:44 ebernhardson@deploy1002: Started deploy [airflow-dags/search@5edcd7b]: Test deployment of search airflow dags
  • 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P44707 and previous config saved to /var/cache/conftool/dbconfig/20230221-204317-root.json
  • 20:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2186.codfw.wmnet with OS bullseye
  • 20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P44706 and previous config saved to /var/cache/conftool/dbconfig/20230221-202813-root.json
  • 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P44705 and previous config saved to /var/cache/conftool/dbconfig/20230221-201308-root.json
  • 20:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2186']
  • 20:10 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2186']
  • 18:38 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2186.codfw.wmnet with OS bullseye
  • 18:37 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:21 dancy@deploy1002: Installing scap version "4.38.0" for 564 hosts
  • 18:02 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in eqiad: T327991
  • 17:57 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) restbase-async.discovery.wmnet on all recursors
  • 17:57 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache restbase-async.discovery.wmnet on all recursors
  • 17:57 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool restbase-async in eqiad: T327991
  • 17:55 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:54 sukhe: run authdns-update for Gerrit: 890847. repooling codfw
  • 17:53 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:53 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:52 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:51 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 17:50 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2186.codfw.wmnet with OS bullseye
  • 17:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 17:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 17:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2186']
  • 17:42 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 17:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 17:37 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: sync
  • 17:37 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: sync
  • 17:36 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:36 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:33 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2186']
  • 17:33 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['db2186']
  • 17:33 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2186']
  • 17:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2186.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:31 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 17:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 17:31 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 17:31 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 17:30 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 17:30 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 17:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: sync
  • 17:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: sync
  • 17:26 cwhite: Grafana 9x upgrade in production complete T317887
  • 17:25 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 17:25 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 17:17 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.24 refs T325587
  • 17:09 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1003.eqiad.wmnet with OS bullseye
  • 17:08 elukey@puppetmaster1001: conftool action : set/weight=10; selector: name=kubernetes2023.codfw.wmnet
  • 17:07 elukey@puppetmaster1001: conftool action : set/weight=10; selector: name=kubernetes2024.codfw.wmnet
  • 17:07 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes2024.codfw.wmnet
  • 17:07 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes2023.codfw.wmnet
  • 17:04 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: T327991 - None
  • 17:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2186.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:00 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2186.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:59 bking@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wcqs,name=codfw
  • 16:59 bking@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal,name=codfw
  • 16:59 bking@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=codfw
  • 16:57 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in codfw: None - None
  • 16:57 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter status all services in codfw: None - None
  • 16:51 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1003.eqiad.wmnet with reason: host reimage
  • 16:50 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: T327991 - None
  • 16:48 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1003.eqiad.wmnet with reason: host reimage
  • 16:24 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 16:22 papaul: rebooting mgmt switch in rack c3
  • 16:20 papaul: rebooting mgmt switch in rack b3
  • 16:17 papaul: rebooting mgmt switch in rack b1
  • 16:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T328255)', diff saved to https://phabricator.wikimedia.org/P44704 and previous config saved to /var/cache/conftool/dbconfig/20230221-161552-ladsgroup.json
  • 16:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 16:15 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1003.eqiad.wmnet with OS bullseye
  • 16:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 16:13 papaul: rebooting mgmt switch in rack a7
  • 16:10 papaul: rebooting mgmt switch in rack a5
  • 16:06 moritzm: imported libxml2 2.9.4+dfsg1-7+deb10u5+icu67+wmf1 to component/icu67 for buster-wikimedia T329491
  • 16:05 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:05 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:02 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:02 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:29 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.upgrade-cluster (exit_code=0) Upgrade K8s version: T329664
  • 15:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 15:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 15:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 15:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 15:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 15:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 15:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 15:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 15:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 15:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 15:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 15:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 15:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 15:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 15:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 15:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 15:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 14:36 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2002.codfw.wmnet
  • 14:36 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=prometheus2005.codfw.wmnet
  • 14:29 moritzm: installing NSS security updates
  • 14:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: codfw maint (T327991)
  • 14:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: codfw maint (T327991)
  • 14:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2186.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 27 hosts with reason: codfw maint (T327991)
  • 14:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 27 hosts with reason: codfw maint (T327991)
  • 14:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: codfw maint (T327991)
  • 14:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2134,2160].codfw.wmnet,db[1117,1159].eqiad.wmnet with reason: codfw maint (T327991)
  • 14:03 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: codfw maint (T327991)
  • 14:02 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: codfw maint (T327991)
  • 14:00 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:00 vgutierrez: depooling codfw - T327991
  • 13:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:59 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:59 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:59 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:58 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:58 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:58 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:56 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:55 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:55 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:55 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:55 gehel: depooling wdqs[2005,2007,2010].codfw.wmnet for switch maintenance - T327991
  • 13:54 gehel: depooling wcqs2001.codfw.wmnet for switch maintenance - T327991
  • 13:54 vgutierrez: depool doh2002 - T327991
  • 13:54 gehel: depooling elastic[2041-2044,2057-2058,2063-2064,2070,2077-2080].codfw.wmnet for switch maintenance - T327991
  • 13:51 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2009.codfw.wmnet with OS bullseye
  • 13:51 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:50 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:50 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:50 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:50 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:50 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:49 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:49 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:49 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 13:49 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 13:48 godog: stop kafka on kafka-logging[2002,2004].codfw.wmnet - T327991
  • 13:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 215 hosts with reason: codfw row B upgrade
  • 13:38 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 215 hosts with reason: codfw row B upgrade
  • 13:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2014.codfw.wmnet with OS bullseye
  • 13:36 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2011.codfw.wmnet with OS bullseye
  • 13:34 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2022.codfw.wmnet with OS bullseye
  • 13:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage
  • 13:33 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2008.codfw.wmnet with OS bullseye
  • 13:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2012.codfw.wmnet with OS bullseye
  • 13:30 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage
  • 13:28 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2013.codfw.wmnet with OS bullseye
  • 13:26 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2024.codfw.wmnet with OS bullseye
  • 13:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2007.codfw.wmnet with OS bullseye
  • 13:25 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2010.codfw.wmnet with OS bullseye
  • 13:24 nfraison@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1002.eqiad.wmnet with OS bullseye
  • 13:21 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2020.codfw.wmnet with OS bullseye
  • 13:21 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2014.codfw.wmnet with reason: host reimage
  • 13:19 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2011.codfw.wmnet with reason: host reimage
  • 13:18 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2024.codfw.wmnet with reason: host reimage
  • 13:17 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2023.codfw.wmnet with OS bullseye
  • 13:16 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubernetes2015.codfw.wmnet with OS bullseye
  • 13:16 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2022.codfw.wmnet with reason: host reimage
  • 13:16 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2008.codfw.wmnet with reason: host reimage
  • 13:15 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubernetes2006.codfw.wmnet with OS bullseye
  • 13:14 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2009.codfw.wmnet with OS bullseye
  • 13:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2012.codfw.wmnet with reason: host reimage
  • 13:13 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2024.codfw.wmnet with reason: host reimage
  • 13:13 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2009.codfw.wmnet with OS bullseye
  • 13:12 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2022.codfw.wmnet with reason: host reimage
  • 13:12 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubernetes2016.codfw.wmnet with OS bullseye
  • 13:11 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2013.codfw.wmnet with reason: host reimage
  • 13:11 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2014.codfw.wmnet with reason: host reimage
  • 13:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2012.codfw.wmnet with reason: host reimage
  • 13:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2007.codfw.wmnet with reason: host reimage
  • 13:09 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2011.codfw.wmnet with reason: host reimage
  • 13:08 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubernetes2005.codfw.wmnet with OS bullseye
  • 13:06 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2013.codfw.wmnet with reason: host reimage
  • 13:06 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2010.codfw.wmnet with reason: host reimage
  • 13:06 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2020.codfw.wmnet with reason: host reimage
  • 13:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2008.codfw.wmnet with reason: host reimage
  • 13:04 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2007.codfw.wmnet with reason: host reimage
  • 13:04 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1002.eqiad.wmnet with reason: host reimage
  • 13:02 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2023.codfw.wmnet with reason: host reimage
  • 13:01 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1002.eqiad.wmnet with reason: host reimage
  • 12:59 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2010.codfw.wmnet with reason: host reimage
  • 12:59 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2015.codfw.wmnet with reason: host reimage
  • 12:59 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2023.codfw.wmnet with reason: host reimage
  • 12:58 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2020.codfw.wmnet with reason: host reimage
  • 12:58 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2024.codfw.wmnet with OS bullseye
  • 12:57 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2022.codfw.wmnet with OS bullseye
  • 12:56 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2016.codfw.wmnet with reason: host reimage
  • 12:56 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2006.codfw.wmnet with reason: host reimage
  • 12:55 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2014.codfw.wmnet with OS bullseye
  • 12:54 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2012.codfw.wmnet with OS bullseye
  • 12:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2005.codfw.wmnet with reason: host reimage
  • 12:53 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2011.codfw.wmnet with OS bullseye
  • 12:51 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2006.codfw.wmnet with reason: host reimage
  • 12:51 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2015.codfw.wmnet with reason: host reimage
  • 12:51 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2016.codfw.wmnet with reason: host reimage
  • 12:51 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2005.codfw.wmnet with reason: host reimage
  • 12:51 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2013.codfw.wmnet with OS bullseye
  • 12:50 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1002.eqiad.wmnet with OS bullseye
  • 12:50 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2008.codfw.wmnet with OS bullseye
  • 12:49 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2007.codfw.wmnet with OS bullseye
  • 12:43 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2023.codfw.wmnet with OS bullseye
  • 12:43 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2020.codfw.wmnet with OS bullseye
  • 12:43 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2010.codfw.wmnet with OS bullseye
  • 12:41 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1001.eqiad.wmnet with OS bullseye
  • 12:40 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes2015.codfw.wmnet with OS bullseye
  • 12:40 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes2016.codfw.wmnet with OS bullseye
  • 12:40 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes2006.codfw.wmnet with OS bullseye
  • 12:39 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes2005.codfw.wmnet with OS bullseye
  • 12:36 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2009.codfw.wmnet with OS bullseye
  • 12:35 jayme@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: T329664
  • 12:34 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: T329664
  • 12:34 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubemaster2002.codfw.wmnet with OS bullseye
  • 12:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubemaster2002.codfw.wmnet with reason: host reimage
  • 12:16 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubemaster2002.codfw.wmnet with reason: host reimage
  • 12:12 akosiaris: add 10.194.128.0/18 to kubernetes-ipv4 prefix-list for codfw. T326617
  • 12:06 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubemaster2002.codfw.wmnet with OS bullseye
  • 12:05 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubemaster2001.codfw.wmnet with OS bullseye
  • 11:55 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1001.eqiad.wmnet with reason: host reimage
  • 11:52 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1001.eqiad.wmnet with reason: host reimage
  • 11:49 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubemaster2001.codfw.wmnet with reason: host reimage
  • 11:46 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubemaster2001.codfw.wmnet with reason: host reimage
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P44702 and previous config saved to /var/cache/conftool/dbconfig/20230221-114338-root.json
  • 11:35 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 11:34 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubemaster2001.codfw.wmnet with OS bullseye
  • 11:32 root@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubetcd2004.codfw.wmnet with OS bullseye
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P44701 and previous config saved to /var/cache/conftool/dbconfig/20230221-112833-root.json
  • 11:27 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubetcd2005.codfw.wmnet with OS bullseye
  • 11:26 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubetcd2006.codfw.wmnet with OS bullseye
  • 11:26 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 11:25 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 11:25 jayme@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: T329664
  • 11:25 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 11:25 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 11:24 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: T329664
  • 11:24 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:23 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 11:23 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 11:19 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd2004.codfw.wmnet with reason: host reimage
  • 11:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1003.eqiad.wmnet with OS bullseye
  • 11:16 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd2004.codfw.wmnet with reason: host reimage
  • 11:16 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd2005.codfw.wmnet with reason: host reimage
  • 11:13 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd2006.codfw.wmnet with reason: host reimage
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P44700 and previous config saved to /var/cache/conftool/dbconfig/20230221-111328-root.json
  • 11:12 vgutierrez: rolling upgrade to HAProxy 2.6.9 on ulsfo
  • 11:11 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd2005.codfw.wmnet with reason: host reimage
  • 11:11 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd2006.codfw.wmnet with reason: host reimage
  • 11:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1003.eqiad.wmnet with reason: host reimage
  • 11:00 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubetcd2006.codfw.wmnet with OS bullseye
  • 10:59 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubetcd2005.codfw.wmnet with OS bullseye
  • 10:59 root@cumin1001: START - Cookbook sre.ganeti.reimage for host kubetcd2004.codfw.wmnet with OS bullseye
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P44699 and previous config saved to /var/cache/conftool/dbconfig/20230221-105823-root.json
  • 10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2161 T330134', diff saved to https://phabricator.wikimedia.org/P44698 and previous config saved to /var/cache/conftool/dbconfig/20230221-105714-ladsgroup.json
  • 10:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1003.eqiad.wmnet with reason: host reimage
  • 10:55 jayme@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: T329664
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2165 to s8 primary T330134', diff saved to https://phabricator.wikimedia.org/P44697 and previous config saved to /var/cache/conftool/dbconfig/20230221-105503-ladsgroup.json
  • 10:54 Amir1: Starting s8 codfw failover from db2161 to db2165 - T330134
  • 10:50 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1001.eqiad.wmnet with OS bullseye
  • 10:49 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:46 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 23 hosts with reason: Reinitialize wikikube codfw with k8s 1.23
  • 10:46 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 23 hosts with reason: Reinitialize wikikube codfw with k8s 1.23
  • 10:43 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp1003.eqiad.wmnet with OS bullseye
  • 10:39 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:37 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:37 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:33 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2165 with weight 0 T330134', diff saved to https://phabricator.wikimedia.org/P44696 and previous config saved to /var/cache/conftool/dbconfig/20230221-103053-ladsgroup.json
  • 10:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s8 T330134
  • 10:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s8 T330134
  • 10:29 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:59 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:53 jayme@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99) depool 2 services in codfw: T329664
  • 09:48 jayme@cumin1001: START - Cookbook sre.discovery.service-route depool 2 services in codfw: T329664
  • 09:48 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:46 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:36 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:31 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in codfw: maintenance
  • 09:24 filippo@cumin1001: conftool action : set/pooled=no; selector: name=prometheus2005.codfw.wmnet
  • 09:24 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet
  • 09:14 vgutierrez: testing HAProxy 2.6.9 in cp4052 and cp4044
  • 09:13 jayme@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in codfw: maintenance
  • 09:12 hashar@deploy1002: Pruned MediaWiki: 1.40.0-wmf.22 (duration: 02m 16s)
  • 09:12 vgutierrez: update thirdparty/haproxy26 to version 2.6.9 for bullseye and buster (apt.wm.o)
  • 09:10 hashar@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.24 refs T325587 (duration: 45m 58s)
  • 08:49 moritzm: installing clamav security updates
  • 08:24 hashar@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.24 refs T325587
  • 08:21 kartik@deploy1002: Finished scap: Backport for Section Translation: Fix language code for Cantonese Wikipedia (T304865) (duration: 16m 36s)
  • 08:09 kartik@deploy1002: kartik: Backport for Section Translation: Fix language code for Cantonese Wikipedia (T304865) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:04 kartik@deploy1002: Started scap: Backport for Section Translation: Fix language code for Cantonese Wikipedia (T304865)
  • 07:49 XioNoX: Staging the new Junos version on the codfw row B switches - T327991
  • 01:06 urbanecm@deploy1002: Finished scap: Backport for cswikibooks: Enable visualeditor for all users (T330015) (duration: 08m 47s)
  • 00:59 urbanecm@deploy1002: urbanecm: Backport for cswikibooks: Enable visualeditor for all users (T330015) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 00:57 urbanecm@deploy1002: Started scap: Backport for cswikibooks: Enable visualeditor for all users (T330015)

2023-02-20

  • 21:35 zabe: close UTC late backport window
  • 21:33 zabe@deploy1002: Finished scap: T329983 T330104 (duration: 11m 51s)
  • 21:24 zabe@deploy1002: zabe: T329983 T330104 synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 21:21 zabe@deploy1002: Started scap: T329983 T330104
  • 21:16 zabe: zabe@mwmaint1002:~$ mwscript namespaceDupes.php tawiki --fix # T329248
  • 21:14 zabe@deploy1002: Finished scap: Backport for [tawiki] Add Draft and Draft_talk namespaces (T329248) (duration: 08m 52s)
  • 21:07 zabe@deploy1002: superpes and zabe: Backport for [tawiki] Add Draft and Draft_talk namespaces (T329248) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:06 zabe@deploy1002: Started scap: Backport for [tawiki] Add Draft and Draft_talk namespaces (T329248)
  • 20:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1002.eqiad.wmnet with OS bullseye
  • 20:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1002.eqiad.wmnet with reason: host reimage
  • 20:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1002.eqiad.wmnet with reason: host reimage
  • 20:26 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp1002.eqiad.wmnet with OS bullseye
  • 18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P44693 and previous config saved to /var/cache/conftool/dbconfig/20230220-185659-root.json
  • 18:50 taavi: taavi@mwmaint1002:~$ mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki itwiki --current --all | tee T315510-itwiki.log # T315510
  • 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P44692 and previous config saved to /var/cache/conftool/dbconfig/20230220-184154-root.json
  • 18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P44691 and previous config saved to /var/cache/conftool/dbconfig/20230220-182649-root.json
  • 18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P44690 and previous config saved to /var/cache/conftool/dbconfig/20230220-181144-root.json
  • 17:26 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 17:16 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:50 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: sync
  • 16:49 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: sync
  • 16:49 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: sync
  • 16:48 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: sync
  • 16:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:40 volans: upgraded spicerack to v6.2.1 on the cumin hosts
  • 16:38 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Christina Macholan out of all services on: 943 hosts
  • 16:38 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Christina Macholan out of all services on: 943 hosts
  • 16:36 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Christina Macholan out of all services on: 1069 hosts
  • 16:36 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Christina Macholan out of all services on: 1069 hosts
  • 16:31 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:28 nfraison@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1001.eqiad.wmnet with OS bullseye
  • 16:25 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2002.codfw.wmnet with reason: test spicerack v6.2.1
  • 16:25 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin2002.codfw.wmnet with reason: test spicerack v6.2.1
  • 16:20 volans: uploaded spicerack_6.2.1 to apt.wikimedia.org bullseye-wikimedia
  • 16:18 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:09 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1001.eqiad.wmnet with OS bullseye
  • 16:08 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:57 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: sync
  • 15:57 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: sync
  • 15:54 nfraison@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1001.eqiad.wmnet with OS bullseye
  • 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:43 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:34 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
  • 15:24 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:18 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
  • 15:13 TheresNoTime: closing UTC afternoon backport window
  • 15:13 samtar@deploy1002: Finished scap: Backport for PageAssessments.i18n.alias.php: Fix spelling mistake (T328224) (duration: 22m 03s)
  • 15:08 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:04 samtar@deploy1002: samtar: Backport for PageAssessments.i18n.alias.php: Fix spelling mistake (T328224) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 15:03 TheresNoTime: UTC afternoon backport window overrunning
  • 14:58 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1001.eqiad.wmnet with OS bullseye
  • 14:51 nfraison@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1001.eqiad.wmnet with OS bullseye
  • 14:51 samtar@deploy1002: Started scap: Backport for PageAssessments.i18n.alias.php: Fix spelling mistake (T328224)
  • 14:38 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable WIP Wikibase REST API routes on beta wikidata (T326313) (duration: 08m 12s)
  • 14:31 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and ollieshotton: Backport for Enable WIP Wikibase REST API routes on beta wikidata (T326313) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 14:30 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable WIP Wikibase REST API routes on beta wikidata (T326313)
  • 14:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2003.codfw.wmnet with OS bullseye
  • 14:20 samtar@deploy1002: Finished scap: Backport for Remove unused $wgLexemeEnableNewAlpha (T307866) (duration: 07m 44s)
  • 14:14 samtar@deploy1002: lucaswerkmeister-wmde and samtar: Backport for Remove unused $wgLexemeEnableNewAlpha (T307866) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:12 samtar@deploy1002: Started scap: Backport for Remove unused $wgLexemeEnableNewAlpha (T307866)
  • 14:10 samtar@deploy1002: Finished scap: Backport for zhwiki(books|quote): Enable block feature for AbuseFilter (T330026) (duration: 09m 00s)
  • 14:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2003.codfw.wmnet with reason: host reimage
  • 14:05 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2003.codfw.wmnet with reason: host reimage
  • 14:03 samtar@deploy1002: samtar and stang: Backport for zhwiki(books|quote): Enable block feature for AbuseFilter (T330026) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:01 samtar@deploy1002: Started scap: Backport for zhwiki(books|quote): Enable block feature for AbuseFilter (T330026)
  • 13:55 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1001.eqiad.wmnet with OS bullseye
  • 13:51 nfraison@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1001.eqiad.wmnet with OS bullseye
  • 13:50 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp2003.codfw.wmnet with OS bullseye
  • 13:12 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1001.eqiad.wmnet with OS bullseye
  • 13:06 jbond: switch netbox to active/passive (had issues with active/active config)
  • 12:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2002.codfw.wmnet with OS bullseye
  • 12:49 jbond: switch netbox to active/active
  • 12:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2002.codfw.wmnet with reason: host reimage
  • 12:30 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2002.codfw.wmnet with reason: host reimage
  • 12:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on 32 hosts with reason: In setup
  • 12:19 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on 32 hosts with reason: In setup
  • 12:15 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp2002.codfw.wmnet with OS bullseye
  • 12:12 moritzm: installing Java 8 security updates on Bullseye
  • 12:08 moritzm: upload openjdk-8 8u362-ga-4~deb11u1 to component/jdk8 for wikimedia-bullseye (forward port of latest Java 8 security fixes)
  • 11:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 30781
  • 11:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 30781
  • 11:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38565
  • 11:03 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:52 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:21 zabe: deployed updated mitigations for T326691
  • 10:21 zabe@deploy1002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 06m 59s)
  • 10:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38565
  • 10:19 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:16 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 2711
  • 10:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2711
  • 10:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Update wmf-plugin - ayounsi@cumin1001
  • 10:11 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Update wmf-plugin - ayounsi@cumin1001
  • 10:09 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 09:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2165 T330056', diff saved to https://phabricator.wikimedia.org/P44688 and previous config saved to /var/cache/conftool/dbconfig/20230220-095526-ladsgroup.json
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2161 to s8 primary T330056', diff saved to https://phabricator.wikimedia.org/P44687 and previous config saved to /var/cache/conftool/dbconfig/20230220-095308-ladsgroup.json
  • 09:52 Amir1: Starting s8 codfw failover from db2165 to db2161 - T330056
  • 09:48 akosiaris: Point out risk of MW train failing on Feb 21st in https://wikitech.wikimedia.org/wiki/Deployments#Tuesday,_February_21 due to WikiKube codfw upgrade
  • 09:44 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:33 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2161 with weight 0 T330056', diff saved to https://phabricator.wikimedia.org/P44686 and previous config saved to /var/cache/conftool/dbconfig/20230220-092727-ladsgroup.json
  • 09:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s8 T330056
  • 09:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s8 T330056
  • 09:09 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move all of NS-related config out of IS.php to a dedicated file, part III (T308932) (duration: 06m 24s)
  • 09:09 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:06 XioNoX: delete old group ID custom field from Netbox - https://netbox.wikimedia.org/extras/custom-fields/6/ - T260363
  • 09:01 ladsgroup@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Move all of NS-related config out of IS.php to a dedicated file, part II (T308932) (duration: 06m 47s)
  • 08:58 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:54 ladsgroup@deploy1002: Synchronized wmf-config/core-Namespaces.php: Move all of NS-related config out of IS.php to a dedicated file, part I (T308932) (duration: 16m 10s)
  • 08:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Update wmf-plugin - ayounsi@cumin1001
  • 08:52 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Update wmf-plugin - ayounsi@cumin1001
  • 08:52 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:44 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Mepps out of all services on: 946 hosts
  • 08:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Mepps out of all services on: 946 hosts
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Mepps out of all services on: 1067 hosts
  • 08:42 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:41 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Mepps out of all services on: 1067 hosts
  • 08:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:08 moritzm: updating openjdk-11 on elastic* servers T329957
  • 07:44 moritzm: imported jenkins 2.375.3 to thirdparty/ci T330045
  • 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:40 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:39 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:39 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:39 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:27 Amir1: running migrateTagTemplate.php on all wikis (T329766)
  • 06:40 hashar: Restarting Gerrit

2023-02-18

  • 08:29 elukey: kill leftover processes of user `mepps` (offboarded) from stat100[4,5] to unblock puppet
  • 08:24 elukey: delete /var/log/{syslog,messages,user.log}.1 on kubestagetcd1005 to free space
  • 08:22 elukey: delete /var/log/{messages,user.log}.1 on kubestageetcd1006 to free space
  • 08:21 elukey: delete /var/log/syslog.1 on kubestageetcd1006 to free space

2023-02-17

  • 22:45 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin1001 - T329957
  • 22:09 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin1001 - T329957
  • 22:06 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin1001 - T329957
  • 22:05 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin1001 - T329957
  • 19:30 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1016.eqiad.wmnet
  • 19:02 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
  • 18:46 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10*.eqiad.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 17:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2187.codfw.wmnet with OS bullseye
  • 17:49 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2187.codfw.wmnet with reason: host reimage
  • 17:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2187.codfw.wmnet with reason: host reimage
  • 17:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2185.codfw.wmnet with OS bullseye
  • 17:27 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:10 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T329957
  • 17:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2187.codfw.wmnet with OS bullseye
  • 17:06 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T329957
  • 17:06 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T329957
  • 17:06 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T329957
  • 17:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2185.codfw.wmnet with reason: host reimage
  • 17:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2185.codfw.wmnet with reason: host reimage
  • 16:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2185.codfw.wmnet with OS bullseye
  • 16:40 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2185.codfw.wmnet with OS bullseye
  • 16:31 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2185.codfw.wmnet with OS bullseye
  • 16:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2187']
  • 16:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2187']
  • 16:02 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2187']
  • 16:01 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2187']
  • 16:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2187.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2187.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:50 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.wipe-cluster (exit_code=0) Wipe the K8s cluster ml-staging-codfw: T327767
  • 15:45 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2187']
  • 15:42 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10*.eqiad.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 15:41 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2187']
  • 15:40 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2187']
  • 15:40 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2187']
  • 15:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2185']
  • 15:35 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2187']
  • 15:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2187']
  • 15:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2187.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host urldownloader2004.wikimedia.org
  • 15:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2185']
  • 15:07 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2185.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) urldownloader2004.wikimedia.org on all recursors
  • 15:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache urldownloader2004.wikimedia.org on all recursors
  • 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader2004.wikimedia.org - jmm@cumin2002"
  • 15:02 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader2004.wikimedia.org - jmm@cumin2002"
  • 14:58 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2187.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2186.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:56 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:56 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2186.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2186.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:55 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:54 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:54 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:54 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:52 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:52 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:52 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:52 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:51 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:51 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:46 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:46 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host urldownloader2004.wikimedia.org
  • 14:46 elukey@cumin1001: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster ml-staging-codfw: T327767
  • 14:44 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2186.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:42 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2185.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host urldownloader2003.wikimedia.org
  • 14:38 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for db218[567] - pt1979@cumin2002"
  • 14:37 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for db218[567] - pt1979@cumin2002"
  • 14:35 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) urldownloader2003.wikimedia.org on all recursors
  • 14:31 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache urldownloader2003.wikimedia.org on all recursors
  • 14:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader2003.wikimedia.org - jmm@cumin2002"
  • 14:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader2003.wikimedia.org - jmm@cumin2002"
  • 14:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host urldownloader2003.wikimedia.org
  • 13:40 godog: docker system prune on alert2001 - root fs almost full
  • 13:35 pfischer@deploy1002: Finished deploy [wikimedia/discovery/analytics@3a94765]: T327381: rdf-spark-tools update (duration: 02m 39s)
  • 13:33 pfischer@deploy1002: Started deploy [wikimedia/discovery/analytics@3a94765]: T327381: rdf-spark-tools update
  • 13:31 godog: docker system prune on alert1001 - root fs almost full
  • 12:55 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:55 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:55 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:55 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:55 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:55 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:54 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:54 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:54 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:54 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:54 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:54 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:52 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:52 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:51 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:51 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:51 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:50 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:50 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:50 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:50 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:50 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:46 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:46 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:46 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:46 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:42 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.wipe-cluster (exit_code=0) Wipe the K8s cluster aux-eqiad: T329826
  • 12:38 jayme@cumin1001: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster aux-eqiad: T329826
  • 11:15 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: JVM upgrades - elukey@cumin1001
  • 10:58 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: JVM upgrades - elukey@cumin1001
  • 10:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host idm2001.wikimedia.org with OS bullseye
  • 10:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm2001.wikimedia.org with reason: host reimage
  • 10:11 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm2001.wikimedia.org with reason: host reimage
  • 10:00 slyngshede@cumin1001: START - Cookbook sre.ganeti.reimage for host idm2001.wikimedia.org with OS bullseye
  • 08:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idm2001.wikimedia.org
  • 08:35 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idm2001.wikimedia.org on all recursors
  • 08:35 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache idm2001.wikimedia.org on all recursors
  • 08:35 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:35 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm2001.wikimedia.org - slyngshede@cumin1001"
  • 08:33 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm2001.wikimedia.org - slyngshede@cumin1001"
  • 08:30 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:30 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host idm2001.wikimedia.org
  • 02:46 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase[2012,2015-2018,2020,2022,2023,2025-2027].codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 00:47 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase[2012,2015-2018,2020,2022,2023,2025-2027].codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 00:02 jhuneidi@deploy1002: Installation of scap version "4.37.0" completed for 564 hosts
  • 00:02 jhuneidi@deploy1002: Installing scap version "4.37.0" for 564 hosts

2023-02-16

  • 23:00 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase[2013-2014,2019,2021,2024].codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:15 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 22:15 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 22:15 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 22:13 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 22:13 rzl@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 22:13 rzl@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 22:13 rzl@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 22:13 rzl@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 22:12 rzl@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 22:12 rzl@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 22:12 rzl@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 22:12 rzl@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 22:12 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 22:11 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 22:11 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 22:10 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 22:10 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 22:09 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 22:09 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 22:08 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 22:08 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 22:07 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase[2013-2014,2019,2021,2024].codfw.wmnet: Restarting Cassandra to apply JVM 1.8.0_362 - eevans@cumin1001
  • 22:07 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 22:07 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 22:06 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 21:24 TheresNoTime: close UTC late backport window
  • 21:21 eileen: civicrm upgraded from efa4c485 to ffc16d2d
  • 21:11 samtar@deploy1002: Finished scap: Backport for Remove Research Incentive survey from swwiki (T321252) (duration: 08m 13s)
  • 21:09 SandraEbele: Added new field referer_data to wmf.webrequest table using the alter table statement
  • 21:04 samtar@deploy1002: samtar and dani: Backport for Remove Research Incentive survey from swwiki (T321252) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:03 samtar@deploy1002: Started scap: Backport for Remove Research Incentive survey from swwiki (T321252)
  • 19:29 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.23 refs T325586
  • 19:03 dancy@deploy1002: Installation of scap version "4.36.0" completed for 564 hosts
  • 19:03 dancy@deploy1002: Installing scap version "4.36.0" for 564 hosts
  • 18:56 ebysans@deploy1002: Finished deploy [analytics/refinery@0f1a930] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0f1a930] (duration: 01m 23s)
  • 18:54 ebysans@deploy1002: Started deploy [analytics/refinery@0f1a930] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0f1a930]
  • 18:54 ebysans@deploy1002: Finished deploy [analytics/refinery@0f1a930] (thin): Regular analytics weekly train THIN [analytics/refinery@0f1a930] (duration: 00m 07s)
  • 18:54 ebysans@deploy1002: Started deploy [analytics/refinery@0f1a930] (thin): Regular analytics weekly train THIN [analytics/refinery@0f1a930]
  • 18:52 ebysans@deploy1002: Finished deploy [analytics/refinery@0f1a930]: Regular analytics weekly train [analytics/refinery@0f1a930] (duration: 07m 11s)
  • 18:45 ebysans@deploy1002: Started deploy [analytics/refinery@0f1a930]: Regular analytics weekly train [analytics/refinery@0f1a930]
  • 18:37 SandraEbele: killed webrequest oozie bundle to deploy refinery changes.
  • 18:28 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 18:26 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 18:25 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 18:24 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 18:22 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 18:21 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 18:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2106.codfw.wmnet with reason: DB crashed T329864
  • 18:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on db2106.codfw.wmnet with reason: DB crashed T329864
  • 17:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2106.codfw.wmnet with reason: DB crashed T329864
  • 17:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2106.codfw.wmnet with reason: DB crashed T329864
  • 17:47 jynus@cumin1001: dbctl commit (dc=all): 'Depool db2106', diff saved to https://phabricator.wikimedia.org/P44678 and previous config saved to /var/cache/conftool/dbconfig/20230216-174704-jynus.json
  • 17:38 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: JVM upgrades - elukey@cumin1001
  • 17:25 papaul: PDU maintenance in rack A8
  • 17:25 papaul: PDU maintenance in rack A1 complete
  • 17:21 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: JVM upgrades - elukey@cumin1001
  • 17:07 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 17:07 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 17:06 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 17:05 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 16:55 SandraEbele: Deployed refinery-source change to remove Github.io from Mediasites definition of referers.
  • 16:21 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move EventLogging settings from IS.php to ext-EventLogging.php, part III (T308932) (duration: 06m 54s)
  • 16:19 moritzm: installing net-snmp security updates on Buster
  • 16:11 ladsgroup@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Move EventLogging settings from IS.php to ext-EventLogging.php, part II (T308932) (duration: 06m 48s)
  • 16:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 16:11 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 16:04 ladsgroup@deploy1002: Synchronized wmf-config/ext-EventLogging.php: Move EventLogging settings from IS.php to ext-EventLogging.php, part I (T308932) (duration: 07m 05s)
  • 15:40 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:39 papaul: PDU maintenance in rack A1
  • 15:39 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:36 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:35 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:32 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:30 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:25 taavi@deploy1002: Finished scap: Backport for GrowthExperiments: Enable link recommendation for 6th round wikis (T304550) (duration: 09m 23s)
  • 14:17 taavi@deploy1002: taavi and sgimeno: Backport for GrowthExperiments: Enable link recommendation for 6th round wikis (T304550) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 14:16 taavi@deploy1002: Started scap: Backport for GrowthExperiments: Enable link recommendation for 6th round wikis (T304550)
  • 14:13 taavi: taavi@mwmaint1002:~$ mwscript updateCollation.php --wiki=simplewiki --previous-collation=uppercase | tee T329815.log # T329815
  • 14:13 taavi@deploy1002: Finished scap: Backport for [simplewiki] Change to 'uca-default-u-kn' category collation (T329815) (duration: 10m 38s)
  • 14:04 taavi@deploy1002: superpes and taavi: Backport for [simplewiki] Change to 'uca-default-u-kn' category collation (T329815) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:02 taavi@deploy1002: Started scap: Backport for [simplewiki] Change to 'uca-default-u-kn' category collation (T329815)
  • 12:06 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:06 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:05 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:05 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:05 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:05 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:04 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:04 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:03 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:03 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:03 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:03 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:39 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:39 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:38 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:38 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:55 claime: repool parse1012 for monitoring of possible CPU1 issues
  • 10:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host idm1001.wikimedia.org with OS bullseye
  • 10:37 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
  • 10:36 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
  • 10:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm1001.wikimedia.org with reason: host reimage
  • 10:31 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm1001.wikimedia.org with reason: host reimage
  • 10:22 moritzm: installing postgresql-11 security updates on maps*
  • 10:20 slyngshede@cumin1001: START - Cookbook sre.ganeti.reimage for host idm1001.wikimedia.org with OS bullseye
  • 10:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idm1001.wikimedia.org
  • 10:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idm1001.wikimedia.org on all recursors
  • 10:02 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache idm1001.wikimedia.org on all recursors
  • 10:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm1001.wikimedia.org - slyngshede@cumin1001"
  • 10:01 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm1001.wikimedia.org - slyngshede@cumin1001"
  • 09:59 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 09:59 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host idm1001.wikimedia.org
  • 09:58 godog: issue test page with: amtool alert add TestPage address=6.6.6.6 team=sre severity=page job=testjob --annotation=runbook=lol --annotation=description='this is a test page, please ignore' --annotation=dashboard=no
  • 09:35 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
  • 09:35 godog: puppet cert clean labstore100[67] - T319217
  • 09:27 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
  • 09:07 moritzm: uploaded openjdk-8 8u362-ga-4~deb10u1 to component/jdk8 for buster-wikimedia (forward port of latest Java 8 security release)
  • 08:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 9584
  • 08:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 9584
  • 08:25 moritzm: upgrading cassandra-dev to Java 8u362-ga-4
  • 08:17 apergos: UTC morning backport and config training window done
  • 08:15 kartik@deploy1002: Finished scap: Backport for Enable Section Translation in 9 Wikipedias (T323825 T304865) (duration: 12m 38s)
  • 08:05 kartik@deploy1002: kartik: Backport for Enable Section Translation in 9 Wikipedias (T323825 T304865) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:03 kartik@deploy1002: Started scap: Backport for Enable Section Translation in 9 Wikipedias (T323825 T304865)
  • 07:41 elukey: depool parse1012 to allow the service ops team to check it
  • 07:39 elukey: powercycle parse1012 - CPU1 errors registered in `racadm getsel`
  • 07:25 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-airflow1005.eqiad.wmnet with OS buster
  • 06:16 kart_: Updated cxserver to 2023-02-15-085109-production (T328310, T110190, T116466)
  • 06:11 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:11 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:06 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:05 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:00 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:00 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply

2023-02-15

  • 23:30 dduvall@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.23 refs T325586 (duration: 06m 43s)
  • 23:23 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.23 refs T325586
  • 23:15 ladsgroup@deploy1002: Finished scap: Backport for Change linter maintenance scripts to use existing config variables (T329342) (duration: 08m 12s)
  • 23:08 ladsgroup@deploy1002: ladsgroup: Backport for Change linter maintenance scripts to use existing config variables (T329342) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 23:06 ladsgroup@deploy1002: Started scap: Backport for Change linter maintenance scripts to use existing config variables (T329342)
  • 23:04 dduvall@deploy1002: Finished scap: Backport for Bump wikimedia/parsoid to 0.17.0-a16 (T329740) (duration: 08m 47s)
  • 23:01 Amir1: running linter migrate namespace on all wikis (T329764)
  • 22:57 dduvall@deploy1002: cscott and dduvall: Backport for Bump wikimedia/parsoid to 0.17.0-a16 (T329740) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:55 dduvall@deploy1002: Started scap: Backport for Bump wikimedia/parsoid to 0.17.0-a16 (T329740)
  • 22:22 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in codfw: maintenance
  • 22:17 eevans@cumin1001: START - Cookbook sre.discovery.service-route pool sessionstore in codfw: maintenance
  • 21:15 urandom: rebooting sessionstore2001 w/o cookbook — T327954
  • 21:10 dancy@deploy1002: Installation of scap version "4.35.0" completed for 563 hosts
  • 21:09 dancy@deploy1002: Installing scap version "4.35.0" for 563 hosts
  • 20:39 urandom: rebooting sessionstore2001 w/o cookbook — T327954
  • 20:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
  • 20:23 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
  • 20:04 urandom: setting Cassandra query trace probability to 0 (disabled) on sessionstore cluster, codfw datacenter — T327954
  • 20:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
  • 19:57 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
  • 19:48 urandom: setting Cassandra query trace probability to 0.25 on sessionstore cluster, codfw datacenter — T327954
  • 19:42 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in codfw: Depooling while we attempt to reproduce errors — T327954
  • 19:36 eevans@cumin1001: START - Cookbook sre.discovery.service-route depool sessionstore in codfw: Depooling while we attempt to reproduce errors — T327954
  • 19:36 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check 2 services: maintenance
  • 19:36 eevans@cumin1001: START - Cookbook sre.discovery.service-route check 2 services: maintenance
  • 19:32 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.40.0-wmf.23"
  • 19:23 dduvall: rolling back due to spike in parsoid errors (T325586)
  • 19:23 dduvall@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.23 refs T325586 (duration: 06m 36s)
  • 19:18 dduvall: correction: spike may be temporary. holding (T325586)
  • 19:17 dduvall: large spike in undefined property errors. rolling back (T325586)
  • 19:16 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.23 refs T325586
  • 18:41 akosiaris: dummy entry
  • 18:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS buster
  • 18:12 moritzm: installing curl security updates on bullseye (not buster)
  • 18:12 moritzm: installing curl security updates on buster
  • 17:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-ui1001.eqiad.wmnet
  • 17:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 17:49 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 17:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-ui1001.eqiad.wmnet
  • 17:31 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS buster
  • 17:24 sukhe: [done] disable puppet on A:dns-auth: merging CR 889560
  • 17:18 sukhe: disable puppet on A:dns-auth: merging CR 889560
  • 16:52 moritzm: installing postgresql-11 security updates
  • 16:45 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas
  • 16:43 moritzm: restarting Exim on MXes to pick up gnutls security updates
  • 16:42 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas
  • 16:24 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-airflow1005.eqiad.wmnet with reason: host reimage
  • 16:21 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-airflow1005.eqiad.wmnet with reason: host reimage
  • 16:16 moritzm: installing gnutls28 security updates
  • 16:12 bking@cumin1001: START - Cookbook sre.ganeti.reimage for host an-airflow1005.eqiad.wmnet with OS buster
  • 16:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-airflow1005.eqiad.wmnet
  • 16:12 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for an-airflow1005.eqiad.wmnet
  • 16:03 thcipriani: restart ci jenkins for updates
  • 16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P44673 and previous config saved to /var/cache/conftool/dbconfig/20230215-160100-ladsgroup.json
  • 15:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 35467
  • 15:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 35467
  • 15:49 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:49 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:48 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P44672 and previous config saved to /var/cache/conftool/dbconfig/20230215-154555-ladsgroup.json
  • 15:42 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:41 eevans@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99) depool sessionstore in codfw: maintenance
  • 15:41 eevans@cumin1001: START - Cookbook sre.discovery.service-route depool sessionstore in codfw: maintenance
  • 15:40 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:39 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check 2 services: maintenance
  • 15:39 eevans@cumin1001: START - Cookbook sre.discovery.service-route check 2 services: maintenance
  • 15:39 eevans@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99) depool sessionstore in codfw: maintenance
  • 15:39 eevans@cumin1001: START - Cookbook sre.discovery.service-route depool sessionstore in codfw: maintenance
  • 15:33 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check 2 services: maintenance
  • 15:33 eevans@cumin1001: START - Cookbook sre.discovery.service-route check 2 services: maintenance
  • 15:33 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:33 bking@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P44671 and previous config saved to /var/cache/conftool/dbconfig/20230215-153050-ladsgroup.json
  • 15:30 bking@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:29 bking@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:27 bking@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:27 bking@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:26 bking@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:24 bking@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P44670 and previous config saved to /var/cache/conftool/dbconfig/20230215-151545-ladsgroup.json
  • 15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:55 cmooney@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:55 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:39 cmooney@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:38 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:38 cmooney@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:38 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:33 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:32 TheresNoTime: closing UTC afternoon backport window
  • 14:31 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:31 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:31 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:31 samtar@deploy1002: Finished scap: Backport for InitialiseSettings: install PageAssessments on newiki (T328224) (duration: 08m 25s)
  • 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:29 cmooney@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:27 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:24 samtar@deploy1002: samtar: Backport for InitialiseSettings: install PageAssessments on newiki (T328224) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:22 samtar@deploy1002: Started scap: Backport for InitialiseSettings: install PageAssessments on newiki (T328224)
  • 14:22 cmooney@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:22 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:19 TheresNoTime: `samtar@mwmaint1002:~$ mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwikinews --current --all | tee persistRevisionThreadItems.out.txt` in screen session `25805.T315510` for T315510
  • 14:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:19 samtar@deploy1002: Finished scap: Backport for persistRevisionThreadItems: Avoid listing non-discussion pages (T329627), persistRevisionThreadItems: Avoid listing non-discussion pages (T329627) (duration: 07m 34s)
  • 14:19 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.upgrade-cluster (exit_code=0) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 14:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2002.codfw.wmnet with OS bullseye
  • 14:14 cmooney@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:13 samtar@deploy1002: matmarex and samtar: Backport for persistRevisionThreadItems: Avoid listing non-discussion pages (T329627), persistRevisionThreadItems: Avoid listing non-discussion pages (T329627) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 14:13 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 14:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 14:11 samtar@deploy1002: Started scap: Backport for persistRevisionThreadItems: Avoid listing non-discussion pages (T329627), persistRevisionThreadItems: Avoid listing non-discussion pages (T329627)
  • 14:11 samtar@deploy1002: Finished scap: Backport for Enable DiscussionTools on mobile at almost all wikis (T328940) (duration: 09m 13s)
  • 14:04 samtar@deploy1002: samtar and matmarex: Backport for Enable DiscussionTools on mobile at almost all wikis (T328940) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:02 samtar@deploy1002: Started scap: Backport for Enable DiscussionTools on mobile at almost all wikis (T328940)
  • 14:02 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging2002.codfw.wmnet with reason: host reimage
  • 13:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging2002.codfw.wmnet with reason: host reimage
  • 13:40 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-staging2002.codfw.wmnet with OS bullseye
  • 13:27 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php newiki pageassessments` T328224
  • 13:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 13:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 13:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 13:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2127 T329730', diff saved to https://phabricator.wikimedia.org/P44668 and previous config saved to /var/cache/conftool/dbconfig/20230215-130822-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2105 to s3 primary T329730', diff saved to https://phabricator.wikimedia.org/P44667 and previous config saved to /var/cache/conftool/dbconfig/20230215-130653-ladsgroup.json
  • 13:06 Amir1: Starting s3 codfw failover from db2127 to db2105 - T329730
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2105 with weight 0 T329730', diff saved to https://phabricator.wikimedia.org/P44666 and previous config saved to /var/cache/conftool/dbconfig/20230215-124729-ladsgroup.json
  • 12:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s3 T329730
  • 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Primary switchover s3 T329730
  • 12:24 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4004.wikimedia.org with OS buster
  • 11:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2001.codfw.wmnet with OS bullseye
  • 11:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging2001.codfw.wmnet with reason: host reimage
  • 11:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging2001.codfw.wmnet with reason: host reimage
  • 11:23 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-airflow1005.eqiad.wmnet with OS bullseye
  • 11:22 Emperor: thanos-be2001 rm /srv/swift-storage/sda3/tmp/b0e33b98-f8be-409b-a9d2-246ad5812db0
  • 11:21 Emperor: thanos-be2001 rm /srv/swift-storage/sda3/tmp/c10e5844-1b19-4c8d-b474-801ad3dd6849
  • 11:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-staging2001.codfw.wmnet with OS bullseye
  • 11:09 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-staging-ctrl2002.codfw.wmnet with OS bullseye
  • 10:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-ctrl2002.codfw.wmnet with reason: host reimage
  • 10:52 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-ctrl2002.codfw.wmnet with reason: host reimage
  • 10:45 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2001.codfw.wmnet
  • 10:42 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:40 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-ctrl2002.codfw.wmnet with OS bullseye
  • 10:39 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-staging-ctrl2001.codfw.wmnet with OS bullseye
  • 10:39 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:39 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:39 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:39 Emperor: discard /var/spool/rsyslog on thanos-be2001 T329712
  • 10:33 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:29 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:29 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:25 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-ctrl2001.codfw.wmnet with reason: host reimage
  • 10:21 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-ctrl2001.codfw.wmnet with reason: host reimage
  • 10:20 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
  • 10:15 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:15 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:15 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:14 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:14 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:14 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:14 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:13 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:13 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:13 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 10:10 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-ctrl2001.codfw.wmnet with OS bullseye
  • 10:09 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-staging-etcd2003.codfw.wmnet with OS bullseye
  • 09:54 moritzm: installing openjdk-11 security updates
  • 09:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-etcd2003.codfw.wmnet with reason: host reimage
  • 09:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-etcd2003.codfw.wmnet with reason: host reimage
  • 09:41 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2003.codfw.wmnet with OS bullseye
  • 09:40 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-staging-etcd2002.codfw.wmnet with OS bullseye
  • 09:34 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 09:32 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 09:29 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
  • 09:23 cdanis@cumin1001: END (PASS) - Cookbook sre.k8s.upgrade-cluster (exit_code=0) Upgrade K8s version: upgrade to v1.23
  • 09:23 cdanis@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aux-k8s-worker1002.eqiad.wmnet with OS bullseye
  • 09:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-etcd2002.codfw.wmnet with reason: host reimage
  • 09:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-etcd2002.codfw.wmnet with reason: host reimage
  • 09:09 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1002.eqiad.wmnet with reason: host reimage
  • 09:06 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2002.codfw.wmnet with OS bullseye
  • 09:06 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1002.eqiad.wmnet with reason: host reimage
  • 09:05 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 08:55 Emperor: truncate -s 2GB /srv/log/swift/server.log.1 on thanos-be2001 to free space in /
  • 08:55 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 08:54 cdanis@cumin1001: START - Cookbook sre.ganeti.reimage for host aux-k8s-worker1002.eqiad.wmnet with OS bullseye
  • 08:53 cdanis@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aux-k8s-worker1001.eqiad.wmnet with OS bullseye
  • 08:52 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
  • 08:39 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1001.eqiad.wmnet with reason: host reimage
  • 08:36 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1001.eqiad.wmnet with reason: host reimage
  • 08:25 cdanis@cumin1001: START - Cookbook sre.ganeti.reimage for host aux-k8s-worker1001.eqiad.wmnet with OS bullseye
  • 08:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-etcd2001.codfw.wmnet with reason: host reimage
  • 08:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:06 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 08:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-etcd2001.codfw.wmnet with reason: host reimage
  • 07:59 vgutierrez: rolling upgrade to HAProxy 2.6.8-2 in cp nodes
  • 07:56 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 07:54 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 04:27 fab@deploy1002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 11s)
  • 04:27 fab@deploy1002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 03:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dns4004.wikimedia.org with reason: Puppet failure during reimaging
  • 03:22 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dns4004.wikimedia.org with reason: Puppet failure during reimaging
  • 02:03 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1001.eqiad.wmnet with OS bullseye
  • 01:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 01:41 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 01:23 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS buster
  • 01:22 zabe@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T299954 (duration: 06m 50s)
  • 01:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T328255)', diff saved to https://phabricator.wikimedia.org/P44663 and previous config saved to /var/cache/conftool/dbconfig/20230215-011110-ladsgroup.json
  • 00:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P44662 and previous config saved to /var/cache/conftool/dbconfig/20230215-005604-ladsgroup.json
  • 00:49 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check 2 services: maintenance
  • 00:49 eevans@cumin1001: START - Cookbook sre.discovery.service-route check 2 services: maintenance
  • 00:47 krinkle@deploy1002: Synchronized wmf-config/: I3144a56e17ecb (duration: 06m 33s)
  • 00:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P44661 and previous config saved to /var/cache/conftool/dbconfig/20230215-004058-ladsgroup.json
  • 00:35 krinkle@deploy1002: Synchronized multiversion/: I3144a56e17ecb (duration: 06m 51s)
  • 00:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T328255)', diff saved to https://phabricator.wikimedia.org/P44660 and previous config saved to /var/cache/conftool/dbconfig/20230215-002552-ladsgroup.json
  • 00:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T328255)', diff saved to https://phabricator.wikimedia.org/P44659 and previous config saved to /var/cache/conftool/dbconfig/20230215-002245-ladsgroup.json
  • 00:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 00:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 00:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T328255)', diff saved to https://phabricator.wikimedia.org/P44658 and previous config saved to /var/cache/conftool/dbconfig/20230215-002224-ladsgroup.json
  • 00:15 krinkle@deploy1002: Synchronized src/Profiler.php: Ife7bde (duration: 07m 05s)
  • 00:08 krinkle@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: staging config patch --krinkle (duration: 04m 22s)
  • 00:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P44657 and previous config saved to /var/cache/conftool/dbconfig/20230215-000717-ladsgroup.json
  • 00:03 krinkle@deploy1002: Locking from deployment [ALL REPOSITORIES]: staging config patch --krinkle (planned duration: 60m 00s)
  • 00:02 cwhite@deploy1002: Finished deploy [releng/phatality@b1a2a70]: T314098 (duration: 00m 14s)
  • 00:01 cwhite@deploy1002: Started deploy [releng/phatality@b1a2a70]: T314098

2023-02-14

  • 23:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P44656 and previous config saved to /var/cache/conftool/dbconfig/20230214-235211-ladsgroup.json
  • 23:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T328255)', diff saved to https://phabricator.wikimedia.org/P44655 and previous config saved to /var/cache/conftool/dbconfig/20230214-233705-ladsgroup.json
  • 23:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T328255)', diff saved to https://phabricator.wikimedia.org/P44654 and previous config saved to /var/cache/conftool/dbconfig/20230214-233058-ladsgroup.json
  • 23:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T328255)', diff saved to https://phabricator.wikimedia.org/P44653 and previous config saved to /var/cache/conftool/dbconfig/20230214-233037-ladsgroup.json
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P44652 and previous config saved to /var/cache/conftool/dbconfig/20230214-231531-ladsgroup.json
  • 23:14 cwhite@deploy1002: Finished deploy [releng/phatality@eaa4c16]: T314098 (duration: 00m 07s)
  • 23:14 cwhite@deploy1002: Started deploy [releng/phatality@eaa4c16]: T314098
  • 23:14 cwhite@deploy1002: Finished deploy [releng/phatality@eaa4c16]: T314098 (duration: 10m 40s)
  • 23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2442.codfw.wmnet with OS buster
  • 23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2443.codfw.wmnet with OS buster
  • 23:09 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:03 cwhite@deploy1002: Started deploy [releng/phatality@eaa4c16]: T314098
  • 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P44651 and previous config saved to /var/cache/conftool/dbconfig/20230214-230025-ladsgroup.json
  • 22:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2443.codfw.wmnet with reason: host reimage
  • 22:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2442.codfw.wmnet with reason: host reimage
  • 22:46 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2443.codfw.wmnet with reason: host reimage
  • 22:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2442.codfw.wmnet with reason: host reimage
  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T328255)', diff saved to https://phabricator.wikimedia.org/P44650 and previous config saved to /var/cache/conftool/dbconfig/20230214-224519-ladsgroup.json
  • 22:39 eoghan@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aphlict2001.codfw.wmnet with OS bullseye
  • 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T328255)', diff saved to https://phabricator.wikimedia.org/P44649 and previous config saved to /var/cache/conftool/dbconfig/20230214-223931-ladsgroup.json
  • 22:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 22:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T328255)', diff saved to https://phabricator.wikimedia.org/P44648 and previous config saved to /var/cache/conftool/dbconfig/20230214-223910-ladsgroup.json
  • 22:28 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aphlict2001.codfw.wmnet with reason: host reimage
  • 22:26 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-airflow1005.eqiad.wmnet with reason: new OS but some puppet stuff doesn't work yet
  • 22:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-airflow1005.eqiad.wmnet with reason: new OS but some puppet stuff doesn't work yet
  • 22:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2443.codfw.wmnet with OS buster
  • 22:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2442.codfw.wmnet with OS buster
  • 22:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2441.codfw.wmnet with OS buster
  • 22:25 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:25 eoghan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on aphlict2001.codfw.wmnet with reason: host reimage
  • 22:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2440.codfw.wmnet with OS buster
  • 22:25 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P44647 and previous config saved to /var/cache/conftool/dbconfig/20230214-222403-ladsgroup.json
  • 22:10 eoghan@cumin2002: START - Cookbook sre.ganeti.reimage for host aphlict2001.codfw.wmnet with OS bullseye
  • 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P44646 and previous config saved to /var/cache/conftool/dbconfig/20230214-220857-ladsgroup.json
  • 22:04 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:59 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T328255)', diff saved to https://phabricator.wikimedia.org/P44645 and previous config saved to /var/cache/conftool/dbconfig/20230214-215351-ladsgroup.json
  • 21:47 dancy@deploy1002: Finished scap: Backport for Change linter maintenance scripts to use existing config varaibles (T329342) (duration: 07m 44s)
  • 21:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2441.codfw.wmnet with reason: host reimage
  • 21:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T328255)', diff saved to https://phabricator.wikimedia.org/P44644 and previous config saved to /var/cache/conftool/dbconfig/20230214-214642-ladsgroup.json
  • 21:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 21:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 21:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T328255)', diff saved to https://phabricator.wikimedia.org/P44643 and previous config saved to /var/cache/conftool/dbconfig/20230214-214621-ladsgroup.json
  • 21:44 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2441.codfw.wmnet with reason: host reimage
  • 21:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2440.codfw.wmnet with reason: host reimage
  • 21:41 dancy@deploy1002: dancy and sbailey: Backport for Change linter maintenance scripts to use existing config varaibles (T329342) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:39 dancy@deploy1002: Started scap: Backport for Change linter maintenance scripts to use existing config varaibles (T329342)
  • 21:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2440.codfw.wmnet with reason: host reimage
  • 21:35 dancy@deploy1002: Backport cancelled.
  • 21:34 dancy@deploy1002: Finished scap: Backport for Enable Page Tools for logged in users across all wikis (T328692) (duration: 12m 50s)
  • 21:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P44642 and previous config saved to /var/cache/conftool/dbconfig/20230214-213115-ladsgroup.json
  • 21:23 dancy@deploy1002: dancy and bwang: Backport for Enable Page Tools for logged in users across all wikis (T328692) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:21 dancy@deploy1002: Started scap: Backport for Enable Page Tools for logged in users across all wikis (T328692)
  • 21:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2441.codfw.wmnet with OS buster
  • 21:20 dancy@deploy1002: Backport cancelled.
  • 21:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2440.codfw.wmnet with OS buster
  • 21:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P44640 and previous config saved to /var/cache/conftool/dbconfig/20230214-211608-ladsgroup.json
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T328255)', diff saved to https://phabricator.wikimedia.org/P44639 and previous config saved to /var/cache/conftool/dbconfig/20230214-210102-ladsgroup.json
  • 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T328255)', diff saved to https://phabricator.wikimedia.org/P44638 and previous config saved to /var/cache/conftool/dbconfig/20230214-205709-ladsgroup.json
  • 20:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T328255)', diff saved to https://phabricator.wikimedia.org/P44637 and previous config saved to /var/cache/conftool/dbconfig/20230214-205633-ladsgroup.json
  • 20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P44636 and previous config saved to /var/cache/conftool/dbconfig/20230214-204126-ladsgroup.json
  • 20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P44635 and previous config saved to /var/cache/conftool/dbconfig/20230214-202620-ladsgroup.json
  • 20:24 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-airflow1005.eqiad.wmnet with reason: host reimage
  • 20:21 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-airflow1005.eqiad.wmnet with reason: host reimage
  • 20:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dns4004.wikimedia.org with reason: failure during reimaging
  • 20:21 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dns4004.wikimedia.org with reason: failure during reimaging
  • 20:20 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4004.wikimedia.org with OS buster
  • 20:12 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.23 refs T325586
  • 20:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T328255)', diff saved to https://phabricator.wikimedia.org/P44634 and previous config saved to /var/cache/conftool/dbconfig/20230214-201114-ladsgroup.json
  • 20:09 bking@cumin1001: START - Cookbook sre.ganeti.reimage for host an-airflow1005.eqiad.wmnet with OS bullseye
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T328255)', diff saved to https://phabricator.wikimedia.org/P44633 and previous config saved to /var/cache/conftool/dbconfig/20230214-200822-ladsgroup.json
  • 20:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 20:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T328255)', diff saved to https://phabricator.wikimedia.org/P44632 and previous config saved to /var/cache/conftool/dbconfig/20230214-200801-ladsgroup.json
  • 19:55 AndyRussG: update SmashPig 683df497 -> c6775c60
  • 19:53 dduvall@deploy1002: Pruned MediaWiki: 1.40.0-wmf.21 (duration: 02m 10s)
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P44631 and previous config saved to /var/cache/conftool/dbconfig/20230214-195255-ladsgroup.json
  • 19:50 dduvall@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.23 refs T325586 (duration: 09m 14s)
  • 19:46 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: name=elastic2069.codfw.wmnet,service=elasticsearch-psi-ssl
  • 19:46 cdanis@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aux-k8s-ctrl1002.eqiad.wmnet with OS bullseye
  • 19:46 gehel@puppetmaster1001: conftool action : set/weight=10; selector: name=elastic2069.codfw.wmnet,service=elasticsearch-psi-ssl
  • 19:45 gehel@puppetmaster1001: conftool action : set/weight=10; selector: name=elastic2069.cofdw.wmnet,service=elasticsearch-psi-ssl
  • 19:43 gehel@puppetmaster1001: conftool action : set/weight=10; selector: name=elastic2069.cofdw.wmnet
  • 19:43 gehel@puppetmaster1001: conftool action : set/pooled=active; selector: name=elastic2069.cofdw.wmnet
  • 19:41 dduvall@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.23 refs T325586
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P44630 and previous config saved to /var/cache/conftool/dbconfig/20230214-193748-ladsgroup.json
  • 19:37 dduvall: did not run `docker system prune` due to objections
  • 19:36 mutante: root@deploy1002:/srv# rm -rf deployment.T307349/
  • 19:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mc-gp1003']
  • 19:34 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
  • 19:33 dduvall: running `docker system prune` on deploy1002 to free up disk space on /srv
  • 19:31 dduvall: scap sync-world failed due to lack of disk space on deploy1002 /srv (cc T325586)
  • 19:31 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
  • 19:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp1003']
  • 19:25 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mc-gp1003']
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T328255)', diff saved to https://phabricator.wikimedia.org/P44629 and previous config saved to /var/cache/conftool/dbconfig/20230214-192242-ladsgroup.json
  • 19:22 cdanis@cumin1001: START - Cookbook sre.ganeti.reimage for host aux-k8s-ctrl1002.eqiad.wmnet with OS bullseye
  • 19:21 cdanis@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aux-k8s-ctrl1001.eqiad.wmnet with OS bullseye
  • 19:17 papaul: upgrading firmware on mc-gp1003
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T328255)', diff saved to https://phabricator.wikimedia.org/P44628 and previous config saved to /var/cache/conftool/dbconfig/20230214-191550-ladsgroup.json
  • 19:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 19:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 19:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 19:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 19:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 19:09 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-ctrl1001.eqiad.wmnet with reason: host reimage
  • 19:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 19:08 dduvall@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.23 refs T325586 (duration: 51m 03s)
  • 19:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp1003']
  • 19:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mc-gp1003']
  • 19:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp1003']
  • 19:06 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-ctrl1001.eqiad.wmnet with reason: host reimage
  • 18:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 18:56 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 18:55 cdanis@cumin1001: START - Cookbook sre.ganeti.reimage for host aux-k8s-ctrl1001.eqiad.wmnet with OS bullseye
  • 18:54 cdanis@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: upgrade to v1.23
  • 18:47 cdanis@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: upgrade to v1.23
  • 18:46 cdanis@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: upgrade to v1.23
  • 18:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS buster
  • 18:38 sukhe: reimage dns4004 back to buster to resolve pdns-rec Prometheus endpoint issues: T321309
  • 18:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['mc-gp1002']
  • 18:37 cdanis@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: upgrade to v1.23
  • 18:36 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns4004.wikimedia.org with OS bullseye
  • 18:35 cdanis@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: upgrade to v1.23
  • 18:29 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp1002']
  • 18:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp1002']
  • 18:17 dduvall@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.23 refs T325586
  • 18:15 papaul: upgrading firmware on mc-gp1002
  • 18:10 dduvall: refactored failed security patch for T278365
  • 18:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp1002']
  • 17:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 17:45 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 17:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:31 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudcephosd1002 - cmooney@cumin1001"
  • 17:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudcephosd1002 - cmooney@cumin1001"
  • 17:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 17:28 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:28 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bullseye
  • 17:28 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:28 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudcephosd1002 - cmooney@cumin1001"
  • 17:27 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudcephosd1002 - cmooney@cumin1001"
  • 17:25 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:24 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:24 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudcephosd1002 - cmooney@cumin1001"
  • 17:23 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudcephosd1002 - cmooney@cumin1001"
  • 17:20 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:18 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 17:18 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 17:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:12 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 17:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 17:05 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - rc1.mediawiki.page_change: enable on all wikis (duration: 07m 11s)
  • 16:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 16:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 16:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 16:45 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 16:38 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 16:37 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 16:36 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 16:36 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 16:36 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 16:35 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 16:34 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 16:34 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 16:34 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 16:34 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 16:33 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 16:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 16:33 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 16:31 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:28 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:27 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:27 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 16:27 andrew@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin2002"
  • 16:22 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:17 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - Produce rc1.mediawiki.page_change to eventgate-main (duration: 09m 01s)
  • 16:16 andrew@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin2002"
  • 16:12 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2451.codfw.wmnet with OS buster
  • 16:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2448.codfw.wmnet with OS buster
  • 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2450.codfw.wmnet with OS buster
  • 16:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2449.codfw.wmnet with OS buster
  • 16:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:02 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 15:59 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 15:59 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:53 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:52 moritzm: uploaded src:icu67 67.1-7~wmf1 to buster-wikimedia/component/icu67 T329491
  • 15:50 inflatador: bking@deploy1002 'deploying rdf-streaming-updater prod eqiad T304914'
  • 15:50 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 15:49 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 15:48 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 15:45 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:43 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 15:41 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:39 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:38 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 15:34 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 15:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2451.codfw.wmnet with reason: host reimage
  • 15:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2450.codfw.wmnet with reason: host reimage
  • 15:27 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2451.codfw.wmnet with reason: host reimage
  • 15:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2449.codfw.wmnet with reason: host reimage
  • 15:24 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 15:23 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2450.codfw.wmnet with reason: host reimage
  • 15:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2448.codfw.wmnet with reason: host reimage
  • 15:21 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 15:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2449.codfw.wmnet with reason: host reimage
  • 15:19 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2448.codfw.wmnet with reason: host reimage
  • 15:13 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 15:10 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1002.eqiad.wmnet with reason: host reimage
  • 15:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2451.codfw.wmnet with OS buster
  • 15:05 moritzm: installing openjdk-11 security updates
  • 15:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2450.codfw.wmnet with OS buster
  • 15:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2449.codfw.wmnet with OS buster
  • 14:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2448.codfw.wmnet with OS buster
  • 14:57 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1002.eqiad.wmnet with OS bullseye
  • 14:54 godog: roll-restart pybal in eqiad/codfw to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/889083
  • 14:43 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 14:41 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1001.eqiad.wmnet with OS bullseye
  • 14:41 andrew@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin2002"
  • 14:41 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 14:40 andrew@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin2002"
  • 14:30 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 14:28 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 14:28 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 14:26 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 14:25 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 14:24 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 14:23 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 14:23 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 14:23 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 14:23 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 14:22 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 14:22 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 14:21 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:20 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:19 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:18 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:18 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:17 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 14:11 moritzm: installing libde265 security updates
  • 13:23 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1001.eqiad.wmnet with reason: host reimage
  • 13:20 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1001.eqiad.wmnet with reason: host reimage
  • 13:08 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1001.eqiad.wmnet with OS bullseye
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44624 and previous config saved to /var/cache/conftool/dbconfig/20230214-130708-root.json
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44623 and previous config saved to /var/cache/conftool/dbconfig/20230214-125203-root.json
  • 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44622 and previous config saved to /var/cache/conftool/dbconfig/20230214-123659-root.json
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44621 and previous config saved to /var/cache/conftool/dbconfig/20230214-122154-root.json
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44620 and previous config saved to /var/cache/conftool/dbconfig/20230214-120649-root.json
  • 11:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2004.codfw.wmnet
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44619 and previous config saved to /var/cache/conftool/dbconfig/20230214-115144-root.json
  • 11:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema2004.codfw.wmnet
  • 11:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2003.codfw.wmnet
  • 11:44 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema2003.codfw.wmnet
  • 11:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1004.eqiad.wmnet
  • 11:28 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema1004.eqiad.wmnet
  • 11:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1003.eqiad.wmnet
  • 11:20 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema1003.eqiad.wmnet
  • 10:58 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 10:56 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 10:56 volans@cumin1001: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 10:56 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 10:19 moritzm: installing imagemagick security updates on bullseye
  • 10:09 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 10:08 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 10:03 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 10:01 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44618 and previous config saved to /var/cache/conftool/dbconfig/20230214-095544-root.json
  • 09:50 godog: roll-restart pybal in eqiad/codfw to pick up logs-api service - T320702
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44616 and previous config saved to /var/cache/conftool/dbconfig/20230214-094040-root.json
  • 09:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 09:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44615 and previous config saved to /var/cache/conftool/dbconfig/20230214-092535-root.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44614 and previous config saved to /var/cache/conftool/dbconfig/20230214-091030-root.json
  • 09:04 filippo@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=logs-api
  • 08:57 filippo@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,service=logs-api
  • 08:55 filippo@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=logs-api,dc=codfw
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44613 and previous config saved to /var/cache/conftool/dbconfig/20230214-085525-root.json
  • 08:54 filippo@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=logs-api
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm6001.drmrs.wmnet
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm6001.drmrs.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:44 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm6001.drmrs.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44612 and previous config saved to /var/cache/conftool/dbconfig/20230214-084020-root.json
  • 08:33 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T328817)', diff saved to https://phabricator.wikimedia.org/P44611 and previous config saved to /var/cache/conftool/dbconfig/20230214-083022-marostegui.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T328817)', diff saved to https://phabricator.wikimedia.org/P44610 and previous config saved to /var/cache/conftool/dbconfig/20230214-082915-marostegui.json
  • 08:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T328817)', diff saved to https://phabricator.wikimedia.org/P44609 and previous config saved to /var/cache/conftool/dbconfig/20230214-082854-marostegui.json
  • 08:26 vgutierrez: rolling upgrade to HAProxy 2.6.8 in eqsin - T321775
  • 08:25 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm6001.drmrs.wmnet
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P44608 and previous config saved to /var/cache/conftool/dbconfig/20230214-081348-marostegui.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P44607 and previous config saved to /var/cache/conftool/dbconfig/20230214-075842-marostegui.json
  • 07:58 XioNoX: enable CF in esams
  • 07:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 07:58 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 07:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 07:57 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 07:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T328817)', diff saved to https://phabricator.wikimedia.org/P44606 and previous config saved to /var/cache/conftool/dbconfig/20230214-074335-marostegui.json
  • 07:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 07:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 07:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 07:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T329203)', diff saved to https://phabricator.wikimedia.org/P44605 and previous config saved to /var/cache/conftool/dbconfig/20230214-072452-marostegui.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T328817)', diff saved to https://phabricator.wikimedia.org/P44604 and previous config saved to /var/cache/conftool/dbconfig/20230214-072157-marostegui.json
  • 07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 07:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T328817)', diff saved to https://phabricator.wikimedia.org/P44603 and previous config saved to /var/cache/conftool/dbconfig/20230214-072136-marostegui.json
  • 07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1099.eqiad.wmnet
  • 07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1099.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:10 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1099.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P44602 and previous config saved to /var/cache/conftool/dbconfig/20230214-070946-marostegui.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P44601 and previous config saved to /var/cache/conftool/dbconfig/20230214-070630-marostegui.json
  • 07:01 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:57 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1099.eqiad.wmnet
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P44600 and previous config saved to /var/cache/conftool/dbconfig/20230214-065440-marostegui.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P44599 and previous config saved to /var/cache/conftool/dbconfig/20230214-065123-marostegui.json
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T329203)', diff saved to https://phabricator.wikimedia.org/P44598 and previous config saved to /var/cache/conftool/dbconfig/20230214-063933-marostegui.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T328817)', diff saved to https://phabricator.wikimedia.org/P44597 and previous config saved to /var/cache/conftool/dbconfig/20230214-063617-marostegui.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T329203)', diff saved to https://phabricator.wikimedia.org/P44596 and previous config saved to /var/cache/conftool/dbconfig/20230214-062118-marostegui.json
  • 06:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 06:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T329203)', diff saved to https://phabricator.wikimedia.org/P44595 and previous config saved to /var/cache/conftool/dbconfig/20230214-062057-marostegui.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T328817)', diff saved to https://phabricator.wikimedia.org/P44594 and previous config saved to /var/cache/conftool/dbconfig/20230214-061434-marostegui.json
  • 06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T328817)', diff saved to https://phabricator.wikimedia.org/P44593 and previous config saved to /var/cache/conftool/dbconfig/20230214-061413-marostegui.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P44592 and previous config saved to /var/cache/conftool/dbconfig/20230214-060551-marostegui.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P44591 and previous config saved to /var/cache/conftool/dbconfig/20230214-055906-marostegui.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P44590 and previous config saved to /var/cache/conftool/dbconfig/20230214-055044-marostegui.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P44589 and previous config saved to /var/cache/conftool/dbconfig/20230214-054400-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T329203)', diff saved to https://phabricator.wikimedia.org/P44588 and previous config saved to /var/cache/conftool/dbconfig/20230214-053538-marostegui.json
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T329203)', diff saved to https://phabricator.wikimedia.org/P44587 and previous config saved to /var/cache/conftool/dbconfig/20230214-053325-marostegui.json
  • 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 05:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T329203)', diff saved to https://phabricator.wikimedia.org/P44586 and previous config saved to /var/cache/conftool/dbconfig/20230214-053304-marostegui.json
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T328817)', diff saved to https://phabricator.wikimedia.org/P44585 and previous config saved to /var/cache/conftool/dbconfig/20230214-052854-marostegui.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P44584 and previous config saved to /var/cache/conftool/dbconfig/20230214-051758-marostegui.json
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T328817)', diff saved to https://phabricator.wikimedia.org/P44583 and previous config saved to /var/cache/conftool/dbconfig/20230214-050644-marostegui.json
  • 05:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 05:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T328817)', diff saved to https://phabricator.wikimedia.org/P44582 and previous config saved to /var/cache/conftool/dbconfig/20230214-050623-marostegui.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P44581 and previous config saved to /var/cache/conftool/dbconfig/20230214-050252-marostegui.json
  • 04:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P44580 and previous config saved to /var/cache/conftool/dbconfig/20230214-045117-marostegui.json
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T329203)', diff saved to https://phabricator.wikimedia.org/P44579 and previous config saved to /var/cache/conftool/dbconfig/20230214-044745-marostegui.json
  • 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T329203)', diff saved to https://phabricator.wikimedia.org/P44578 and previous config saved to /var/cache/conftool/dbconfig/20230214-044432-marostegui.json
  • 04:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 04:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T329203)', diff saved to https://phabricator.wikimedia.org/P44577 and previous config saved to /var/cache/conftool/dbconfig/20230214-044411-marostegui.json
  • 04:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P44576 and previous config saved to /var/cache/conftool/dbconfig/20230214-043610-marostegui.json
  • 04:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P44575 and previous config saved to /var/cache/conftool/dbconfig/20230214-042905-marostegui.json
  • 04:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T328817)', diff saved to https://phabricator.wikimedia.org/P44574 and previous config saved to /var/cache/conftool/dbconfig/20230214-042104-marostegui.json
  • 04:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P44573 and previous config saved to /var/cache/conftool/dbconfig/20230214-041359-marostegui.json
  • 03:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T328817)', diff saved to https://phabricator.wikimedia.org/P44572 and previous config saved to /var/cache/conftool/dbconfig/20230214-035922-marostegui.json
  • 03:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 03:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 03:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T329203)', diff saved to https://phabricator.wikimedia.org/P44571 and previous config saved to /var/cache/conftool/dbconfig/20230214-035852-marostegui.json
  • 03:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T329203)', diff saved to https://phabricator.wikimedia.org/P44570 and previous config saved to /var/cache/conftool/dbconfig/20230214-035639-marostegui.json
  • 03:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 03:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 03:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T329203)', diff saved to https://phabricator.wikimedia.org/P44569 and previous config saved to /var/cache/conftool/dbconfig/20230214-035618-marostegui.json
  • 03:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P44568 and previous config saved to /var/cache/conftool/dbconfig/20230214-034112-marostegui.json
  • 03:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2447.codfw.wmnet with OS buster
  • 03:29 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P44567 and previous config saved to /var/cache/conftool/dbconfig/20230214-032606-marostegui.json
  • 03:22 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T329203)', diff saved to https://phabricator.wikimedia.org/P44566 and previous config saved to /var/cache/conftool/dbconfig/20230214-031059-marostegui.json
  • 03:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2447.codfw.wmnet with reason: host reimage
  • 03:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2445.codfw.wmnet with OS buster
  • 03:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2446.codfw.wmnet with OS buster
  • 03:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:04 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2447.codfw.wmnet with reason: host reimage
  • 03:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T329203)', diff saved to https://phabricator.wikimedia.org/P44565 and previous config saved to /var/cache/conftool/dbconfig/20230214-030345-marostegui.json
  • 03:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 03:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 02:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 02:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 02:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44564 and previous config saved to /var/cache/conftool/dbconfig/20230214-025917-marostegui.json
  • 02:58 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:57 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2444.codfw.wmnet with OS buster
  • 02:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:53 tgr: Deployed security patch for T328643
  • 02:52 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P44563 and previous config saved to /var/cache/conftool/dbconfig/20230214-024410-marostegui.json
  • 02:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2447.codfw.wmnet with OS buster
  • 02:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2446.codfw.wmnet with reason: host reimage
  • 02:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2445.codfw.wmnet with reason: host reimage
  • 02:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2446.codfw.wmnet with reason: host reimage
  • 02:38 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2445.codfw.wmnet with reason: host reimage
  • 02:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2444.codfw.wmnet with reason: host reimage
  • 02:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2444.codfw.wmnet with reason: host reimage
  • 02:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P44562 and previous config saved to /var/cache/conftool/dbconfig/20230214-022904-marostegui.json
  • 02:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2446.codfw.wmnet with OS buster
  • 02:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2445.codfw.wmnet with OS buster
  • 02:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44561 and previous config saved to /var/cache/conftool/dbconfig/20230214-021358-marostegui.json
  • 02:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2444.codfw.wmnet with OS buster
  • 02:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44560 and previous config saved to /var/cache/conftool/dbconfig/20230214-020852-marostegui.json
  • 02:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 02:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 02:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T329203)', diff saved to https://phabricator.wikimedia.org/P44559 and previous config saved to /var/cache/conftool/dbconfig/20230214-020831-marostegui.json
  • 02:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 02:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 02:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T328817)', diff saved to https://phabricator.wikimedia.org/P44558 and previous config saved to /var/cache/conftool/dbconfig/20230214-020748-marostegui.json
  • 02:01 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mw2440.codfw.wmnet with OS buster
  • 01:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P44557 and previous config saved to /var/cache/conftool/dbconfig/20230214-015325-marostegui.json
  • 01:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P44556 and previous config saved to /var/cache/conftool/dbconfig/20230214-015242-marostegui.json
  • 01:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2440.codfw.wmnet with OS buster
  • 01:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2443.codfw.wmnet with OS buster
  • 01:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2442.codfw.wmnet with OS buster
  • 01:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P44555 and previous config saved to /var/cache/conftool/dbconfig/20230214-013818-marostegui.json
  • 01:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P44554 and previous config saved to /var/cache/conftool/dbconfig/20230214-013736-marostegui.json
  • 01:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2441.codfw.wmnet with OS buster
  • 01:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2440.codfw.wmnet with OS buster
  • 01:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T329203)', diff saved to https://phabricator.wikimedia.org/P44553 and previous config saved to /var/cache/conftool/dbconfig/20230214-012312-marostegui.json
  • 01:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T328817)', diff saved to https://phabricator.wikimedia.org/P44552 and previous config saved to /var/cache/conftool/dbconfig/20230214-012230-marostegui.json
  • 01:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T329203)', diff saved to https://phabricator.wikimedia.org/P44551 and previous config saved to /var/cache/conftool/dbconfig/20230214-011758-marostegui.json
  • 01:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 01:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 01:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T329203)', diff saved to https://phabricator.wikimedia.org/P44550 and previous config saved to /var/cache/conftool/dbconfig/20230214-011720-marostegui.json
  • 01:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P44549 and previous config saved to /var/cache/conftool/dbconfig/20230214-010214-marostegui.json
  • 00:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P44548 and previous config saved to /var/cache/conftool/dbconfig/20230214-004707-marostegui.json
  • 00:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2443.codfw.wmnet with OS buster
  • 00:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2442.codfw.wmnet with OS buster
  • 00:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2441.codfw.wmnet with OS buster
  • 00:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2440.codfw.wmnet with OS buster
  • 00:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T329203)', diff saved to https://phabricator.wikimedia.org/P44547 and previous config saved to /var/cache/conftool/dbconfig/20230214-003201-marostegui.json
  • 00:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T329203)', diff saved to https://phabricator.wikimedia.org/P44546 and previous config saved to /var/cache/conftool/dbconfig/20230214-002620-marostegui.json
  • 00:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 00:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 00:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44545 and previous config saved to /var/cache/conftool/dbconfig/20230214-002559-marostegui.json
  • 00:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2438.codfw.wmnet with OS buster
  • 00:22 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2439.codfw.wmnet with OS buster
  • 00:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T328817)', diff saved to https://phabricator.wikimedia.org/P44544 and previous config saved to /var/cache/conftool/dbconfig/20230214-002214-marostegui.json
  • 00:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 00:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 00:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T328817)', diff saved to https://phabricator.wikimedia.org/P44543 and previous config saved to /var/cache/conftool/dbconfig/20230214-002136-marostegui.json
  • 00:17 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:13 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P44542 and previous config saved to /var/cache/conftool/dbconfig/20230214-001053-marostegui.json
  • 00:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P44541 and previous config saved to /var/cache/conftool/dbconfig/20230214-000629-marostegui.json
  • 00:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mc-gp2003']
  • 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P44540 and previous config saved to /var/cache/conftool/dbconfig/20230214-000419-ladsgroup.json
  • 00:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2439.codfw.wmnet with reason: host reimage

2023-02-13

  • 23:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2438.codfw.wmnet with reason: host reimage
  • 23:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2003']
  • 23:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2439.codfw.wmnet with reason: host reimage
  • 23:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2003']
  • 23:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P44539 and previous config saved to /var/cache/conftool/dbconfig/20230213-235546-marostegui.json
  • 23:55 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2438.codfw.wmnet with reason: host reimage
  • 23:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P44538 and previous config saved to /var/cache/conftool/dbconfig/20230213-235123-marostegui.json
  • 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P44537 and previous config saved to /var/cache/conftool/dbconfig/20230213-234912-ladsgroup.json
  • 23:48 papaul: upgrading firmware on mc-gp2003
  • 23:40 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2003']
  • 23:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44536 and previous config saved to /var/cache/conftool/dbconfig/20230213-234040-marostegui.json
  • 23:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2439.codfw.wmnet with OS buster
  • 23:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T328817)', diff saved to https://phabricator.wikimedia.org/P44535 and previous config saved to /var/cache/conftool/dbconfig/20230213-233617-marostegui.json
  • 23:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2438.codfw.wmnet with OS buster
  • 23:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44534 and previous config saved to /var/cache/conftool/dbconfig/20230213-233407-marostegui.json
  • 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P44533 and previous config saved to /var/cache/conftool/dbconfig/20230213-233406-ladsgroup.json
  • 23:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 23:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 23:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T329203)', diff saved to https://phabricator.wikimedia.org/P44532 and previous config saved to /var/cache/conftool/dbconfig/20230213-233356-marostegui.json
  • 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P44531 and previous config saved to /var/cache/conftool/dbconfig/20230213-231900-ladsgroup.json
  • 23:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P44530 and previous config saved to /var/cache/conftool/dbconfig/20230213-231850-marostegui.json
  • 23:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2436.codfw.wmnet with OS buster
  • 23:18 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2437.codfw.wmnet with OS buster
  • 23:18 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T328817)', diff saved to https://phabricator.wikimedia.org/P44529 and previous config saved to /var/cache/conftool/dbconfig/20230213-231402-marostegui.json
  • 23:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 23:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 23:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P44528 and previous config saved to /var/cache/conftool/dbconfig/20230213-231137-ladsgroup.json
  • 23:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 23:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 23:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44527 and previous config saved to /var/cache/conftool/dbconfig/20230213-231116-ladsgroup.json
  • 23:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['mc-gp2002']
  • 23:04 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2002']
  • 23:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P44526 and previous config saved to /var/cache/conftool/dbconfig/20230213-230343-marostegui.json
  • 22:59 zabe@deploy1002: Finished scap: Backport for stop setting checkuser actor/comment migration variables (T233004) (duration: 07m 46s)
  • 22:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P44525 and previous config saved to /var/cache/conftool/dbconfig/20230213-225610-ladsgroup.json
  • 22:53 zabe@deploy1002: zabe: Backport for stop setting checkuser actor/comment migration variables (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 22:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 22:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 22:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T328817)', diff saved to https://phabricator.wikimedia.org/P44524 and previous config saved to /var/cache/conftool/dbconfig/20230213-225312-marostegui.json
  • 22:53 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mc-gp2002']
  • 22:51 zabe@deploy1002: Started scap: Backport for stop setting checkuser actor/comment migration variables (T233004)
  • 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T329203)', diff saved to https://phabricator.wikimedia.org/P44523 and previous config saved to /var/cache/conftool/dbconfig/20230213-224837-marostegui.json
  • 22:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2437.codfw.wmnet with reason: host reimage
  • 22:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2002']
  • 22:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2437.codfw.wmnet with reason: host reimage
  • 22:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2002']
  • 22:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T329203)', diff saved to https://phabricator.wikimedia.org/P44522 and previous config saved to /var/cache/conftool/dbconfig/20230213-224240-marostegui.json
  • 22:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 22:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 22:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44521 and previous config saved to /var/cache/conftool/dbconfig/20230213-224219-marostegui.json
  • 22:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P44520 and previous config saved to /var/cache/conftool/dbconfig/20230213-224102-ladsgroup.json
  • 22:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P44519 and previous config saved to /var/cache/conftool/dbconfig/20230213-223806-marostegui.json
  • 22:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2436.codfw.wmnet with reason: host reimage
  • 22:36 papaul: upgrading firmware on mc-gp2002
  • 22:33 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2436.codfw.wmnet with reason: host reimage
  • 22:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2002']
  • 22:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P44518 and previous config saved to /var/cache/conftool/dbconfig/20230213-222713-marostegui.json
  • 22:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44517 and previous config saved to /var/cache/conftool/dbconfig/20230213-222556-ladsgroup.json
  • 22:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2437.codfw.wmnet with OS buster
  • 22:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P44516 and previous config saved to /var/cache/conftool/dbconfig/20230213-222300-marostegui.json
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44515 and previous config saved to /var/cache/conftool/dbconfig/20230213-221840-ladsgroup.json
  • 22:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 22:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 22:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 22:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44514 and previous config saved to /var/cache/conftool/dbconfig/20230213-221815-ladsgroup.json
  • 22:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2436.codfw.wmnet with OS buster
  • 22:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P44513 and previous config saved to /var/cache/conftool/dbconfig/20230213-221207-marostegui.json
  • 22:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T328817)', diff saved to https://phabricator.wikimedia.org/P44512 and previous config saved to /var/cache/conftool/dbconfig/20230213-220753-marostegui.json
  • 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P44511 and previous config saved to /var/cache/conftool/dbconfig/20230213-220308-ladsgroup.json
  • 21:57 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:57 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudcephosd1002 - cmooney@cumin1001"
  • 21:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44510 and previous config saved to /var/cache/conftool/dbconfig/20230213-215701-marostegui.json
  • 21:56 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudcephosd1002 - cmooney@cumin1001"
  • 21:53 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:51 cmooney@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1002.eqiad.wmnet']
  • 21:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44509 and previous config saved to /var/cache/conftool/dbconfig/20230213-215055-marostegui.json
  • 21:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 21:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 21:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44508 and previous config saved to /var/cache/conftool/dbconfig/20230213-215034-marostegui.json
  • 21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P44507 and previous config saved to /var/cache/conftool/dbconfig/20230213-214802-ladsgroup.json
  • 21:44 taavi@deploy1002: Finished scap: Backport for Revert "Revert "Enable mediawiki.page_change on group1 wikis"" (duration: 09m 00s)
  • 21:42 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1002.eqiad.wmnet']
  • 21:40 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@5edcd7b]: deploying section_topics v0.5.0 on platform_eng Airflow instance (duration: 00m 17s)
  • 21:39 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@5edcd7b]: deploying section_topics v0.5.0 on platform_eng Airflow instance
  • 21:37 taavi@deploy1002: taavi: Backport for Revert "Revert "Enable mediawiki.page_change on group1 wikis"" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:35 taavi@deploy1002: Started scap: Backport for Revert "Revert "Enable mediawiki.page_change on group1 wikis""
  • 21:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P44506 and previous config saved to /var/cache/conftool/dbconfig/20230213-213526-marostegui.json
  • 21:34 taavi@deploy1002: Finished scap: Backport for ReplyLinksController: Fix teardown failing when reloading (T329523) (duration: 08m 38s)
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44505 and previous config saved to /var/cache/conftool/dbconfig/20230213-213256-ladsgroup.json
  • 21:27 taavi@deploy1002: taavi and matmarex: Backport for ReplyLinksController: Fix teardown failing when reloading (T329523) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:26 taavi@deploy1002: Started scap: Backport for ReplyLinksController: Fix teardown failing when reloading (T329523)
  • 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44504 and previous config saved to /var/cache/conftool/dbconfig/20230213-212529-ladsgroup.json
  • 21:25 taavi@deploy1002: Finished scap: lmowiktionary: Create extendedmover group (T327340) (duration: 08m 28s)
  • 21:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 21:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 21:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P44503 and previous config saved to /var/cache/conftool/dbconfig/20230213-212020-marostegui.json
  • 21:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 21:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 21:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P44502 and previous config saved to /var/cache/conftool/dbconfig/20230213-211932-ladsgroup.json
  • 21:18 taavi@deploy1002: taavi: lmowiktionary: Create extendedmover group (T327340) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:16 taavi@deploy1002: Started scap: lmowiktionary: Create extendedmover group (T327340)
  • 21:15 taavi@deploy1002: Backport cancelled.
  • 21:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T328817)', diff saved to https://phabricator.wikimedia.org/P44501 and previous config saved to /var/cache/conftool/dbconfig/20230213-210738-marostegui.json
  • 21:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 21:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 21:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T328817)', diff saved to https://phabricator.wikimedia.org/P44500 and previous config saved to /var/cache/conftool/dbconfig/20230213-210717-marostegui.json
  • 21:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44499 and previous config saved to /var/cache/conftool/dbconfig/20230213-210513-marostegui.json
  • 21:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P44498 and previous config saved to /var/cache/conftool/dbconfig/20230213-210426-ladsgroup.json
  • 20:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T329203)', diff saved to https://phabricator.wikimedia.org/P44497 and previous config saved to /var/cache/conftool/dbconfig/20230213-205905-marostegui.json
  • 20:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 20:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 20:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T329203)', diff saved to https://phabricator.wikimedia.org/P44496 and previous config saved to /var/cache/conftool/dbconfig/20230213-205855-marostegui.json
  • 20:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P44495 and previous config saved to /var/cache/conftool/dbconfig/20230213-205211-marostegui.json
  • 20:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P44494 and previous config saved to /var/cache/conftool/dbconfig/20230213-204920-ladsgroup.json
  • 20:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P44493 and previous config saved to /var/cache/conftool/dbconfig/20230213-204348-marostegui.json
  • 20:39 cmooney@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 20:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P44492 and previous config saved to /var/cache/conftool/dbconfig/20230213-203704-marostegui.json
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P44491 and previous config saved to /var/cache/conftool/dbconfig/20230213-203413-ladsgroup.json
  • 20:32 dcausse: restarting blazegraph on wdqs1012 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 20:30 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 20:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P44490 and previous config saved to /var/cache/conftool/dbconfig/20230213-202842-marostegui.json
  • 20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P44489 and previous config saved to /var/cache/conftool/dbconfig/20230213-202656-ladsgroup.json
  • 20:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 20:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P44488 and previous config saved to /var/cache/conftool/dbconfig/20230213-202635-ladsgroup.json
  • 20:24 cmooney@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 20:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T328817)', diff saved to https://phabricator.wikimedia.org/P44487 and previous config saved to /var/cache/conftool/dbconfig/20230213-202157-marostegui.json
  • 20:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T329203)', diff saved to https://phabricator.wikimedia.org/P44486 and previous config saved to /var/cache/conftool/dbconfig/20230213-201336-marostegui.json
  • 20:13 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 20:12 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.upgrade-cluster (exit_code=0) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 20:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2002.codfw.wmnet with OS bullseye
  • 20:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P44485 and previous config saved to /var/cache/conftool/dbconfig/20230213-201129-ladsgroup.json
  • 20:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T329203)', diff saved to https://phabricator.wikimedia.org/P44484 and previous config saved to /var/cache/conftool/dbconfig/20230213-200742-marostegui.json
  • 20:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 20:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 20:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 20:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T329203)', diff saved to https://phabricator.wikimedia.org/P44483 and previous config saved to /var/cache/conftool/dbconfig/20230213-200654-marostegui.json
  • 19:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T328817)', diff saved to https://phabricator.wikimedia.org/P44482 and previous config saved to /var/cache/conftool/dbconfig/20230213-195743-marostegui.json
  • 19:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 19:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 19:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T328817)', diff saved to https://phabricator.wikimedia.org/P44481 and previous config saved to /var/cache/conftool/dbconfig/20230213-195722-marostegui.json
  • 19:56 cmooney@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P44480 and previous config saved to /var/cache/conftool/dbconfig/20230213-195623-ladsgroup.json
  • 19:56 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1001.eqiad.wmnet']
  • 16:27 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P44436 and previous config saved to /var/cache/conftool/dbconfig/20230213-162456-marostegui.json
  • 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2438.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44435 and previous config saved to /var/cache/conftool/dbconfig/20230213-161824-root.json
  • 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T328817)', diff saved to https://phabricator.wikimedia.org/P44434 and previous config saved to /var/cache/conftool/dbconfig/20230213-161605-marostegui.json
  • 16:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 16:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T328817)', diff saved to https://phabricator.wikimedia.org/P44433 and previous config saved to /var/cache/conftool/dbconfig/20230213-161543-marostegui.json
  • 16:10 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:cloudelastic
  • 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P44432 and previous config saved to /var/cache/conftool/dbconfig/20230213-160950-marostegui.json
  • 16:07 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:cloudelastic
  • 16:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44431 and previous config saved to /var/cache/conftool/dbconfig/20230213-160320-root.json
  • 16:02 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:relforge
  • 16:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudcephosd1001.eqiad.wmnet
  • 16:02 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:02 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1001"
  • 16:01 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:relforge
  • 16:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P44429 and previous config saved to /var/cache/conftool/dbconfig/20230213-160037-marostegui.json
  • 15:59 nfraison@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 15:58 elukey@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1001"
  • 15:56 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T329203)', diff saved to https://phabricator.wikimedia.org/P44428 and previous config saved to /var/cache/conftool/dbconfig/20230213-155444-marostegui.json
  • 15:51 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 15:51 elukey@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1001.eqiad.wmnet
  • 15:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-etcd2002.codfw.wmnet with reason: host reimage
  • 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T329203)', diff saved to https://phabricator.wikimedia.org/P44427 and previous config saved to /var/cache/conftool/dbconfig/20230213-154850-marostegui.json
  • 15:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 15:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44426 and previous config saved to /var/cache/conftool/dbconfig/20230213-154815-root.json
  • 15:46 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-etcd2002.codfw.wmnet with reason: host reimage
  • 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P44425 and previous config saved to /var/cache/conftool/dbconfig/20230213-154531-marostegui.json
  • 15:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 15:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 15:42 dcausse: restarting blazegraph on wdqs1004 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 15:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 15:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 15:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 15:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 15:38 sukhe: disable puppet on A:dns-rec; merging CR 888236
  • 15:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 15:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 15:34 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2002.codfw.wmnet with OS bullseye
  • 15:33 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44424 and previous config saved to /var/cache/conftool/dbconfig/20230213-153309-root.json
  • 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T328817)', diff saved to https://phabricator.wikimedia.org/P44423 and previous config saved to /var/cache/conftool/dbconfig/20230213-153025-marostegui.json
  • 15:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2438.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:22 dcausse: T327878: rebuilding CirrusSearch completion index on mnwiki from mwmaint1002
  • 15:20 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Test Upgrade GitLab Replica gitlab1003 same version (noop)
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P44422 and previous config saved to /var/cache/conftool/dbconfig/20230213-151805-marostegui.json
  • 15:15 Lucas_WMDE: UTC afternoon backport+config window done
  • 15:14 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php guwwiktionary --fix | tee T309054-namespaceDupes.out # T309054 [0 pages to fix, 0 were resolvable; 0 links to fix, 0 were resolvable; 0 were deleted]
  • 15:13 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Rename project namespace in guwwiktionary (T309054) (duration: 08m 33s)
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T328817)', diff saved to https://phabricator.wikimedia.org/P44421 and previous config saved to /var/cache/conftool/dbconfig/20230213-150644-marostegui.json
  • 15:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 15:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T328817)', diff saved to https://phabricator.wikimedia.org/P44420 and previous config saved to /var/cache/conftool/dbconfig/20230213-150623-marostegui.json
  • 15:06 lucaswerkmeister-wmde@deploy1002: jhsoby and lucaswerkmeister-wmde: Backport for Rename project namespace in guwwiktionary (T309054) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 15:04 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Rename project namespace in guwwiktionary (T309054)
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T329203)', diff saved to https://phabricator.wikimedia.org/P44419 and previous config saved to /var/cache/conftool/dbconfig/20230213-150259-marostegui.json
  • 15:02 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Follow-up I3412c53cc: Fix reference to target in ve.ce.MWWikitextSurface (T329439) (duration: 08m 25s)
  • 14:59 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-presto1001.eqiad.wmnet with reason: host reimage
  • 14:56 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-presto1001.eqiad.wmnet with reason: host reimage
  • 14:55 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and matmarex: Backport for Follow-up I3412c53cc: Fix reference to target in ve.ce.MWWikitextSurface (T329439) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:53 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Follow-up I3412c53cc: Fix reference to target in ve.ce.MWWikitextSurface (T329439)
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P44418 and previous config saved to /var/cache/conftool/dbconfig/20230213-145117-marostegui.json
  • 14:47 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [cirrus] enable CirrusSearchCompletionSuggesterUseDefaultSort on mnwiki (T327878) (duration: 11m 18s)
  • 14:46 nfraison@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 14:45 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Test Upgrade GitLab Replica gitlab1003 same version (noop)
  • 14:42 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 14:42 nfraison@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 14:42 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1001.eqiad.wmnet with OS bullseye
  • 14:42 nfraison@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 14:42 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1001.eqiad.wmnet with OS bullseye
  • 14:41 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
  • 14:38 nfraison@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 14:38 nfraison@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 14:38 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 14:37 lucaswerkmeister-wmde@deploy1002: dcausse and lucaswerkmeister-wmde: Backport for [cirrus] enable CirrusSearchCompletionSuggesterUseDefaultSort on mnwiki (T327878) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:37 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: sync
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P44417 and previous config saved to /var/cache/conftool/dbconfig/20230213-143611-marostegui.json
  • 14:36 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [cirrus] enable CirrusSearchCompletionSuggesterUseDefaultSort on mnwiki (T327878)
  • 14:35 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 14:35 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: sync
  • 14:34 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync
  • 14:34 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: sync
  • 14:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-etcd2001.codfw.wmnet with reason: host reimage
  • 14:28 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:28 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add logs-api VIP - filippo@cumin1001"
  • 14:27 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add logs-api VIP - filippo@cumin1001"
  • 14:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-etcd2001.codfw.wmnet with reason: host reimage
  • 14:26 lucaswerkmeister-wmde@deploy1002: backport aborted: (duration: 10m 13s)
  • 14:24 filippo@cumin1001: START - Cookbook sre.dns.netbox
  • 14:22 nfraison@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 14:21 nfraison@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 14:21 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: sync
  • 14:21 volans@cumin1001: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 14:21 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: sync
  • 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T328817)', diff saved to https://phabricator.wikimedia.org/P44416 and previous config saved to /var/cache/conftool/dbconfig/20230213-142105-marostegui.json
  • 14:21 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: sync
  • 14:20 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: sync
  • 14:17 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 14:16 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 14:16 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 14:16 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 14:15 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 14:15 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Add iOS stream config (duration: 10m 06s)
  • 14:15 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 14:13 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 14:13 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cloudcephosd1002.eqiad.wmnet with reason: moving racks
  • 14:12 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on cloudcephosd1002.eqiad.wmnet with reason: moving racks
  • 14:12 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cloudcephosd1001.eqiad.wmnet with reason: moving racks
  • 14:12 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 14:12 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on cloudcephosd1001.eqiad.wmnet with reason: moving racks
  • 14:11 nfraison@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 14:10 nfraison@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-presto1001.eqiad.wmnet with OS bullseye
  • 14:07 lucaswerkmeister-wmde@deploy1002: mazevedo and lucaswerkmeister-wmde: Backport for Add iOS stream config synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:07 jbond: upload node-bgpalerter_1.31.2 to apt
  • 14:05 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 14:05 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:05 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Add iOS stream config
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T329203)', diff saved to https://phabricator.wikimedia.org/P44415 and previous config saved to /var/cache/conftool/dbconfig/20230213-140243-marostegui.json
  • 14:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 14:02 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 14:02 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 14:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T329203)', diff saved to https://phabricator.wikimedia.org/P44414 and previous config saved to /var/cache/conftool/dbconfig/20230213-140222-marostegui.json
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T328817)', diff saved to https://phabricator.wikimedia.org/P44413 and previous config saved to /var/cache/conftool/dbconfig/20230213-135753-marostegui.json
  • 13:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 13:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T328817)', diff saved to https://phabricator.wikimedia.org/P44412 and previous config saved to /var/cache/conftool/dbconfig/20230213-135732-marostegui.json
  • 13:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 6677
  • 13:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 6677
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P44411 and previous config saved to /var/cache/conftool/dbconfig/20230213-134716-marostegui.json
  • 13:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P44410 and previous config saved to /var/cache/conftool/dbconfig/20230213-134226-marostegui.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P44409 and previous config saved to /var/cache/conftool/dbconfig/20230213-133210-marostegui.json
  • 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P44408 and previous config saved to /var/cache/conftool/dbconfig/20230213-132719-marostegui.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T329203)', diff saved to https://phabricator.wikimedia.org/P44407 and previous config saved to /var/cache/conftool/dbconfig/20230213-131703-marostegui.json
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T329203)', diff saved to https://phabricator.wikimedia.org/P44406 and previous config saved to /var/cache/conftool/dbconfig/20230213-131348-marostegui.json
  • 13:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 13:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T329203)', diff saved to https://phabricator.wikimedia.org/P44405 and previous config saved to /var/cache/conftool/dbconfig/20230213-131327-marostegui.json
  • 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T328817)', diff saved to https://phabricator.wikimedia.org/P44404 and previous config saved to /var/cache/conftool/dbconfig/20230213-131213-marostegui.json
  • 13:05 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 13:01 volans@cumin1001: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P44403 and previous config saved to /var/cache/conftool/dbconfig/20230213-125821-marostegui.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T328817)', diff saved to https://phabricator.wikimedia.org/P44402 and previous config saved to /var/cache/conftool/dbconfig/20230213-124853-marostegui.json
  • 12:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 12:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 12:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 12:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T328817)', diff saved to https://phabricator.wikimedia.org/P44401 and previous config saved to /var/cache/conftool/dbconfig/20230213-124828-marostegui.json
  • 12:47 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P44400 and previous config saved to /var/cache/conftool/dbconfig/20230213-124314-marostegui.json
  • 12:41 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:40 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:34 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P44399 and previous config saved to /var/cache/conftool/dbconfig/20230213-123322-marostegui.json
  • 12:31 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:28 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T329203)', diff saved to https://phabricator.wikimedia.org/P44398 and previous config saved to /var/cache/conftool/dbconfig/20230213-122808-marostegui.json
  • 12:25 claime: thumbor roll-restarts done
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T329203)', diff saved to https://phabricator.wikimedia.org/P44397 and previous config saved to /var/cache/conftool/dbconfig/20230213-122401-marostegui.json
  • 12:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 12:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T329203)', diff saved to https://phabricator.wikimedia.org/P44396 and previous config saved to /var/cache/conftool/dbconfig/20230213-122339-marostegui.json
  • 12:22 claime: Roll-restart thumbor in eqiad - Deploying CR 888657
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P44395 and previous config saved to /var/cache/conftool/dbconfig/20230213-121816-marostegui.json
  • 12:15 marostegui: Upgrade db1205 and db2184 to mariadb 10.6.12 T329499
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P44394 and previous config saved to /var/cache/conftool/dbconfig/20230213-120833-marostegui.json
  • 12:07 claime: Roll-restart thumbor in codfw - Deploying CR 888657
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T328817)', diff saved to https://phabricator.wikimedia.org/P44393 and previous config saved to /var/cache/conftool/dbconfig/20230213-120309-marostegui.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P44392 and previous config saved to /var/cache/conftool/dbconfig/20230213-115327-marostegui.json
  • 11:49 volans@cumin1001: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 11:45 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install1003.wikimedia.org
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T328817)', diff saved to https://phabricator.wikimedia.org/P44391 and previous config saved to /var/cache/conftool/dbconfig/20230213-114002-marostegui.json
  • 11:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 11:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T328817)', diff saved to https://phabricator.wikimedia.org/P44390 and previous config saved to /var/cache/conftool/dbconfig/20230213-113941-marostegui.json
  • 11:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T329203)', diff saved to https://phabricator.wikimedia.org/P44389 and previous config saved to /var/cache/conftool/dbconfig/20230213-113821-marostegui.json
  • 11:37 volans@cumin1001: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 11:37 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 11:35 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T329203)', diff saved to https://phabricator.wikimedia.org/P44388 and previous config saved to /var/cache/conftool/dbconfig/20230213-113430-marostegui.json
  • 11:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 11:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T329203)', diff saved to https://phabricator.wikimedia.org/P44387 and previous config saved to /var/cache/conftool/dbconfig/20230213-113408-marostegui.json
  • 11:31 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install1003.wikimedia.org
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install2003.wikimedia.org
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P44386 and previous config saved to /var/cache/conftool/dbconfig/20230213-112435-marostegui.json
  • 11:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:19 nfraison@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P44385 and previous config saved to /var/cache/conftool/dbconfig/20230213-111902-marostegui.json
  • 11:18 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:14 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install2003.wikimedia.org
  • 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P44384 and previous config saved to /var/cache/conftool/dbconfig/20230213-110928-marostegui.json
  • 11:08 jbond: rolling out no_proxy change https://gerrit.wikimedia.org/r/c/operations/puppet/+/879418
  • 11:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 7 hosts with reason: Cluster half broken, in the middle of upgrading
  • 11:07 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 7 hosts with reason: Cluster half broken, in the middle of upgrading
  • 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P44383 and previous config saved to /var/cache/conftool/dbconfig/20230213-110356-marostegui.json
  • 11:00 marostegui: Failover m1 from db1176 to db1164 - T329259
  • 10:56 godog: roll-restart pybal in eqiad to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/888648 - T320702
  • 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T328817)', diff saved to https://phabricator.wikimedia.org/P44382 and previous config saved to /var/cache/conftool/dbconfig/20230213-105422-marostegui.json
  • 10:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: host reimage
  • 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T329203)', diff saved to https://phabricator.wikimedia.org/P44381 and previous config saved to /var/cache/conftool/dbconfig/20230213-104850-marostegui.json
  • 10:48 nfraison@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 10:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: host reimage
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T329203)', diff saved to https://phabricator.wikimedia.org/P44380 and previous config saved to /var/cache/conftool/dbconfig/20230213-104337-marostegui.json
  • 10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T329203)', diff saved to https://phabricator.wikimedia.org/P44379 and previous config saved to /var/cache/conftool/dbconfig/20230213-104316-marostegui.json
  • 10:34 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1001.eqiad.wmnet with OS bullseye
  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T328817)', diff saved to https://phabricator.wikimedia.org/P44378 and previous config saved to /var/cache/conftool/dbconfig/20230213-103126-marostegui.json
  • 10:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 10:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T328817)', diff saved to https://phabricator.wikimedia.org/P44377 and previous config saved to /var/cache/conftool/dbconfig/20230213-103105-marostegui.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P44376 and previous config saved to /var/cache/conftool/dbconfig/20230213-102810-marostegui.json
  • 10:16 jynus: stopping bacula and disabling puppet at backup1001 for m1 switchover T329259
  • 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P44375 and previous config saved to /var/cache/conftool/dbconfig/20230213-101559-marostegui.json
  • 10:15 volans@cumin1001: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P44374 and previous config saved to /var/cache/conftool/dbconfig/20230213-101304-marostegui.json
  • 10:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2132,2160].codfw.wmnet,db[1117,1164,1176].eqiad.wmnet with reason: Primary switchover m1 T329259
  • 10:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2132,2160].codfw.wmnet,db[1117,1164,1176].eqiad.wmnet with reason: Primary switchover m1 T329259
  • 10:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46375
  • 10:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46375
  • 10:05 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host test-reimage2001.codfw.wmnet with OS bullseye
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P44373 and previous config saved to /var/cache/conftool/dbconfig/20230213-100053-marostegui.json
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T329203)', diff saved to https://phabricator.wikimedia.org/P44372 and previous config saved to /var/cache/conftool/dbconfig/20230213-095757-marostegui.json
  • 09:54 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9 to netbox-next - volans@cumin1001
  • 09:54 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on test-reimage2001.codfw.wmnet with reason: host reimage
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T329203)', diff saved to https://phabricator.wikimedia.org/P44371 and previous config saved to /var/cache/conftool/dbconfig/20230213-095257-marostegui.json
  • 09:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 09:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T329203)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20230213-095231-marostegui.json
  • 09:51 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on test-reimage2001.codfw.wmnet with reason: host reimage
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T328817)', diff saved to https://phabricator.wikimedia.org/P44369 and previous config saved to /var/cache/conftool/dbconfig/20230213-094546-marostegui.json
  • 09:44 vgutierrez: rolling upgrade to HAProxy 2.6.8 in ulsfo - T321775
  • 09:43 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 09:43 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host ml-staging-etcd2002.codfw.wmnet with OS bullseye
  • 09:41 slyngshede@cumin1001: START - Cookbook sre.ganeti.reimage for host test-reimage2001.codfw.wmnet with OS bullseye
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P44368 and previous config saved to /var/cache/conftool/dbconfig/20230213-093725-marostegui.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2161 (T328817)', diff saved to https://phabricator.wikimedia.org/P44367 and previous config saved to /var/cache/conftool/dbconfig/20230213-092228-marostegui.json
  • 09:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P44366 and previous config saved to /var/cache/conftool/dbconfig/20230213-092218-marostegui.json
  • 09:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T328817)', diff saved to https://phabricator.wikimedia.org/P44365 and previous config saved to /var/cache/conftool/dbconfig/20230213-092207-marostegui.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T329203)', diff saved to https://phabricator.wikimedia.org/P44363 and previous config saved to /var/cache/conftool/dbconfig/20230213-090712-marostegui.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P44362 and previous config saved to /var/cache/conftool/dbconfig/20230213-090701-marostegui.json
  • 09:03 moritzm: rolling restart of Apache on mw/codfw servers to pick up updated libxml
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T329203)', diff saved to https://phabricator.wikimedia.org/P44361 and previous config saved to /var/cache/conftool/dbconfig/20230213-090302-marostegui.json
  • 09:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 09:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T329203)', diff saved to https://phabricator.wikimedia.org/P44360 and previous config saved to /var/cache/conftool/dbconfig/20230213-090241-marostegui.json
  • 08:54 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P44358 and previous config saved to /var/cache/conftool/dbconfig/20230213-085154-marostegui.json
  • 08:51 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 08:51 Emperor: rolling-restart of codfw swift frontends
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P44357 and previous config saved to /var/cache/conftool/dbconfig/20230213-084735-marostegui.json
  • 08:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-etcd2002.codfw.wmnet with reason: host reimage
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T328817)', diff saved to https://phabricator.wikimedia.org/P44356 and previous config saved to /var/cache/conftool/dbconfig/20230213-083648-marostegui.json
  • 08:36 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-etcd2002.codfw.wmnet with reason: host reimage
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P44355 and previous config saved to /var/cache/conftool/dbconfig/20230213-083229-marostegui.json
  • 08:29 taavi@deploy1002: Finished scap: Backport for Add a temporary logo to trwikiquote (Vector legacy + Vector 2022) (T329399), [bjnwiki] Change time zone setting (T328887), [fawiki] Add an alias to Help namespace (T329465) (duration: 19m 24s)
  • 08:27 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2002.codfw.wmnet with OS bullseye
  • 08:26 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 08:20 taavi@deploy1002: taavi and superpes: Backport for Add a temporary logo to trwikiquote (Vector legacy + Vector 2022) (T329399), [bjnwiki] Change time zone setting (T328887), [fawiki] Add an alias to Help namespace (T329465) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T329203)', diff saved to https://phabricator.wikimedia.org/P44354 and previous config saved to /var/cache/conftool/dbconfig/20230213-081722-marostegui.json
  • 08:17 moritzm: installing curl security updates
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T329203)', diff saved to https://phabricator.wikimedia.org/P44353 and previous config saved to /var/cache/conftool/dbconfig/20230213-081431-marostegui.json
  • 08:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 08:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-etcd2001.codfw.wmnet with reason: host reimage
  • 08:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 08:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 08:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T329203)', diff saved to https://phabricator.wikimedia.org/P44352 and previous config saved to /var/cache/conftool/dbconfig/20230213-081344-marostegui.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T328817)', diff saved to https://phabricator.wikimedia.org/P44351 and previous config saved to /var/cache/conftool/dbconfig/20230213-081332-marostegui.json
  • 08:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 08:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T328817)', diff saved to https://phabricator.wikimedia.org/P44350 and previous config saved to /var/cache/conftool/dbconfig/20230213-081311-marostegui.json
  • 08:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-etcd2001.codfw.wmnet with reason: host reimage
  • 08:10 taavi@deploy1002: Started scap: Backport for Add a temporary logo to trwikiquote (Vector legacy + Vector 2022) (T329399), [bjnwiki] Change time zone setting (T328887), [fawiki] Add an alias to Help namespace (T329465)
  • 07:59 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 07:59 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P44349 and previous config saved to /var/cache/conftool/dbconfig/20230213-075838-marostegui.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P44348 and previous config saved to /var/cache/conftool/dbconfig/20230213-075805-marostegui.json
  • 07:55 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:54 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 07:54 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 07:53 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:47 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:46 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P44347 and previous config saved to /var/cache/conftool/dbconfig/20230213-074331-marostegui.json
  • 07:43 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P44346 and previous config saved to /var/cache/conftool/dbconfig/20230213-074258-marostegui.json
  • 07:41 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 07:41 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 07:39 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:37 marostegui: Deploy schema change on db2151 T329260
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T329203)', diff saved to https://phabricator.wikimedia.org/P44345 and previous config saved to /var/cache/conftool/dbconfig/20230213-072825-marostegui.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T328817)', diff saved to https://phabricator.wikimedia.org/P44344 and previous config saved to /var/cache/conftool/dbconfig/20230213-072752-marostegui.json
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T329203)', diff saved to https://phabricator.wikimedia.org/P44343 and previous config saved to /var/cache/conftool/dbconfig/20230213-072535-marostegui.json
  • 07:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 07:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T329203)', diff saved to https://phabricator.wikimedia.org/P44342 and previous config saved to /var/cache/conftool/dbconfig/20230213-072514-marostegui.json
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P44341 and previous config saved to /var/cache/conftool/dbconfig/20230213-071007-marostegui.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T328817)', diff saved to https://phabricator.wikimedia.org/P44340 and previous config saved to /var/cache/conftool/dbconfig/20230213-070717-marostegui.json
  • 07:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 06:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2132,2160].codfw.wmnet,db[1117,1164,1176].eqiad.wmnet with reason: Primary switchover m1 T329259
  • 06:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2132,2160].codfw.wmnet,db[1117,1164,1176].eqiad.wmnet with reason: Primary switchover m1 T329259
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P44339 and previous config saved to /var/cache/conftool/dbconfig/20230213-065501-marostegui.json
  • 06:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1099 from dbctl T329181', diff saved to https://phabricator.wikimedia.org/P44338 and previous config saved to /var/cache/conftool/dbconfig/20230213-064051-marostegui.json
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T329203)', diff saved to https://phabricator.wikimedia.org/P44337 and previous config saved to /var/cache/conftool/dbconfig/20230213-063955-marostegui.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T329203)', diff saved to https://phabricator.wikimedia.org/P44336 and previous config saved to /var/cache/conftool/dbconfig/20230213-063449-marostegui.json
  • 06:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 06:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 06:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 06:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 06:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 06:05 AndyRussG: DjangoBannerStats upgraded from c9926cfc to 5dc35ea2 on fran1001

2023-02-11

  • 01:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T329203)', diff saved to https://phabricator.wikimedia.org/P44335 and previous config saved to /var/cache/conftool/dbconfig/20230211-015530-marostegui.json
  • 01:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2451.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P44334 and previous config saved to /var/cache/conftool/dbconfig/20230211-014023-marostegui.json
  • 01:37 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2451.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2449.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2450.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2450.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2449.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2448.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2447.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P44333 and previous config saved to /var/cache/conftool/dbconfig/20230211-012517-marostegui.json
  • 01:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2448.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2447.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T329203)', diff saved to https://phabricator.wikimedia.org/P44332 and previous config saved to /var/cache/conftool/dbconfig/20230211-011010-marostegui.json
  • 01:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2446.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2445.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T329203)', diff saved to https://phabricator.wikimedia.org/P44331 and previous config saved to /var/cache/conftool/dbconfig/20230211-010454-marostegui.json
  • 01:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 01:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 01:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T329203)', diff saved to https://phabricator.wikimedia.org/P44330 and previous config saved to /var/cache/conftool/dbconfig/20230211-010433-marostegui.json
  • 01:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2446.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2445.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2444.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2443.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P44329 and previous config saved to /var/cache/conftool/dbconfig/20230211-004927-marostegui.json
  • 00:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2444.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2443.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2442.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2441.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:35 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2442.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:35 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2441.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2440.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2439.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P44328 and previous config saved to /var/cache/conftool/dbconfig/20230211-003420-marostegui.json
  • 00:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2440.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2437.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2436.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T329203)', diff saved to https://phabricator.wikimedia.org/P44327 and previous config saved to /var/cache/conftool/dbconfig/20230211-001914-marostegui.json
  • 00:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2437.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2436.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:18 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2437.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2436.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2437.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2436.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T329203)', diff saved to https://phabricator.wikimedia.org/P44326 and previous config saved to /var/cache/conftool/dbconfig/20230211-001658-marostegui.json
  • 00:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 00:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 00:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T329203)', diff saved to https://phabricator.wikimedia.org/P44325 and previous config saved to /var/cache/conftool/dbconfig/20230211-001637-marostegui.json
  • 00:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2451.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P44324 and previous config saved to /var/cache/conftool/dbconfig/20230211-000131-marostegui.json

2023-02-10

  • 23:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2451.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2450.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P44323 and previous config saved to /var/cache/conftool/dbconfig/20230210-234624-marostegui.json
  • 23:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T329203)', diff saved to https://phabricator.wikimedia.org/P44322 and previous config saved to /var/cache/conftool/dbconfig/20230210-233118-marostegui.json
  • 23:30 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2450.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T329203)', diff saved to https://phabricator.wikimedia.org/P44321 and previous config saved to /var/cache/conftool/dbconfig/20230210-232526-marostegui.json
  • 23:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 23:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 23:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44320 and previous config saved to /var/cache/conftool/dbconfig/20230210-232505-marostegui.json
  • 23:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P44319 and previous config saved to /var/cache/conftool/dbconfig/20230210-230958-marostegui.json
  • 22:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P44318 and previous config saved to /var/cache/conftool/dbconfig/20230210-225452-marostegui.json
  • 22:44 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: debugging
  • 22:44 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: debugging
  • 22:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44317 and previous config saved to /var/cache/conftool/dbconfig/20230210-223946-marostegui.json
  • 22:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44316 and previous config saved to /var/cache/conftool/dbconfig/20230210-223430-marostegui.json
  • 22:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 22:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 22:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T329203)', diff saved to https://phabricator.wikimedia.org/P44315 and previous config saved to /var/cache/conftool/dbconfig/20230210-223420-marostegui.json
  • 22:25 demon@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.22 refs T325585
  • 22:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: debugging
  • 22:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: debugging
  • 22:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P44314 and previous config saved to /var/cache/conftool/dbconfig/20230210-221914-marostegui.json
  • 22:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 22:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 22:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T328817)', diff saved to https://phabricator.wikimedia.org/P44313 and previous config saved to /var/cache/conftool/dbconfig/20230210-220816-marostegui.json
  • 22:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P44312 and previous config saved to /var/cache/conftool/dbconfig/20230210-220408-marostegui.json
  • 21:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P44311 and previous config saved to /var/cache/conftool/dbconfig/20230210-215310-marostegui.json
  • 21:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T329203)', diff saved to https://phabricator.wikimedia.org/P44310 and previous config saved to /var/cache/conftool/dbconfig/20230210-214901-marostegui.json
  • 21:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T329203)', diff saved to https://phabricator.wikimedia.org/P44309 and previous config saved to /var/cache/conftool/dbconfig/20230210-214308-marostegui.json
  • 21:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 21:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 21:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 21:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 21:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44308 and previous config saved to /var/cache/conftool/dbconfig/20230210-214241-marostegui.json
  • 21:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P44307 and previous config saved to /var/cache/conftool/dbconfig/20230210-213803-marostegui.json
  • 21:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P44306 and previous config saved to /var/cache/conftool/dbconfig/20230210-212734-marostegui.json
  • 21:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T328817)', diff saved to https://phabricator.wikimedia.org/P44305 and previous config saved to /var/cache/conftool/dbconfig/20230210-212257-marostegui.json
  • 21:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T328817)', diff saved to https://phabricator.wikimedia.org/P44304 and previous config saved to /var/cache/conftool/dbconfig/20230210-212046-marostegui.json
  • 21:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 21:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 21:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T328817)', diff saved to https://phabricator.wikimedia.org/P44303 and previous config saved to /var/cache/conftool/dbconfig/20230210-212025-marostegui.json
  • 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P44302 and previous config saved to /var/cache/conftool/dbconfig/20230210-211228-marostegui.json
  • 21:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P44301 and previous config saved to /var/cache/conftool/dbconfig/20230210-210519-marostegui.json
  • 20:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44300 and previous config saved to /var/cache/conftool/dbconfig/20230210-205722-marostegui.json
  • 20:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44299 and previous config saved to /var/cache/conftool/dbconfig/20230210-205059-marostegui.json
  • 20:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 20:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 20:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P44298 and previous config saved to /var/cache/conftool/dbconfig/20230210-205012-marostegui.json
  • 20:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 20:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 20:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T329203)', diff saved to https://phabricator.wikimedia.org/P44297 and previous config saved to /var/cache/conftool/dbconfig/20230210-204638-marostegui.json
  • 20:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T328817)', diff saved to https://phabricator.wikimedia.org/P44296 and previous config saved to /var/cache/conftool/dbconfig/20230210-203506-marostegui.json
  • 20:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T328817)', diff saved to https://phabricator.wikimedia.org/P44295 and previous config saved to /var/cache/conftool/dbconfig/20230210-203255-marostegui.json
  • 20:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 20:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 20:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T328817)', diff saved to https://phabricator.wikimedia.org/P44294 and previous config saved to /var/cache/conftool/dbconfig/20230210-203234-marostegui.json
  • 20:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P44293 and previous config saved to /var/cache/conftool/dbconfig/20230210-203131-marostegui.json
  • 20:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P44292 and previous config saved to /var/cache/conftool/dbconfig/20230210-201728-marostegui.json
  • 20:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P44291 and previous config saved to /var/cache/conftool/dbconfig/20230210-201625-marostegui.json
  • 20:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P44290 and previous config saved to /var/cache/conftool/dbconfig/20230210-200221-marostegui.json
  • 20:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T329203)', diff saved to https://phabricator.wikimedia.org/P44289 and previous config saved to /var/cache/conftool/dbconfig/20230210-200118-marostegui.json
  • 19:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T329203)', diff saved to https://phabricator.wikimedia.org/P44288 and previous config saved to /var/cache/conftool/dbconfig/20230210-195902-marostegui.json
  • 19:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 19:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 19:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T329203)', diff saved to https://phabricator.wikimedia.org/P44287 and previous config saved to /var/cache/conftool/dbconfig/20230210-195841-marostegui.json
  • 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T328817)', diff saved to https://phabricator.wikimedia.org/P44286 and previous config saved to /var/cache/conftool/dbconfig/20230210-194715-marostegui.json
  • 19:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T328817)', diff saved to https://phabricator.wikimedia.org/P44285 and previous config saved to /var/cache/conftool/dbconfig/20230210-194504-marostegui.json
  • 19:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 19:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 19:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T328817)', diff saved to https://phabricator.wikimedia.org/P44284 and previous config saved to /var/cache/conftool/dbconfig/20230210-194443-marostegui.json
  • 19:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P44283 and previous config saved to /var/cache/conftool/dbconfig/20230210-194335-marostegui.json
  • 19:39 demon@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.22 refs T325585 (duration: 06m 34s)
  • 19:33 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.22 refs T325585
  • 19:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P44282 and previous config saved to /var/cache/conftool/dbconfig/20230210-192935-marostegui.json
  • 19:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P44281 and previous config saved to /var/cache/conftool/dbconfig/20230210-192828-marostegui.json
  • 19:23 demon@deploy1002: Finished scap: Updating wikibase to fix T329233 (duration: 07m 49s)
  • 19:16 demon@deploy1002: Started scap: Updating wikibase to fix T329233
  • 19:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P44280 and previous config saved to /var/cache/conftool/dbconfig/20230210-191429-marostegui.json
  • 19:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T329203)', diff saved to https://phabricator.wikimedia.org/P44279 and previous config saved to /var/cache/conftool/dbconfig/20230210-191322-marostegui.json
  • 19:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T329203)', diff saved to https://phabricator.wikimedia.org/P44278 and previous config saved to /var/cache/conftool/dbconfig/20230210-190711-marostegui.json
  • 19:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 19:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 19:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44277 and previous config saved to /var/cache/conftool/dbconfig/20230210-190650-marostegui.json
  • 18:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T328817)', diff saved to https://phabricator.wikimedia.org/P44276 and previous config saved to /var/cache/conftool/dbconfig/20230210-185923-marostegui.json
  • 18:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T328817)', diff saved to https://phabricator.wikimedia.org/P44275 and previous config saved to /var/cache/conftool/dbconfig/20230210-185712-marostegui.json
  • 18:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 18:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 18:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T328817)', diff saved to https://phabricator.wikimedia.org/P44274 and previous config saved to /var/cache/conftool/dbconfig/20230210-185651-marostegui.json
  • 18:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P44273 and previous config saved to /var/cache/conftool/dbconfig/20230210-185144-marostegui.json
  • 18:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P44272 and previous config saved to /var/cache/conftool/dbconfig/20230210-184144-marostegui.json
  • 18:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P44271 and previous config saved to /var/cache/conftool/dbconfig/20230210-183638-marostegui.json
  • 18:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P44270 and previous config saved to /var/cache/conftool/dbconfig/20230210-182638-marostegui.json
  • 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44269 and previous config saved to /var/cache/conftool/dbconfig/20230210-182131-marostegui.json
  • 18:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44268 and previous config saved to /var/cache/conftool/dbconfig/20230210-181456-marostegui.json
  • 18:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 18:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 18:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T328817)', diff saved to https://phabricator.wikimedia.org/P44267 and previous config saved to /var/cache/conftool/dbconfig/20230210-181132-marostegui.json
  • 18:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 18:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T329203)', diff saved to https://phabricator.wikimedia.org/P44266 and previous config saved to /var/cache/conftool/dbconfig/20230210-180953-marostegui.json
  • 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T328817)', diff saved to https://phabricator.wikimedia.org/P44265 and previous config saved to /var/cache/conftool/dbconfig/20230210-180921-marostegui.json
  • 18:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 18:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 18:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 18:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44264 and previous config saved to /var/cache/conftool/dbconfig/20230210-180502-marostegui.json
  • 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P44263 and previous config saved to /var/cache/conftool/dbconfig/20230210-175447-marostegui.json
  • 17:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P44262 and previous config saved to /var/cache/conftool/dbconfig/20230210-174956-marostegui.json
  • 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P44261 and previous config saved to /var/cache/conftool/dbconfig/20230210-173941-marostegui.json
  • 17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P44260 and previous config saved to /var/cache/conftool/dbconfig/20230210-173450-marostegui.json
  • 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T329203)', diff saved to https://phabricator.wikimedia.org/P44259 and previous config saved to /var/cache/conftool/dbconfig/20230210-172434-marostegui.json
  • 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44258 and previous config saved to /var/cache/conftool/dbconfig/20230210-171943-marostegui.json
  • 17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T329203)', diff saved to https://phabricator.wikimedia.org/P44257 and previous config saved to /var/cache/conftool/dbconfig/20230210-171818-marostegui.json
  • 17:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 17:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44256 and previous config saved to /var/cache/conftool/dbconfig/20230210-171757-marostegui.json
  • 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44255 and previous config saved to /var/cache/conftool/dbconfig/20230210-171349-marostegui.json
  • 17:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T328817)', diff saved to https://phabricator.wikimedia.org/P44254 and previous config saved to /var/cache/conftool/dbconfig/20230210-171328-marostegui.json
  • 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P44253 and previous config saved to /var/cache/conftool/dbconfig/20230210-170250-marostegui.json
  • 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P44252 and previous config saved to /var/cache/conftool/dbconfig/20230210-165822-marostegui.json
  • 16:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2450.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2451.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:48 sukhe: reprepro -C main include bullseye-wikimedia gdnsd_3.8.0-1~wmf2_amd64.changes: T321309
  • 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P44248 and previous config saved to /var/cache/conftool/dbconfig/20230210-164744-marostegui.json
  • 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P44247 and previous config saved to /var/cache/conftool/dbconfig/20230210-164316-marostegui.json
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44246 and previous config saved to /var/cache/conftool/dbconfig/20230210-163238-marostegui.json
  • 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T328817)', diff saved to https://phabricator.wikimedia.org/P44245 and previous config saved to /var/cache/conftool/dbconfig/20230210-162809-marostegui.json
  • 16:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44244 and previous config saved to /var/cache/conftool/dbconfig/20230210-162615-marostegui.json
  • 16:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T328817)', diff saved to https://phabricator.wikimedia.org/P44243 and previous config saved to /var/cache/conftool/dbconfig/20230210-162559-marostegui.json
  • 16:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T329203)', diff saved to https://phabricator.wikimedia.org/P44242 and previous config saved to /var/cache/conftool/dbconfig/20230210-162553-marostegui.json
  • 16:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 16:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 16:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T328817)', diff saved to https://phabricator.wikimedia.org/P44241 and previous config saved to /var/cache/conftool/dbconfig/20230210-162520-marostegui.json
  • 16:11 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 16:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P44240 and previous config saved to /var/cache/conftool/dbconfig/20230210-161047-marostegui.json
  • 16:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 16:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2451.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2450.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P44239 and previous config saved to /var/cache/conftool/dbconfig/20230210-161014-marostegui.json
  • 16:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2449.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2448.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:06 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 16:05 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 16:03 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2449.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:03 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2448.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2446.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2447.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ml-staging[2001-2002].codfw.wmnet,ml-staging-ctrl[2001-2002].codfw.wmnet,ml-staging-etcd2003.codfw.wmnet with reason: Cluster half broken, in the middle of upgrading
  • 15:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ml-staging[2001-2002].codfw.wmnet,ml-staging-ctrl[2001-2002].codfw.wmnet,ml-staging-etcd2003.codfw.wmnet with reason: Cluster half broken, in the middle of upgrading
  • 15:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P44238 and previous config saved to /var/cache/conftool/dbconfig/20230210-155541-marostegui.json
  • 15:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P44237 and previous config saved to /var/cache/conftool/dbconfig/20230210-155508-marostegui.json
  • 15:49 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 15:49 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ml-staging-etcd2002.codfw.wmnet with OS bullseye
  • 15:41 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2447.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:41 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2446.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T329203)', diff saved to https://phabricator.wikimedia.org/P44236 and previous config saved to /var/cache/conftool/dbconfig/20230210-154034-marostegui.json
  • 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T328817)', diff saved to https://phabricator.wikimedia.org/P44235 and previous config saved to /var/cache/conftool/dbconfig/20230210-154001-marostegui.json
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T329203)', diff saved to https://phabricator.wikimedia.org/P44234 and previous config saved to /var/cache/conftool/dbconfig/20230210-153411-marostegui.json
  • 15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 15:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44233 and previous config saved to /var/cache/conftool/dbconfig/20230210-153349-marostegui.json
  • 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T328817)', diff saved to https://phabricator.wikimedia.org/P44232 and previous config saved to /var/cache/conftool/dbconfig/20230210-153112-marostegui.json
  • 15:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 15:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44231 and previous config saved to /var/cache/conftool/dbconfig/20230210-153051-marostegui.json
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P44230 and previous config saved to /var/cache/conftool/dbconfig/20230210-151843-marostegui.json
  • 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P44229 and previous config saved to /var/cache/conftool/dbconfig/20230210-151544-marostegui.json
  • 15:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetdb2003.codfw.wmnet
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P44228 and previous config saved to /var/cache/conftool/dbconfig/20230210-150337-marostegui.json
  • 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P44227 and previous config saved to /var/cache/conftool/dbconfig/20230210-150038-marostegui.json
  • 15:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb2003.codfw.wmnet
  • 14:53 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2002.codfw.wmnet with OS bullseye
  • 14:52 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44226 and previous config saved to /var/cache/conftool/dbconfig/20230210-144830-marostegui.json
  • 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44225 and previous config saved to /var/cache/conftool/dbconfig/20230210-144530-marostegui.json
  • 14:43 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T329203)', diff saved to https://phabricator.wikimedia.org/P44224 and previous config saved to /var/cache/conftool/dbconfig/20230210-144204-marostegui.json
  • 14:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 14:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T329203)', diff saved to https://phabricator.wikimedia.org/P44223 and previous config saved to /var/cache/conftool/dbconfig/20230210-144143-marostegui.json
  • 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44222 and previous config saved to /var/cache/conftool/dbconfig/20230210-143815-marostegui.json
  • 14:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 14:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T328817)', diff saved to https://phabricator.wikimedia.org/P44221 and previous config saved to /var/cache/conftool/dbconfig/20230210-143753-marostegui.json
  • 14:36 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 14:36 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-staging-etcd2001.codfw.wmnet with OS bullseye
  • 14:33 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P44220 and previous config saved to /var/cache/conftool/dbconfig/20230210-142636-marostegui.json
  • 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P44219 and previous config saved to /var/cache/conftool/dbconfig/20230210-142247-marostegui.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P44218 and previous config saved to /var/cache/conftool/dbconfig/20230210-141128-marostegui.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P44217 and previous config saved to /var/cache/conftool/dbconfig/20230210-140741-marostegui.json
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T329203)', diff saved to https://phabricator.wikimedia.org/P44216 and previous config saved to /var/cache/conftool/dbconfig/20230210-135622-marostegui.json
  • 13:56 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2004.codfw.wmnet with OS bullseye
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T329203)', diff saved to https://phabricator.wikimedia.org/P44215 and previous config saved to /var/cache/conftool/dbconfig/20230210-135345-marostegui.json
  • 13:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 13:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T329203)', diff saved to https://phabricator.wikimedia.org/P44214 and previous config saved to /var/cache/conftool/dbconfig/20230210-135319-marostegui.json
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T328817)', diff saved to https://phabricator.wikimedia.org/P44213 and previous config saved to /var/cache/conftool/dbconfig/20230210-135235-marostegui.json
  • 13:49 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 13:48 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T328817)', diff saved to https://phabricator.wikimedia.org/P44212 and previous config saved to /var/cache/conftool/dbconfig/20230210-134544-marostegui.json
  • 13:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44211 and previous config saved to /var/cache/conftool/dbconfig/20230210-134523-marostegui.json
  • 13:39 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44210 and previous config saved to /var/cache/conftool/dbconfig/20230210-133823-root.json
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P44209 and previous config saved to /var/cache/conftool/dbconfig/20230210-133813-marostegui.json
  • 13:36 eoghan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P44208 and previous config saved to /var/cache/conftool/dbconfig/20230210-133016-marostegui.json
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44207 and previous config saved to /var/cache/conftool/dbconfig/20230210-132318-root.json
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P44206 and previous config saved to /var/cache/conftool/dbconfig/20230210-132307-marostegui.json
  • 13:21 eoghan@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab-runner2004.codfw.wmnet with OS bullseye
  • 13:19 volans: upgraded spicerack to 6.1.0 on the cumin hosts
  • 13:19 topranks: Adjusting evpn route export policy on lsw1-e2-eqiad to include host routes
  • 13:18 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2003.codfw.wmnet with OS bullseye
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P44205 and previous config saved to /var/cache/conftool/dbconfig/20230210-131509-marostegui.json
  • 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetdb1003.eqiad.wmnet
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44204 and previous config saved to /var/cache/conftool/dbconfig/20230210-130813-root.json
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T329203)', diff saved to https://phabricator.wikimedia.org/P44203 and previous config saved to /var/cache/conftool/dbconfig/20230210-130801-marostegui.json
  • 13:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb1003.eqiad.wmnet
  • 13:03 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2003.codfw.wmnet with reason: host reimage
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T329203)', diff saved to https://phabricator.wikimedia.org/P44202 and previous config saved to /var/cache/conftool/dbconfig/20230210-130110-marostegui.json
  • 13:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 13:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T329203)', diff saved to https://phabricator.wikimedia.org/P44201 and previous config saved to /var/cache/conftool/dbconfig/20230210-130049-marostegui.json
  • 13:00 eoghan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2003.codfw.wmnet with reason: host reimage
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44200 and previous config saved to /var/cache/conftool/dbconfig/20230210-130002-marostegui.json
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44199 and previous config saved to /var/cache/conftool/dbconfig/20230210-125308-root.json
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44198 and previous config saved to /var/cache/conftool/dbconfig/20230210-125301-marostegui.json
  • 12:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 12:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44197 and previous config saved to /var/cache/conftool/dbconfig/20230210-125240-marostegui.json
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P44196 and previous config saved to /var/cache/conftool/dbconfig/20230210-124543-marostegui.json
  • 12:45 eoghan@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab-runner2003.codfw.wmnet with OS bullseye
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44195 and previous config saved to /var/cache/conftool/dbconfig/20230210-123757-root.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P44194 and previous config saved to /var/cache/conftool/dbconfig/20230210-123733-marostegui.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P44193 and previous config saved to /var/cache/conftool/dbconfig/20230210-123036-marostegui.json
  • 12:26 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2002.codfw.wmnet with OS bullseye
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44192 and previous config saved to /var/cache/conftool/dbconfig/20230210-122252-root.json
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P44191 and previous config saved to /var/cache/conftool/dbconfig/20230210-122227-marostegui.json
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T329203)', diff saved to https://phabricator.wikimedia.org/P44190 and previous config saved to /var/cache/conftool/dbconfig/20230210-121530-marostegui.json
  • 12:13 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 12:13 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T329203)', diff saved to https://phabricator.wikimedia.org/P44189 and previous config saved to /var/cache/conftool/dbconfig/20230210-121252-marostegui.json
  • 12:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 12:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 12:10 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2002.codfw.wmnet with reason: host reimage
  • 12:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 12:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44188 and previous config saved to /var/cache/conftool/dbconfig/20230210-120747-root.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44187 and previous config saved to /var/cache/conftool/dbconfig/20230210-120721-marostegui.json
  • 12:06 eoghan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2002.codfw.wmnet with reason: host reimage
  • 12:04 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 12:03 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testing spicerack 6.1.0 - jbond@cumin2002"
  • 12:02 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testing spicerack 6.1.0 - jbond@cumin2002"
  • 12:02 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testing spicerack 6.1.0 - jbond@cumin2002"
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T329203)', diff saved to https://phabricator.wikimedia.org/P44186 and previous config saved to /var/cache/conftool/dbconfig/20230210-120123-marostegui.json
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T328817)', diff saved to https://phabricator.wikimedia.org/P44185 and previous config saved to /var/cache/conftool/dbconfig/20230210-120014-marostegui.json
  • 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T328817)', diff saved to https://phabricator.wikimedia.org/P44184 and previous config saved to /var/cache/conftool/dbconfig/20230210-115953-marostegui.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T329203)', diff saved to https://phabricator.wikimedia.org/P44183 and previous config saved to /var/cache/conftool/dbconfig/20230210-115913-marostegui.json
  • 11:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 11:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T329203)', diff saved to https://phabricator.wikimedia.org/P44182 and previous config saved to /var/cache/conftool/dbconfig/20230210-115852-marostegui.json
  • 11:58 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testing spicerack 6.1.0 - jbond@cumin2002"
  • 11:56 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 11:54 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testing spicerack 6.1.0 - jbond@cumin2002"
  • 11:53 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testing spicerack 6.1.0 - jbond@cumin2002"
  • 11:51 eoghan@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab-runner2002.codfw.wmnet with OS bullseye
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P44181 and previous config saved to /var/cache/conftool/dbconfig/20230210-114447-marostegui.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P44180 and previous config saved to /var/cache/conftool/dbconfig/20230210-114346-marostegui.json
  • 11:34 volans: uploaded spicerack_6.1.0 to apt.wikimedia.org bullseye-wikimedia
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P44179 and previous config saved to /var/cache/conftool/dbconfig/20230210-112940-marostegui.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P44178 and previous config saved to /var/cache/conftool/dbconfig/20230210-112840-marostegui.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T328817)', diff saved to https://phabricator.wikimedia.org/P44177 and previous config saved to /var/cache/conftool/dbconfig/20230210-111434-marostegui.json
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T329203)', diff saved to https://phabricator.wikimedia.org/P44176 and previous config saved to /var/cache/conftool/dbconfig/20230210-111333-marostegui.json
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T329203)', diff saved to https://phabricator.wikimedia.org/P44175 and previous config saved to /var/cache/conftool/dbconfig/20230210-111124-marostegui.json
  • 11:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 11:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T329203)', diff saved to https://phabricator.wikimedia.org/P44174 and previous config saved to /var/cache/conftool/dbconfig/20230210-111103-marostegui.json
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T328817)', diff saved to https://phabricator.wikimedia.org/P44173 and previous config saved to /var/cache/conftool/dbconfig/20230210-110740-marostegui.json
  • 11:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 11:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 11:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 11:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T328817)', diff saved to https://phabricator.wikimedia.org/P44172 and previous config saved to /var/cache/conftool/dbconfig/20230210-110715-marostegui.json
  • 11:05 moritzm: upgrade puppetdb[12]003 to bookworm T321783
  • 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P44171 and previous config saved to /var/cache/conftool/dbconfig/20230210-105557-marostegui.json
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P44170 and previous config saved to /var/cache/conftool/dbconfig/20230210-105208-marostegui.json
  • 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P44169 and previous config saved to /var/cache/conftool/dbconfig/20230210-104051-marostegui.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P44168 and previous config saved to /var/cache/conftool/dbconfig/20230210-103702-marostegui.json
  • 10:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34177
  • 10:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 34177
  • 10:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6677
  • 10:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6677
  • 10:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9145
  • 10:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9145
  • 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install3001.wikimedia.org
  • 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install3001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4764
  • 10:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4764
  • 10:31 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install3001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 138886
  • 10:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 138886
  • 10:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8966
  • 10:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8966
  • 10:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35467
  • 10:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35467
  • 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T329203)', diff saved to https://phabricator.wikimedia.org/P44167 and previous config saved to /var/cache/conftool/dbconfig/20230210-102544-marostegui.json
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T328817)', diff saved to https://phabricator.wikimedia.org/P44166 and previous config saved to /var/cache/conftool/dbconfig/20230210-102156-marostegui.json
  • 10:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install3001.wikimedia.org
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install4001.wikimedia.org
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install4001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T329203)', diff saved to https://phabricator.wikimedia.org/P44165 and previous config saved to /var/cache/conftool/dbconfig/20230210-102035-marostegui.json
  • 10:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T329203)', diff saved to https://phabricator.wikimedia.org/P44164 and previous config saved to /var/cache/conftool/dbconfig/20230210-102014-marostegui.json
  • 10:18 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install4001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T328817)', diff saved to https://phabricator.wikimedia.org/P44163 and previous config saved to /var/cache/conftool/dbconfig/20230210-101600-marostegui.json
  • 10:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 10:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T328817)', diff saved to https://phabricator.wikimedia.org/P44162 and previous config saved to /var/cache/conftool/dbconfig/20230210-101539-marostegui.json
  • 10:12 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:08 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install4001.wikimedia.org
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5001.wikimedia.org
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:06 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P44161 and previous config saved to /var/cache/conftool/dbconfig/20230210-100508-marostegui.json
  • 10:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P44160 and previous config saved to /var/cache/conftool/dbconfig/20230210-100033-marostegui.json
  • 09:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5001.wikimedia.org
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install6001.wikimedia.org
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install6001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 09:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install6001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P44159 and previous config saved to /var/cache/conftool/dbconfig/20230210-095001-marostegui.json
  • 09:45 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P44158 and previous config saved to /var/cache/conftool/dbconfig/20230210-094526-marostegui.json
  • 09:41 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install6001.wikimedia.org
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T329203)', diff saved to https://phabricator.wikimedia.org/P44157 and previous config saved to /var/cache/conftool/dbconfig/20230210-093455-marostegui.json
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T329203)', diff saved to https://phabricator.wikimedia.org/P44156 and previous config saved to /var/cache/conftool/dbconfig/20230210-093246-marostegui.json
  • 09:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 09:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T329203)', diff saved to https://phabricator.wikimedia.org/P44155 and previous config saved to /var/cache/conftool/dbconfig/20230210-093225-marostegui.json
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T328817)', diff saved to https://phabricator.wikimedia.org/P44154 and previous config saved to /var/cache/conftool/dbconfig/20230210-093020-marostegui.json
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T328817)', diff saved to https://phabricator.wikimedia.org/P44153 and previous config saved to /var/cache/conftool/dbconfig/20230210-092417-marostegui.json
  • 09:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 09:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T328817)', diff saved to https://phabricator.wikimedia.org/P44152 and previous config saved to /var/cache/conftool/dbconfig/20230210-092355-marostegui.json
  • 09:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 42184
  • 09:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 42184
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P44151 and previous config saved to /var/cache/conftool/dbconfig/20230210-091719-marostegui.json
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P44150 and previous config saved to /var/cache/conftool/dbconfig/20230210-090848-marostegui.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P44149 and previous config saved to /var/cache/conftool/dbconfig/20230210-090213-marostegui.json
  • 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P44148 and previous config saved to /var/cache/conftool/dbconfig/20230210-085342-marostegui.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T329203)', diff saved to https://phabricator.wikimedia.org/P44147 and previous config saved to /var/cache/conftool/dbconfig/20230210-084706-marostegui.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T329203)', diff saved to https://phabricator.wikimedia.org/P44146 and previous config saved to /var/cache/conftool/dbconfig/20230210-084457-marostegui.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099 (s1, s8) T329181', diff saved to https://phabricator.wikimedia.org/P44145 and previous config saved to /var/cache/conftool/dbconfig/20230210-084452-root.json
  • 08:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 08:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T329203)', diff saved to https://phabricator.wikimedia.org/P44144 and previous config saved to /var/cache/conftool/dbconfig/20230210-084430-marostegui.json
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T328817)', diff saved to https://phabricator.wikimedia.org/P44143 and previous config saved to /var/cache/conftool/dbconfig/20230210-083836-marostegui.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T328817)', diff saved to https://phabricator.wikimedia.org/P44142 and previous config saved to /var/cache/conftool/dbconfig/20230210-083140-marostegui.json
  • 08:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T328817)', diff saved to https://phabricator.wikimedia.org/P44141 and previous config saved to /var/cache/conftool/dbconfig/20230210-083119-marostegui.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P44140 and previous config saved to /var/cache/conftool/dbconfig/20230210-082923-marostegui.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P44139 and previous config saved to /var/cache/conftool/dbconfig/20230210-081612-marostegui.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P44138 and previous config saved to /var/cache/conftool/dbconfig/20230210-081417-marostegui.json
  • 08:12 moritzm: installing virglrenderer security updates
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44137 and previous config saved to /var/cache/conftool/dbconfig/20230210-080841-root.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P44136 and previous config saved to /var/cache/conftool/dbconfig/20230210-080106-marostegui.json
  • 07:59 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T329203)', diff saved to https://phabricator.wikimedia.org/P44135 and previous config saved to /var/cache/conftool/dbconfig/20230210-075911-marostegui.json
  • 07:59 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T329203)', diff saved to https://phabricator.wikimedia.org/P44134 and previous config saved to /var/cache/conftool/dbconfig/20230210-075702-marostegui.json
  • 07:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 07:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44133 and previous config saved to /var/cache/conftool/dbconfig/20230210-075336-root.json
  • 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T329203)', diff saved to https://phabricator.wikimedia.org/P44132 and previous config saved to /var/cache/conftool/dbconfig/20230210-075314-marostegui.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T328817)', diff saved to https://phabricator.wikimedia.org/P44131 and previous config saved to /var/cache/conftool/dbconfig/20230210-074600-marostegui.json
  • 07:43 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.upgrade-cluster (exit_code=99) Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:41 elukey@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade ml-staging-codfw cluster to 1.23
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T328817)', diff saved to https://phabricator.wikimedia.org/P44130 and previous config saved to /var/cache/conftool/dbconfig/20230210-073902-marostegui.json
  • 07:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 07:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T328817)', diff saved to https://phabricator.wikimedia.org/P44129 and previous config saved to /var/cache/conftool/dbconfig/20230210-073841-marostegui.json
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44128 and previous config saved to /var/cache/conftool/dbconfig/20230210-073831-root.json
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P44127 and previous config saved to /var/cache/conftool/dbconfig/20230210-073808-marostegui.json
  • 07:38 moritzm: installing wireshark security updates
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P44126 and previous config saved to /var/cache/conftool/dbconfig/20230210-072335-marostegui.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44125 and previous config saved to /var/cache/conftool/dbconfig/20230210-072327-root.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P44124 and previous config saved to /var/cache/conftool/dbconfig/20230210-072301-marostegui.json
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P44123 and previous config saved to /var/cache/conftool/dbconfig/20230210-070829-marostegui.json
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44122 and previous config saved to /var/cache/conftool/dbconfig/20230210-070822-root.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T329203)', diff saved to https://phabricator.wikimedia.org/P44121 and previous config saved to /var/cache/conftool/dbconfig/20230210-070755-marostegui.json
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1098.eqiad.wmnet
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1098.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T328817)', diff saved to https://phabricator.wikimedia.org/P44120 and previous config saved to /var/cache/conftool/dbconfig/20230210-065322-marostegui.json
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44119 and previous config saved to /var/cache/conftool/dbconfig/20230210-065317-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T328817)', diff saved to https://phabricator.wikimedia.org/P44118 and previous config saved to /var/cache/conftool/dbconfig/20230210-064728-marostegui.json
  • 06:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 06:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 06:46 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1098.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 06:44 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 06:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1098.eqiad.wmnet
  • 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44117 and previous config saved to /var/cache/conftool/dbconfig/20230210-063812-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T329203)', diff saved to https://phabricator.wikimedia.org/P44116 and previous config saved to /var/cache/conftool/dbconfig/20230210-063543-marostegui.json
  • 06:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 06:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 06:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 06:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T328817)', diff saved to https://phabricator.wikimedia.org/P44115 and previous config saved to /var/cache/conftool/dbconfig/20230210-063249-marostegui.json
  • 06:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 06:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 06:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 02:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2445.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:48 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2445.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2444.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2443.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2444.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2442.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:07 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes row D - pt1979@cumin2002"
  • 02:06 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2443.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:06 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes row D - pt1979@cumin2002"
  • 02:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2441.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:05 zabe: deployed mitigations for T326691
  • 02:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 02:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2442.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:00 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2441.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2441.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2441.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2440.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2439.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:50 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2440.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:49 Reedy: creating wbc_entity_usage on foundationwiki - T321967
  • 01:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2438.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2438.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2437.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2436.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:35 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2437.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:34 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2436.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:32 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:32 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes in row C - pt1979@cumin2002"
  • 01:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes in row C - pt1979@cumin2002"
  • 01:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox

2023-02-09

  • 23:49 ladsgroup@deploy1002: Finished scap: Backport for Revert "Start reading from rev_comment_id in cebwiki" (duration: 09m 04s)
  • 23:41 ladsgroup@deploy1002: ladsgroup: Backport for Revert "Start reading from rev_comment_id in cebwiki" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 23:39 ladsgroup@deploy1002: Started scap: Backport for Revert "Start reading from rev_comment_id in cebwiki"
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P44114 and previous config saved to /var/cache/conftool/dbconfig/20230209-231522-ladsgroup.json
  • 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P44113 and previous config saved to /var/cache/conftool/dbconfig/20230209-230016-ladsgroup.json
  • 22:50 zabe@deploy1002: Finished scap: Backport for Start reading from rev_comment_id in cebwiki (T275246) (duration: 08m 26s)
  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P44112 and previous config saved to /var/cache/conftool/dbconfig/20230209-224509-ladsgroup.json
  • 22:44 zabe@deploy1002: zabe: Backport for Start reading from rev_comment_id in cebwiki (T275246) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 22:42 zabe@deploy1002: Started scap: Backport for Start reading from rev_comment_id in cebwiki (T275246)
  • 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P44111 and previous config saved to /var/cache/conftool/dbconfig/20230209-223003-ladsgroup.json
  • 22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P44110 and previous config saved to /var/cache/conftool/dbconfig/20230209-222137-ladsgroup.json
  • 22:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 22:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44109 and previous config saved to /var/cache/conftool/dbconfig/20230209-222126-ladsgroup.json
  • 22:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P44108 and previous config saved to /var/cache/conftool/dbconfig/20230209-220620-ladsgroup.json
  • 21:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P44107 and previous config saved to /var/cache/conftool/dbconfig/20230209-215114-ladsgroup.json
  • 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44106 and previous config saved to /var/cache/conftool/dbconfig/20230209-213607-ladsgroup.json
  • 21:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44105 and previous config saved to /var/cache/conftool/dbconfig/20230209-212747-ladsgroup.json
  • 21:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 21:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 21:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 21:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 21:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44104 and previous config saved to /var/cache/conftool/dbconfig/20230209-212732-ladsgroup.json
  • 21:19 thcipriani@deploy1002: Sync cancelled.
  • 21:18 thcipriani@deploy1002: trainbranchbot and thcipriani: Backport for Revert "InitialiseSettings: install PageAssessments on newiki" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 21:17 thcipriani@deploy1002: Started scap: Backport for Revert "InitialiseSettings: install PageAssessments on newiki"
  • 21:13 thcipriani@deploy1002: backport aborted: (duration: 06m 05s)
  • 21:13 thcipriani@deploy1002: sync-world aborted: Backport for InitialiseSettings: install PageAssessments on newiki (T328224) (duration: 04m 24s)
  • 21:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P44103 and previous config saved to /var/cache/conftool/dbconfig/20230209-211226-ladsgroup.json
  • 21:11 thcipriani@deploy1002: musikanimal and thcipriani: Backport for InitialiseSettings: install PageAssessments on newiki (T328224) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:09 thcipriani@deploy1002: Started scap: Backport for InitialiseSettings: install PageAssessments on newiki (T328224)
  • 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P44102 and previous config saved to /var/cache/conftool/dbconfig/20230209-205720-ladsgroup.json
  • 20:50 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge logging config change - bking@cumin1001 - T324335
  • 20:47 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge logging config change - bking@cumin1001 - T324335
  • 20:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44101 and previous config saved to /var/cache/conftool/dbconfig/20230209-204214-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44100 and previous config saved to /var/cache/conftool/dbconfig/20230209-203236-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 20:31 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge logging config change - bking@cumin1001 - T324335
  • 20:27 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge logging config change - bking@cumin1001 - T324335
  • 20:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 20:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P44099 and previous config saved to /var/cache/conftool/dbconfig/20230209-202551-ladsgroup.json
  • 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P44098 and previous config saved to /var/cache/conftool/dbconfig/20230209-201045-ladsgroup.json
  • 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P44097 and previous config saved to /var/cache/conftool/dbconfig/20230209-195539-ladsgroup.json
  • 19:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T328817)', diff saved to https://phabricator.wikimedia.org/P44096 and previous config saved to /var/cache/conftool/dbconfig/20230209-194730-marostegui.json
  • 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P44095 and previous config saved to /var/cache/conftool/dbconfig/20230209-194032-ladsgroup.json
  • 19:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P44094 and previous config saved to /var/cache/conftool/dbconfig/20230209-193223-marostegui.json
  • 19:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P44093 and previous config saved to /var/cache/conftool/dbconfig/20230209-193107-ladsgroup.json
  • 19:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 19:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 19:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P44092 and previous config saved to /var/cache/conftool/dbconfig/20230209-193057-ladsgroup.json
  • 19:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P44091 and previous config saved to /var/cache/conftool/dbconfig/20230209-191717-marostegui.json
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P44090 and previous config saved to /var/cache/conftool/dbconfig/20230209-191551-ladsgroup.json
  • 19:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T328817)', diff saved to https://phabricator.wikimedia.org/P44089 and previous config saved to /var/cache/conftool/dbconfig/20230209-190211-marostegui.json
  • 19:01 ebernhardson: start full-cluster in-place reindexing of all wiki elasticsearch clusters T147505
  • 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P44088 and previous config saved to /var/cache/conftool/dbconfig/20230209-190044-ladsgroup.json
  • 18:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T328817)', diff saved to https://phabricator.wikimedia.org/P44087 and previous config saved to /var/cache/conftool/dbconfig/20230209-185933-marostegui.json
  • 18:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 18:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 18:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T328817)', diff saved to https://phabricator.wikimedia.org/P44086 and previous config saved to /var/cache/conftool/dbconfig/20230209-185912-marostegui.json
  • 18:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2001.codfw.wmnet with OS bullseye
  • 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2435.codfw.wmnet with OS buster
  • 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P44085 and previous config saved to /var/cache/conftool/dbconfig/20230209-184538-ladsgroup.json
  • 18:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P44084 and previous config saved to /var/cache/conftool/dbconfig/20230209-184405-marostegui.json
  • 18:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2001.codfw.wmnet with reason: host reimage
  • 18:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2001.codfw.wmnet with reason: host reimage
  • 18:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P44083 and previous config saved to /var/cache/conftool/dbconfig/20230209-183611-ladsgroup.json
  • 18:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 18:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 18:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2431.codfw.wmnet with OS buster
  • 18:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2433.codfw.wmnet with OS buster
  • 18:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2435.codfw.wmnet with reason: host reimage
  • 18:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P44082 and previous config saved to /var/cache/conftool/dbconfig/20230209-182859-marostegui.json
  • 18:28 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2435.codfw.wmnet with reason: host reimage
  • 18:22 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp2001.codfw.wmnet with OS bullseye
  • 18:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2433.codfw.wmnet with reason: host reimage
  • 18:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T328817)', diff saved to https://phabricator.wikimedia.org/P44081 and previous config saved to /var/cache/conftool/dbconfig/20230209-181353-marostegui.json
  • 18:12 jiji@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts mc-gp2001.codfw.wmnet
  • 18:11 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
  • 18:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T328817)', diff saved to https://phabricator.wikimedia.org/P44080 and previous config saved to /var/cache/conftool/dbconfig/20230209-181115-marostegui.json
  • 18:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 18:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2433.codfw.wmnet with reason: host reimage
  • 18:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T328817)', diff saved to https://phabricator.wikimedia.org/P44079 and previous config saved to /var/cache/conftool/dbconfig/20230209-181043-marostegui.json
  • 18:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 18:09 jiji@cumin1001: Updating IPMI password on 1 hosts - jiji@cumin1001
  • 18:09 jiji@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 18:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2435.codfw.wmnet with OS buster
  • 18:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2431.codfw.wmnet with reason: host reimage
  • 18:04 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2431.codfw.wmnet with reason: host reimage
  • 18:03 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:03 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:03 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:02 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:02 jiji@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
  • 18:02 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:01 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 18:01 jiji@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp2001.codfw.wmnet
  • 18:00 jiji@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mc-gp2001.codfw.wmnet
  • 18:00 jiji@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp2001.codfw.wmnet
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P44078 and previous config saved to /var/cache/conftool/dbconfig/20230209-175536-marostegui.json
  • 17:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2433.codfw.wmnet with OS buster
  • 17:50 jiji@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mc-gp2001.codfw.wmnet
  • 17:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2431.codfw.wmnet with OS buster
  • 17:41 jiji@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp2001.codfw.wmnet
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P44077 and previous config saved to /var/cache/conftool/dbconfig/20230209-174030-marostegui.json
  • 17:32 mforns@deploy1002: Finished deploy [airflow-dags/analytics@e84e692]: (no justification provided) (duration: 00m 16s)
  • 17:32 mforns@deploy1002: Started deploy [airflow-dags/analytics@e84e692]: (no justification provided)
  • 17:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P44076 and previous config saved to /var/cache/conftool/dbconfig/20230209-173045-ladsgroup.json
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T328817)', diff saved to https://phabricator.wikimedia.org/P44075 and previous config saved to /var/cache/conftool/dbconfig/20230209-172524-marostegui.json
  • 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T329203)', diff saved to https://phabricator.wikimedia.org/P44074 and previous config saved to /var/cache/conftool/dbconfig/20230209-172239-marostegui.json
  • 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T328817)', diff saved to https://phabricator.wikimedia.org/P44073 and previous config saved to /var/cache/conftool/dbconfig/20230209-172129-marostegui.json
  • 17:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 17:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 17:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 17:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P44072 and previous config saved to /var/cache/conftool/dbconfig/20230209-171846-marostegui.json
  • 17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P44071 and previous config saved to /var/cache/conftool/dbconfig/20230209-171539-ladsgroup.json
  • 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P44070 and previous config saved to /var/cache/conftool/dbconfig/20230209-170732-marostegui.json
  • 17:07 moritzm: rolling restart of FPM/Apache on mw canaries to pick up curl security updates
  • 17:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P44069 and previous config saved to /var/cache/conftool/dbconfig/20230209-170340-marostegui.json
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P44068 and previous config saved to /var/cache/conftool/dbconfig/20230209-170031-ladsgroup.json
  • 16:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P44067 and previous config saved to /var/cache/conftool/dbconfig/20230209-165226-marostegui.json
  • 16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P44066 and previous config saved to /var/cache/conftool/dbconfig/20230209-164834-marostegui.json
  • 16:48 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner1004.eqiad.wmnet with OS bullseye
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P44065 and previous config saved to /var/cache/conftool/dbconfig/20230209-164525-ladsgroup.json
  • 16:44 moritzm: installing curl security updates on buster
  • 16:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2055.codfw.wmnet with OS bullseye
  • 16:38 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@caf4808]: T329089: proper reconciliation of missed page-undelete events (duration: 02m 24s)
  • 16:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2431.codfw.wmnet with OS buster
  • 16:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T329203)', diff saved to https://phabricator.wikimedia.org/P44064 and previous config saved to /var/cache/conftool/dbconfig/20230209-163720-marostegui.json
  • 16:36 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@caf4808]: T329089: proper reconciliation of missed page-undelete events
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P44063 and previous config saved to /var/cache/conftool/dbconfig/20230209-163559-ladsgroup.json
  • 16:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44062 and previous config saved to /var/cache/conftool/dbconfig/20230209-163538-ladsgroup.json
  • 16:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T329203)', diff saved to https://phabricator.wikimedia.org/P44061 and previous config saved to /var/cache/conftool/dbconfig/20230209-163459-marostegui.json
  • 16:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T329203)', diff saved to https://phabricator.wikimedia.org/P44060 and previous config saved to /var/cache/conftool/dbconfig/20230209-163438-marostegui.json
  • 16:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P44059 and previous config saved to /var/cache/conftool/dbconfig/20230209-163327-marostegui.json
  • 16:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P44058 and previous config saved to /var/cache/conftool/dbconfig/20230209-162927-marostegui.json
  • 16:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T328817)', diff saved to https://phabricator.wikimedia.org/P44057 and previous config saved to /var/cache/conftool/dbconfig/20230209-162855-marostegui.json
  • 16:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2055.codfw.wmnet with reason: host reimage
  • 16:25 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2055.codfw.wmnet with reason: host reimage
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P44056 and previous config saved to /var/cache/conftool/dbconfig/20230209-162032-ladsgroup.json
  • 16:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P44055 and previous config saved to /var/cache/conftool/dbconfig/20230209-161931-marostegui.json
  • 16:17 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner1004.eqiad.wmnet with reason: host reimage
  • 16:14 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner1004.eqiad.wmnet with reason: host reimage
  • 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P44054 and previous config saved to /var/cache/conftool/dbconfig/20230209-161349-marostegui.json
  • 16:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 16:09 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2055.codfw.wmnet with OS bullseye
  • 16:09 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc2055.codfw.wmnet with OS bullseye
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P44053 and previous config saved to /var/cache/conftool/dbconfig/20230209-160525-ladsgroup.json
  • 16:05 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2435.codfw.wmnet with OS buster
  • 16:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P44052 and previous config saved to /var/cache/conftool/dbconfig/20230209-160425-marostegui.json
  • 16:02 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab-runner1004.eqiad.wmnet with OS bullseye
  • 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P44051 and previous config saved to /var/cache/conftool/dbconfig/20230209-155843-marostegui.json
  • 15:56 jiji@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts mc-gp1002.eqiad.wmnet
  • 15:56 jiji@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1002.eqiad.wmnet
  • 15:55 jiji@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mc-gp1002.eqiad.wmnet
  • 15:55 jiji@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1002.eqiad.wmnet
  • 15:55 sukhe: restart esitest.service on A:cp-text
  • 15:55 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2055.codfw.wmnet with OS bullseye
  • 15:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1001.eqiad.wmnet with reason: host reimage
  • 15:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1001.eqiad.wmnet with reason: host reimage
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44050 and previous config saved to /var/cache/conftool/dbconfig/20230209-155019-ladsgroup.json
  • 15:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T329203)', diff saved to https://phabricator.wikimedia.org/P44049 and previous config saved to /var/cache/conftool/dbconfig/20230209-154919-marostegui.json
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T329203)', diff saved to https://phabricator.wikimedia.org/P44048 and previous config saved to /var/cache/conftool/dbconfig/20230209-154347-marostegui.json
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T328817)', diff saved to https://phabricator.wikimedia.org/P44047 and previous config saved to /var/cache/conftool/dbconfig/20230209-154337-marostegui.json
  • 15:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 15:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T329203)', diff saved to https://phabricator.wikimedia.org/P44046 and previous config saved to /var/cache/conftool/dbconfig/20230209-154330-marostegui.json
  • 15:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2433.codfw.wmnet with OS buster
  • 15:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2054.codfw.wmnet with OS bullseye
  • 15:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2431.codfw.wmnet with OS buster
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P44045 and previous config saved to /var/cache/conftool/dbconfig/20230209-154058-ladsgroup.json
  • 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44044 and previous config saved to /var/cache/conftool/dbconfig/20230209-154032-ladsgroup.json
  • 15:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2431.codfw.wmnet with OS buster
  • 15:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 15:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2431.codfw.wmnet with OS buster
  • 15:34 jiji@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 15:34 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
  • 15:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2434.codfw.wmnet with OS buster
  • 15:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P44043 and previous config saved to /var/cache/conftool/dbconfig/20230209-152824-marostegui.json
  • 15:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2054.codfw.wmnet with reason: host reimage
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P44042 and previous config saved to /var/cache/conftool/dbconfig/20230209-152525-ladsgroup.json
  • 15:24 jiji@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
  • 15:23 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2054.codfw.wmnet with reason: host reimage
  • 15:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:16 jiji@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P44041 and previous config saved to /var/cache/conftool/dbconfig/20230209-151317-marostegui.json
  • 15:12 jiji@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 15:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2431.codfw.wmnet with OS buster
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P44040 and previous config saved to /var/cache/conftool/dbconfig/20230209-151019-ladsgroup.json
  • 15:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2434.codfw.wmnet with reason: host reimage
  • 15:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2435.codfw.wmnet with OS buster
  • 15:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2432.codfw.wmnet with OS buster
  • 15:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:07 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2054.codfw.wmnet with OS bullseye
  • 15:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2434.codfw.wmnet with reason: host reimage
  • 15:04 jiji@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 15:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2053.codfw.wmnet with OS bullseye
  • 14:59 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:58 jiji@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T329203)', diff saved to https://phabricator.wikimedia.org/P44039 and previous config saved to /var/cache/conftool/dbconfig/20230209-145811-marostegui.json
  • 14:57 jiji@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:57 jiji@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:56 jiji@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:55 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44038 and previous config saved to /var/cache/conftool/dbconfig/20230209-145513-ladsgroup.json
  • 14:52 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:52 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T329203)', diff saved to https://phabricator.wikimedia.org/P44037 and previous config saved to /var/cache/conftool/dbconfig/20230209-145232-marostegui.json
  • 14:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T329203)', diff saved to https://phabricator.wikimedia.org/P44036 and previous config saved to /var/cache/conftool/dbconfig/20230209-145210-marostegui.json
  • 14:52 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:52 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:51 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mc-gp1001.eqiad.wmnet']
  • 14:51 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp1001.eqiad.wmnet']
  • 14:50 jiji@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mc-gp1001.eqiad.wmnet']
  • 14:49 jiji@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp1001.eqiad.wmnet']
  • 14:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2434.codfw.wmnet with OS buster
  • 14:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2053.codfw.wmnet with reason: host reimage
  • 14:46 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@dc3cd56]: T329089: proper reconciliation of missed page-undelete events (duration: 20m 48s)
  • 14:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2433.codfw.wmnet with OS buster
  • 14:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P44035 and previous config saved to /var/cache/conftool/dbconfig/20230209-144535-ladsgroup.json
  • 14:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 14:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 14:45 jiji@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2430.codfw.wmnet with OS buster
  • 14:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2429.codfw.wmnet with OS buster
  • 14:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:44 jiji@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mc-gp1001.eqiad.wmnet
  • 14:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2432.codfw.wmnet with reason: host reimage
  • 14:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2053.codfw.wmnet with reason: host reimage
  • 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T328817)', diff saved to https://phabricator.wikimedia.org/P44034 and previous config saved to /var/cache/conftool/dbconfig/20230209-144321-marostegui.json
  • 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 14:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P44033 and previous config saved to /var/cache/conftool/dbconfig/20230209-144300-marostegui.json
  • 14:41 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2432.codfw.wmnet with reason: host reimage
  • 14:38 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 14:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P44032 and previous config saved to /var/cache/conftool/dbconfig/20230209-143828-ladsgroup.json
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P44031 and previous config saved to /var/cache/conftool/dbconfig/20230209-143704-marostegui.json
  • 14:32 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P44030 and previous config saved to /var/cache/conftool/dbconfig/20230209-142754-marostegui.json
  • 14:27 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2053.codfw.wmnet with OS bullseye
  • 14:25 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@dc3cd56]: T329089: proper reconciliation of missed page-undelete events
  • 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P44029 and previous config saved to /var/cache/conftool/dbconfig/20230209-142321-ladsgroup.json
  • 14:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2430.codfw.wmnet with reason: host reimage
  • 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P44028 and previous config saved to /var/cache/conftool/dbconfig/20230209-142157-marostegui.json
  • 14:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2432.codfw.wmnet with OS buster
  • 14:19 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2430.codfw.wmnet with reason: host reimage
  • 14:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2429.codfw.wmnet with reason: host reimage
  • 14:14 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2431.codfw.wmnet with OS buster
  • 14:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2429.codfw.wmnet with reason: host reimage
  • 14:14 dcausse: T329089: re-playing detected inconsistencies (missing mediawiki.page-undelete events) from 2022-10-31 to 2023-02-07 to WDQS
  • 14:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P44027 and previous config saved to /var/cache/conftool/dbconfig/20230209-141247-marostegui.json
  • 14:09 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug1001:~$ mwscript namespaceDupes.php shnwikibooks --fix | tee T328634-1-unpatched.out # T328634 – finished successfully, to my surprise
  • 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P44026 and previous config saved to /var/cache/conftool/dbconfig/20230209-140815-ladsgroup.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T329203)', diff saved to https://phabricator.wikimedia.org/P44025 and previous config saved to /var/cache/conftool/dbconfig/20230209-140650-marostegui.json
  • 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T329203)', diff saved to https://phabricator.wikimedia.org/P44024 and previous config saved to /var/cache/conftool/dbconfig/20230209-140124-marostegui.json
  • 14:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T329203)', diff saved to https://phabricator.wikimedia.org/P44023 and previous config saved to /var/cache/conftool/dbconfig/20230209-140059-marostegui.json
  • 13:58 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2430.codfw.wmnet with OS buster
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P44022 and previous config saved to /var/cache/conftool/dbconfig/20230209-135741-marostegui.json
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P44021 and previous config saved to /var/cache/conftool/dbconfig/20230209-135441-marostegui.json
  • 13:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2429.codfw.wmnet with OS buster
  • 13:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 13:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T328817)', diff saved to https://phabricator.wikimedia.org/P44020 and previous config saved to /var/cache/conftool/dbconfig/20230209-135420-marostegui.json
  • 13:53 joal@deploy1002: Finished deploy [airflow-dags/analytics@fbebd61]: Update analytics actor dags spark resources (duration: 00m 13s)
  • 13:53 joal@deploy1002: Started deploy [airflow-dags/analytics@fbebd61]: Update analytics actor dags spark resources
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P44019 and previous config saved to /var/cache/conftool/dbconfig/20230209-135309-ladsgroup.json
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P44018 and previous config saved to /var/cache/conftool/dbconfig/20230209-134553-marostegui.json
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P44017 and previous config saved to /var/cache/conftool/dbconfig/20230209-134343-ladsgroup.json
  • 13:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P44016 and previous config saved to /var/cache/conftool/dbconfig/20230209-134322-ladsgroup.json
  • 13:40 elukey: restart prometheus-statsd-exporter on ores nodes to pick up label change - T325763
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P44014 and previous config saved to /var/cache/conftool/dbconfig/20230209-133914-marostegui.json
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P44013 and previous config saved to /var/cache/conftool/dbconfig/20230209-133046-marostegui.json
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P44012 and previous config saved to /var/cache/conftool/dbconfig/20230209-132815-ladsgroup.json
  • 13:27 hashar: phab2002: manually stopped the `phd` service. It cannot start because the MariaDB server is set read-only, and it has been failing to start every 10 seconds since forever
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P44011 and previous config saved to /var/cache/conftool/dbconfig/20230209-132407-marostegui.json
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T329203)', diff saved to https://phabricator.wikimedia.org/P44010 and previous config saved to /var/cache/conftool/dbconfig/20230209-131540-marostegui.json
  • 13:15 moritzm: restarting Exim on MXes to pick up OpenSSL update
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P44009 and previous config saved to /var/cache/conftool/dbconfig/20230209-131309-ladsgroup.json
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T329203)', diff saved to https://phabricator.wikimedia.org/P44008 and previous config saved to /var/cache/conftool/dbconfig/20230209-131010-marostegui.json
  • 13:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T328817)', diff saved to https://phabricator.wikimedia.org/P44007 and previous config saved to /var/cache/conftool/dbconfig/20230209-130901-marostegui.json
  • 13:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 13:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T329203)', diff saved to https://phabricator.wikimedia.org/P44006 and previous config saved to /var/cache/conftool/dbconfig/20230209-130555-marostegui.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T328817)', diff saved to https://phabricator.wikimedia.org/P44005 and previous config saved to /var/cache/conftool/dbconfig/20230209-130504-marostegui.json
  • 13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 13:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T328817)', diff saved to https://phabricator.wikimedia.org/P44004 and previous config saved to /var/cache/conftool/dbconfig/20230209-130442-marostegui.json
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P44003 and previous config saved to /var/cache/conftool/dbconfig/20230209-125803-ladsgroup.json
  • 12:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1001.eqiad.wmnet with OS buster
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P44002 and previous config saved to /var/cache/conftool/dbconfig/20230209-125048-marostegui.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P44001 and previous config saved to /var/cache/conftool/dbconfig/20230209-124936-marostegui.json
  • 12:49 joal@deploy1002: Finished deploy [airflow-dags/analytics@cf9d978]: Fix analytics pageview_actor_hourly (duration: 00m 13s)
  • 12:48 joal@deploy1002: Started deploy [airflow-dags/analytics@cf9d978]: Fix analytics pageview_actor_hourly
  • 12:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P44000 and previous config saved to /var/cache/conftool/dbconfig/20230209-124837-ladsgroup.json
  • 12:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 12:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 12:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetdb2003.codfw.wmnet with OS bullseye
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P43999 and previous config saved to /var/cache/conftool/dbconfig/20230209-123542-marostegui.json
  • 12:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1001.eqiad.wmnet with reason: host reimage
  • 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P43998 and previous config saved to /var/cache/conftool/dbconfig/20230209-123430-marostegui.json
  • 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetdb2003.codfw.wmnet with reason: host reimage
  • 12:31 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1001.eqiad.wmnet with reason: host reimage
  • 12:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetdb2003.codfw.wmnet with reason: host reimage
  • 12:22 phedenskog@deploy1002: Finished deploy [performance/navtiming@bb224a1]: (no justification provided) (duration: 00m 08s)
  • 12:22 phedenskog@deploy1002: Started deploy [performance/navtiming@bb224a1]: (no justification provided)
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T329203)', diff saved to https://phabricator.wikimedia.org/P43997 and previous config saved to /var/cache/conftool/dbconfig/20230209-122036-marostegui.json
  • 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T328817)', diff saved to https://phabricator.wikimedia.org/P43996 and previous config saved to /var/cache/conftool/dbconfig/20230209-121923-marostegui.json
  • 12:18 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS buster
  • 12:17 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move Babel settings from IS.php to ext-Babel.php, part III (T308932) (duration: 06m 47s)
  • 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T328817)', diff saved to https://phabricator.wikimedia.org/P43995 and previous config saved to /var/cache/conftool/dbconfig/20230209-121705-marostegui.json
  • 12:17 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 12:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P43994 and previous config saved to /var/cache/conftool/dbconfig/20230209-121644-marostegui.json
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T329203)', diff saved to https://phabricator.wikimedia.org/P43993 and previous config saved to /var/cache/conftool/dbconfig/20230209-121507-marostegui.json
  • 12:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 12:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T329203)', diff saved to https://phabricator.wikimedia.org/P43992 and previous config saved to /var/cache/conftool/dbconfig/20230209-121446-marostegui.json
  • 12:12 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host puppetdb2003.codfw.wmnet with OS bullseye
  • 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetdb1003.eqiad.wmnet with OS bullseye
  • 12:10 ladsgroup@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Move Babel settings from IS.php to ext-Babel.php, part II (T308932) (duration: 06m 40s)
  • 12:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: Attempting to move some GPUs
  • 12:06 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: Attempting to move some GPUs
  • 12:03 ladsgroup@deploy1002: Synchronized wmf-config/ext-Babel.php: Move Babel settings from IS.php to ext-Babel.php, part I (T308932) (duration: 07m 06s)
  • 12:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-worker1099.eqiad.wmnet with reason: Attempting to move some GPUs
  • 12:02 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on an-worker1099.eqiad.wmnet with reason: Attempting to move some GPUs
  • 12:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-worker1098.eqiad.wmnet with reason: Attempting to move some GPUs
  • 12:02 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on an-worker1098.eqiad.wmnet with reason: Attempting to move some GPUs
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P43991 and previous config saved to /var/cache/conftool/dbconfig/20230209-120138-marostegui.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P43990 and previous config saved to /var/cache/conftool/dbconfig/20230209-115940-marostegui.json
  • 11:57 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 11:57 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage
  • 11:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage
  • 11:52 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 11:52 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P43989 and previous config saved to /var/cache/conftool/dbconfig/20230209-114632-marostegui.json
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P43988 and previous config saved to /var/cache/conftool/dbconfig/20230209-114434-marostegui.json
  • 11:40 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host puppetdb1003.eqiad.wmnet with OS bullseye
  • 11:34 marostegui: Stop mariadb on db1098 (s6 and s7) T329171
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P43986 and previous config saved to /var/cache/conftool/dbconfig/20230209-113125-marostegui.json
  • 11:31 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner1003.eqiad.wmnet with OS bullseye
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T329203)', diff saved to https://phabricator.wikimedia.org/P43985 and previous config saved to /var/cache/conftool/dbconfig/20230209-112927-marostegui.json
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P43984 and previous config saved to /var/cache/conftool/dbconfig/20230209-112748-marostegui.json
  • 11:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 11:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T328817)', diff saved to https://phabricator.wikimedia.org/P43983 and previous config saved to /var/cache/conftool/dbconfig/20230209-112727-marostegui.json
  • 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T329203)', diff saved to https://phabricator.wikimedia.org/P43982 and previous config saved to /var/cache/conftool/dbconfig/20230209-112359-marostegui.json
  • 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T329203)', diff saved to https://phabricator.wikimedia.org/P43981 and previous config saved to /var/cache/conftool/dbconfig/20230209-112338-marostegui.json
  • 11:20 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 11:20 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P43980 and previous config saved to /var/cache/conftool/dbconfig/20230209-111220-marostegui.json
  • 11:10 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner1003.eqiad.wmnet with reason: host reimage
  • 11:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2052.codfw.wmnet with OS bullseye
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P43979 and previous config saved to /var/cache/conftool/dbconfig/20230209-110832-marostegui.json
  • 11:07 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner1003.eqiad.wmnet with reason: host reimage
  • 11:02 effie: powercycle mc-gp1001
  • 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on puppetdb2003.codfw.wmnet with reason: master is being reimaged
  • 10:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on puppetdb2003.codfw.wmnet with reason: master is being reimaged
  • 10:58 joal@deploy1002: Finished deploy [airflow-dags/analytics@dff3f3b]: Fix analytics webrequest_actor_metrics_rollup sensor (duration: 00m 13s)
  • 10:58 joal@deploy1002: Started deploy [airflow-dags/analytics@dff3f3b]: Fix analytics webrequest_actor_metrics_rollup sensor
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P43978 and previous config saved to /var/cache/conftool/dbconfig/20230209-105714-marostegui.json
  • 10:55 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab-runner1003.eqiad.wmnet with OS bullseye
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P43977 and previous config saved to /var/cache/conftool/dbconfig/20230209-105325-marostegui.json
  • 10:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2052.codfw.wmnet with reason: host reimage
  • 10:50 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2052.codfw.wmnet with reason: host reimage
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43976 and previous config saved to /var/cache/conftool/dbconfig/20230209-104218-root.json
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43975 and previous config saved to /var/cache/conftool/dbconfig/20230209-104214-root.json
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T328817)', diff saved to https://phabricator.wikimedia.org/P43974 and previous config saved to /var/cache/conftool/dbconfig/20230209-104208-marostegui.json
  • 10:38 moritzm: installing containerd security updates
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T329203)', diff saved to https://phabricator.wikimedia.org/P43973 and previous config saved to /var/cache/conftool/dbconfig/20230209-103819-marostegui.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T328817)', diff saved to https://phabricator.wikimedia.org/P43972 and previous config saved to /var/cache/conftool/dbconfig/20230209-103733-marostegui.json
  • 10:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P43971 and previous config saved to /var/cache/conftool/dbconfig/20230209-103712-marostegui.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T329203)', diff saved to https://phabricator.wikimedia.org/P43970 and previous config saved to /var/cache/conftool/dbconfig/20230209-103604-marostegui.json
  • 10:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 10:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 10:34 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS bullseye
  • 10:34 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2052.codfw.wmnet with OS bullseye
  • 10:31 joal@deploy1002: Finished deploy [airflow-dags/analytics@2ab6564]: Analytics deploy for 3 druid jobs and webrequest_actor jobs (duration: 00m 17s)
  • 10:31 joal@deploy1002: Started deploy [airflow-dags/analytics@2ab6564]: Analytics deploy for 3 druid jobs and webrequest_actor jobs
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43968 and previous config saved to /var/cache/conftool/dbconfig/20230209-102713-root.json
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43967 and previous config saved to /var/cache/conftool/dbconfig/20230209-102709-root.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P43966 and previous config saved to /var/cache/conftool/dbconfig/20230209-102206-marostegui.json
  • 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43965 and previous config saved to /var/cache/conftool/dbconfig/20230209-101209-root.json
  • 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43964 and previous config saved to /var/cache/conftool/dbconfig/20230209-101204-root.json
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P43963 and previous config saved to /var/cache/conftool/dbconfig/20230209-100700-marostegui.json
  • 10:01 kostajh: UTC morning deploys really done
  • 09:59 kharlan@deploy1002: Finished scap: Backport for ComputedUserImpactLookup: Reduce logspam for page view rate limiting (T328945) (duration: 09m 06s)
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43962 and previous config saved to /var/cache/conftool/dbconfig/20230209-095704-root.json
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43961 and previous config saved to /var/cache/conftool/dbconfig/20230209-095659-root.json
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P43960 and previous config saved to /var/cache/conftool/dbconfig/20230209-095153-marostegui.json
  • 09:51 kharlan@deploy1002: kharlan: Backport for ComputedUserImpactLookup: Reduce logspam for page view rate limiting (T328945) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 09:50 kharlan@deploy1002: Started scap: Backport for ComputedUserImpactLookup: Reduce logspam for page view rate limiting (T328945)
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T328817)', diff saved to https://phabricator.wikimedia.org/P43959 and previous config saved to /var/cache/conftool/dbconfig/20230209-094816-marostegui.json
  • 09:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 09:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T328817)', diff saved to https://phabricator.wikimedia.org/P43958 and previous config saved to /var/cache/conftool/dbconfig/20230209-094755-marostegui.json
  • 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43957 and previous config saved to /var/cache/conftool/dbconfig/20230209-094159-root.json
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43956 and previous config saved to /var/cache/conftool/dbconfig/20230209-094154-root.json
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P43955 and previous config saved to /var/cache/conftool/dbconfig/20230209-093248-marostegui.json
  • 09:32 godog: roll-restart opensearch-dashboards to apply memory limit - T327161
  • 09:29 kharlan@deploy1002: Finished scap: Backport for Add StatusValue::hasMessagesExcept() (T272081) (duration: 09m 20s)
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43954 and previous config saved to /var/cache/conftool/dbconfig/20230209-092654-root.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43953 and previous config saved to /var/cache/conftool/dbconfig/20230209-092650-root.json
  • 09:22 kharlan@deploy1002: kharlan: Backport for Add StatusValue::hasMessagesExcept() (T272081) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 09:20 kharlan@deploy1002: Started scap: Backport for Add StatusValue::hasMessagesExcept() (T272081)
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P43952 and previous config saved to /var/cache/conftool/dbconfig/20230209-091742-marostegui.json
  • 09:13 moritzm: installing openssl security updates on Bullseye
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43951 and previous config saved to /var/cache/conftool/dbconfig/20230209-091149-root.json
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43950 and previous config saved to /var/cache/conftool/dbconfig/20230209-091145-root.json
  • 09:10 marostegui: Install 10.4.28 on db1107 T329011
  • 09:10 marostegui: Install 10.6.12 on db1132 T329011
  • 09:09 vgutierrez: pool cp4044 with ESI testing enabled
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 db1132', diff saved to https://phabricator.wikimedia.org/P43949 and previous config saved to /var/cache/conftool/dbconfig/20230209-090846-root.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T328817)', diff saved to https://phabricator.wikimedia.org/P43948 and previous config saved to /var/cache/conftool/dbconfig/20230209-090236-marostegui.json
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T328817)', diff saved to https://phabricator.wikimedia.org/P43947 and previous config saved to /var/cache/conftool/dbconfig/20230209-090018-marostegui.json
  • 09:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 09:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 09:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T328817)', diff saved to https://phabricator.wikimedia.org/P43946 and previous config saved to /var/cache/conftool/dbconfig/20230209-085952-marostegui.json
  • 08:57 vgutierrez: depool cp4044 - T308799
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P43945 and previous config saved to /var/cache/conftool/dbconfig/20230209-084446-marostegui.json
  • 08:42 apergos: UTC morning backport and config training window complete
  • 08:32 kartik@deploy1002: Finished scap: Backport for CX: Provide the appropriate arguments to ve.ui.CXSurface constructor (T329154) (duration: 13m 10s)
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P43944 and previous config saved to /var/cache/conftool/dbconfig/20230209-082940-marostegui.json
  • 08:21 kartik@deploy1002: kartik: Backport for CX: Provide the appropriate arguments to ve.ui.CXSurface constructor (T329154) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 08:19 kartik@deploy1002: Started scap: Backport for CX: Provide the appropriate arguments to ve.ui.CXSurface constructor (T329154)
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T328817)', diff saved to https://phabricator.wikimedia.org/P43943 and previous config saved to /var/cache/conftool/dbconfig/20230209-081433-marostegui.json
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T328817)', diff saved to https://phabricator.wikimedia.org/P43942 and previous config saved to /var/cache/conftool/dbconfig/20230209-081116-marostegui.json
  • 08:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 08:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T328817)', diff saved to https://phabricator.wikimedia.org/P43941 and previous config saved to /var/cache/conftool/dbconfig/20230209-081054-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P43940 and previous config saved to /var/cache/conftool/dbconfig/20230209-075548-marostegui.json
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P43939 and previous config saved to /var/cache/conftool/dbconfig/20230209-074042-marostegui.json
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T328817)', diff saved to https://phabricator.wikimedia.org/P43938 and previous config saved to /var/cache/conftool/dbconfig/20230209-072535-marostegui.json
  • 07:24 marostegui: Stop mariadb on db1117:3321 (some dbproxy irc alerts will be triggered) T329143
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T328817)', diff saved to https://phabricator.wikimedia.org/P43936 and previous config saved to /var/cache/conftool/dbconfig/20230209-072204-marostegui.json
  • 07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 07:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 07:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1098 (s6, s7) from dbctl T329171', diff saved to https://phabricator.wikimedia.org/P43935 and previous config saved to /var/cache/conftool/dbconfig/20230209-071013-marostegui.json
  • 07:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 07:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 07:00 marostegui: Failover m3 from db1164 to db1159 - T329141
  • 06:48 oblivian@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in eqiad: maintenance
  • 06:48 oblivian@cumin2002: START - Cookbook sre.discovery.datacenter status all services in eqiad: maintenance
  • 03:34 eileen: civicrm upgraded from 07ef73b8 to efa4c485
  • 02:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2428.codfw.wmnet with OS buster
  • 02:56 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43934 and previous config saved to /var/cache/conftool/dbconfig/20230209-024920-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43933 and previous config saved to /var/cache/conftool/dbconfig/20230209-023413-ladsgroup.json
  • 02:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2428.codfw.wmnet with reason: host reimage
  • 02:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2428.codfw.wmnet with reason: host reimage
  • 02:28 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript extensions/PageAssessments/maintenance/purgeUnusedProjects.php --wiki zhwiki` for T326387
  • 02:23 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript extensions/PageAssessments/maintenance/purgeUnusedProjects.php --wiki zhwiki --dry-run` for T326387
  • 02:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43931 and previous config saved to /var/cache/conftool/dbconfig/20230209-021907-ladsgroup.json
  • 02:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2428.codfw.wmnet with OS buster
  • 02:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2427.codfw.wmnet with OS buster
  • 02:11 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2426.codfw.wmnet with OS buster
  • 02:10 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:04 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43930 and previous config saved to /var/cache/conftool/dbconfig/20230209-020401-ladsgroup.json
  • 02:00 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2427.codfw.wmnet with reason: host reimage
  • 01:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2427.codfw.wmnet with reason: host reimage
  • 01:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2426.codfw.wmnet with reason: host reimage
  • 01:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2426.codfw.wmnet with reason: host reimage
  • 01:27 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2427.codfw.wmnet with OS buster
  • 01:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2426.codfw.wmnet with OS buster
  • 01:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2424.codfw.wmnet with OS buster
  • 01:22 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2425.codfw.wmnet with OS buster
  • 01:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:17 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:09 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43929 and previous config saved to /var/cache/conftool/dbconfig/20230209-010450-ladsgroup.json
  • 01:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 01:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 01:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43928 and previous config saved to /var/cache/conftool/dbconfig/20230209-010429-ladsgroup.json
  • 01:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2425.codfw.wmnet with reason: host reimage
  • 01:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T328817)', diff saved to https://phabricator.wikimedia.org/P43927 and previous config saved to /var/cache/conftool/dbconfig/20230209-010132-marostegui.json
  • 01:01 eileen: civicrm upgraded from b5d6a790 to 07ef73b8
  • 01:00 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2425.codfw.wmnet with reason: host reimage
  • 00:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2424.codfw.wmnet with reason: host reimage
  • 00:50 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2424.codfw.wmnet with reason: host reimage
  • 00:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43926 and previous config saved to /var/cache/conftool/dbconfig/20230209-004923-ladsgroup.json
  • 00:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P43925 and previous config saved to /var/cache/conftool/dbconfig/20230209-004625-marostegui.json
  • 00:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2425.codfw.wmnet with OS buster
  • 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43924 and previous config saved to /var/cache/conftool/dbconfig/20230209-003416-ladsgroup.json
  • 00:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P43923 and previous config saved to /var/cache/conftool/dbconfig/20230209-003119-marostegui.json
  • 00:24 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2424.codfw.wmnet with OS buster
  • 00:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2423.codfw.wmnet with OS buster
  • 00:22 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43922 and previous config saved to /var/cache/conftool/dbconfig/20230209-001910-ladsgroup.json
  • 00:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T328817)', diff saved to https://phabricator.wikimedia.org/P43921 and previous config saved to /var/cache/conftool/dbconfig/20230209-001613-marostegui.json
  • 00:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T328817)', diff saved to https://phabricator.wikimedia.org/P43920 and previous config saved to /var/cache/conftool/dbconfig/20230209-001401-marostegui.json
  • 00:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 00:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 00:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T328817)', diff saved to https://phabricator.wikimedia.org/P43919 and previous config saved to /var/cache/conftool/dbconfig/20230209-001340-marostegui.json

2023-02-08

  • 23:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P43918 and previous config saved to /var/cache/conftool/dbconfig/20230208-235833-marostegui.json
  • 23:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43917 and previous config saved to /var/cache/conftool/dbconfig/20230208-235157-ladsgroup.json
  • 23:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43916 and previous config saved to /var/cache/conftool/dbconfig/20230208-235109-ladsgroup.json
  • 23:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P43915 and previous config saved to /var/cache/conftool/dbconfig/20230208-234327-marostegui.json
  • 23:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43913 and previous config saved to /var/cache/conftool/dbconfig/20230208-233603-ladsgroup.json
  • 23:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T328817)', diff saved to https://phabricator.wikimedia.org/P43912 and previous config saved to /var/cache/conftool/dbconfig/20230208-232821-marostegui.json
  • 23:27 urbanecm@deploy1002: Finished scap: Backport for Change the trwiki logo with a temporary one (vector 2022) (T329047) (duration: 08m 32s)
  • 23:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T328817)', diff saved to https://phabricator.wikimedia.org/P43911 and previous config saved to /var/cache/conftool/dbconfig/20230208-232608-marostegui.json
  • 23:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 23:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 23:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T328817)', diff saved to https://phabricator.wikimedia.org/P43910 and previous config saved to /var/cache/conftool/dbconfig/20230208-232547-marostegui.json
  • 23:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43909 and previous config saved to /var/cache/conftool/dbconfig/20230208-232056-ladsgroup.json
  • 23:20 urbanecm@deploy1002: superpes and urbanecm: Backport for Change the trwiki logo with a temporary one (vector 2022) (T329047) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 23:18 urbanecm@deploy1002: Started scap: Backport for Change the trwiki logo with a temporary one (vector 2022) (T329047)
  • 23:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P43908 and previous config saved to /var/cache/conftool/dbconfig/20230208-231041-marostegui.json
  • 23:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43907 and previous config saved to /var/cache/conftool/dbconfig/20230208-230550-ladsgroup.json
  • 22:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P43906 and previous config saved to /var/cache/conftool/dbconfig/20230208-225534-marostegui.json
  • 22:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T328817)', diff saved to https://phabricator.wikimedia.org/P43905 and previous config saved to /var/cache/conftool/dbconfig/20230208-224028-marostegui.json
  • 22:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2423.codfw.wmnet with reason: host reimage
  • 22:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T328817)', diff saved to https://phabricator.wikimedia.org/P43904 and previous config saved to /var/cache/conftool/dbconfig/20230208-223430-marostegui.json
  • 22:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 22:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 22:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43903 and previous config saved to /var/cache/conftool/dbconfig/20230208-223408-marostegui.json
  • 22:32 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2423.codfw.wmnet with reason: host reimage
  • 22:29 demon@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.21 refs T325585 (duration: 06m 29s)
  • 22:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2422.codfw.wmnet with OS buster
  • 22:24 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:23 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.21 refs T325585
  • 22:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P43902 and previous config saved to /var/cache/conftool/dbconfig/20230208-221902-marostegui.json
  • 22:17 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2423.codfw.wmnet with OS buster
  • 22:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43901 and previous config saved to /var/cache/conftool/dbconfig/20230208-220532-ladsgroup.json
  • 22:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 22:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 22:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P43900 and previous config saved to /var/cache/conftool/dbconfig/20230208-220356-marostegui.json
  • 22:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2422.codfw.wmnet with reason: host reimage
  • 21:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2421.codfw.wmnet with OS buster
  • 21:58 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:57 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2422.codfw.wmnet with reason: host reimage
  • 21:56 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43899 and previous config saved to /var/cache/conftool/dbconfig/20230208-214849-marostegui.json
  • 21:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43898 and previous config saved to /var/cache/conftool/dbconfig/20230208-214343-marostegui.json
  • 21:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 21:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 21:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T328817)', diff saved to https://phabricator.wikimedia.org/P43897 and previous config saved to /var/cache/conftool/dbconfig/20230208-214322-marostegui.json
  • 21:41 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mw2422
  • 21:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2421.codfw.wmnet with reason: host reimage
  • 21:40 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mw2422
  • 21:37 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2421.codfw.wmnet with reason: host reimage
  • 21:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P43896 and previous config saved to /var/cache/conftool/dbconfig/20230208-212815-marostegui.json
  • 21:27 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2422.codfw.wmnet with OS buster
  • 21:20 urbanecm@deploy1002: Finished scap: Backport for guwwikiquote: Add custom logo (T321247) (duration: 08m 10s)
  • 21:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2421.codfw.wmnet with OS buster
  • 21:13 urbanecm@deploy1002: urbanecm: Backport for guwwikiquote: Add custom logo (T321247) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P43895 and previous config saved to /var/cache/conftool/dbconfig/20230208-211309-marostegui.json
  • 21:11 urbanecm@deploy1002: Started scap: Backport for guwwikiquote: Add custom logo (T321247)
  • 21:11 urbanecm@deploy1002: Finished scap: Backport for [logos] Make logos/manage.py work again, [logos] Regenerate logos.php (duration: 07m 42s)
  • 21:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 21:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43894 and previous config saved to /var/cache/conftool/dbconfig/20230208-210807-ladsgroup.json
  • 21:03 urbanecm@deploy1002: Started scap: Backport for [logos] Make logos/manage.py work again, [logos] Regenerate logos.php
  • 20:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T328817)', diff saved to https://phabricator.wikimedia.org/P43893 and previous config saved to /var/cache/conftool/dbconfig/20230208-205803-marostegui.json
  • 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43892 and previous config saved to /var/cache/conftool/dbconfig/20230208-205301-ladsgroup.json
  • 20:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T328817)', diff saved to https://phabricator.wikimedia.org/P43891 and previous config saved to /var/cache/conftool/dbconfig/20230208-205211-marostegui.json
  • 20:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 20:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 20:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 20:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 20:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43890 and previous config saved to /var/cache/conftool/dbconfig/20230208-205133-marostegui.json
  • 20:48 rzl: enabled puppet on C:profile::mediawiki::webserver - T306015
  • 20:43 rzl: disabling puppet on C:profile::mediawiki::webserver to merge and test 887434 - T306015
  • 20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43889 and previous config saved to /var/cache/conftool/dbconfig/20230208-203755-ladsgroup.json
  • 20:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P43888 and previous config saved to /var/cache/conftool/dbconfig/20230208-203627-marostegui.json
  • 20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43887 and previous config saved to /var/cache/conftool/dbconfig/20230208-202249-ladsgroup.json
  • 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P43886 and previous config saved to /var/cache/conftool/dbconfig/20230208-202120-marostegui.json
  • 20:17 demon@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.22 refs T325585 (duration: 06m 33s)
  • 20:11 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.22 refs T325585
  • 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43885 and previous config saved to /var/cache/conftool/dbconfig/20230208-200614-marostegui.json
  • 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2420.codfw.wmnet with OS buster
  • 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43884 and previous config saved to /var/cache/conftool/dbconfig/20230208-200006-marostegui.json
  • 20:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:58 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 19:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 19:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T328817)', diff saved to https://phabricator.wikimedia.org/P43883 and previous config saved to /var/cache/conftool/dbconfig/20230208-195542-marostegui.json
  • 19:52 dancy@deploy1002: say aborted: (duration: 00m 01s)
  • 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T328817)', diff saved to https://phabricator.wikimedia.org/P43882 and previous config saved to /var/cache/conftool/dbconfig/20230208-194745-marostegui.json
  • 19:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2420.codfw.wmnet with reason: host reimage
  • 19:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P43881 and previous config saved to /var/cache/conftool/dbconfig/20230208-194036-marostegui.json
  • 19:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2420.codfw.wmnet with reason: host reimage
  • 19:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P43880 and previous config saved to /var/cache/conftool/dbconfig/20230208-193239-marostegui.json
  • 19:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P43879 and previous config saved to /var/cache/conftool/dbconfig/20230208-192530-marostegui.json
  • 19:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43878 and previous config saved to /var/cache/conftool/dbconfig/20230208-192136-ladsgroup.json
  • 19:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 19:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 19:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43877 and previous config saved to /var/cache/conftool/dbconfig/20230208-192115-ladsgroup.json
  • 19:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P43876 and previous config saved to /var/cache/conftool/dbconfig/20230208-191732-marostegui.json
  • 19:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2420.codfw.wmnet with OS buster
  • 19:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T328817)', diff saved to https://phabricator.wikimedia.org/P43875 and previous config saved to /var/cache/conftool/dbconfig/20230208-191023-marostegui.json
  • 19:08 milimetric@deploy1002: Finished deploy [analytics/refinery@9101b03] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9101b03] (duration: 01m 20s)
  • 19:06 milimetric@deploy1002: Started deploy [analytics/refinery@9101b03] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9101b03]
  • 19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P43874 and previous config saved to /var/cache/conftool/dbconfig/20230208-190608-ladsgroup.json
  • 19:06 milimetric@deploy1002: Finished deploy [analytics/refinery@9101b03] (thin): Regular analytics weekly train THIN [analytics/refinery@9101b03] (duration: 00m 07s)
  • 19:05 milimetric@deploy1002: Started deploy [analytics/refinery@9101b03] (thin): Regular analytics weekly train THIN [analytics/refinery@9101b03]
  • 19:05 milimetric@deploy1002: Finished deploy [analytics/refinery@9101b03]: Regular analytics weekly train [analytics/refinery@9101b03] (duration: 06m 28s)
  • 19:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T328817)', diff saved to https://phabricator.wikimedia.org/P43873 and previous config saved to /var/cache/conftool/dbconfig/20230208-190410-marostegui.json
  • 19:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 19:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 19:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T328817)', diff saved to https://phabricator.wikimedia.org/P43872 and previous config saved to /var/cache/conftool/dbconfig/20230208-190349-marostegui.json
  • 19:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T328817)', diff saved to https://phabricator.wikimedia.org/P43871 and previous config saved to /var/cache/conftool/dbconfig/20230208-190226-marostegui.json
  • 18:59 milimetric@deploy1002: Started deploy [analytics/refinery@9101b03]: Regular analytics weekly train [analytics/refinery@9101b03]
  • 18:57 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mw2420.codfw.wmnet with OS buster
  • 18:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T328817)', diff saved to https://phabricator.wikimedia.org/P43870 and previous config saved to /var/cache/conftool/dbconfig/20230208-185513-marostegui.json
  • 18:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 18:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 18:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43869 and previous config saved to /var/cache/conftool/dbconfig/20230208-185451-marostegui.json
  • 18:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P43868 and previous config saved to /var/cache/conftool/dbconfig/20230208-185102-ladsgroup.json
  • 18:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P43867 and previous config saved to /var/cache/conftool/dbconfig/20230208-184842-marostegui.json
  • 18:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2420.codfw.wmnet with OS buster
  • 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P43866 and previous config saved to /var/cache/conftool/dbconfig/20230208-183945-marostegui.json
  • 18:39 mutante: adding az.wikimedia.org to DNS - approved by affcom T306015
  • 18:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43865 and previous config saved to /var/cache/conftool/dbconfig/20230208-183556-ladsgroup.json
  • 18:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P43864 and previous config saved to /var/cache/conftool/dbconfig/20230208-183336-marostegui.json
  • 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P43863 and previous config saved to /var/cache/conftool/dbconfig/20230208-182439-marostegui.json
  • 18:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T328817)', diff saved to https://phabricator.wikimedia.org/P43862 and previous config saved to /var/cache/conftool/dbconfig/20230208-181829-marostegui.json
  • 18:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T328817)', diff saved to https://phabricator.wikimedia.org/P43861 and previous config saved to /var/cache/conftool/dbconfig/20230208-181222-marostegui.json
  • 18:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 18:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 18:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43860 and previous config saved to /var/cache/conftool/dbconfig/20230208-181200-marostegui.json
  • 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43859 and previous config saved to /var/cache/conftool/dbconfig/20230208-180933-marostegui.json
  • 18:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43858 and previous config saved to /var/cache/conftool/dbconfig/20230208-180216-marostegui.json
  • 18:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 18:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 18:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T328817)', diff saved to https://phabricator.wikimedia.org/P43857 and previous config saved to /var/cache/conftool/dbconfig/20230208-180153-marostegui.json
  • 17:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P43856 and previous config saved to /var/cache/conftool/dbconfig/20230208-175654-marostegui.json
  • 17:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P43855 and previous config saved to /var/cache/conftool/dbconfig/20230208-174647-marostegui.json
  • 17:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P43854 and previous config saved to /var/cache/conftool/dbconfig/20230208-174148-marostegui.json
  • 17:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43850 and previous config saved to /var/cache/conftool/dbconfig/20230208-172021-marostegui.json
  • 17:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 17:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 17:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T328817)', diff saved to https://phabricator.wikimedia.org/P43849 and previous config saved to /var/cache/conftool/dbconfig/20230208-171634-marostegui.json
  • 17:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 17:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 17:14 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on rpki2002.codfw.wmnet with reason: Restarting to increase VM RAM allocation
  • 17:13 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on rpki2002.codfw.wmnet with reason: Restarting to increase VM RAM allocation
  • 17:11 oblivian@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in codfw: maintenance
  • 17:11 oblivian@cumin2002: START - Cookbook sre.discovery.datacenter status all services in codfw: maintenance
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T328817)', diff saved to https://phabricator.wikimedia.org/P43848 and previous config saved to /var/cache/conftool/dbconfig/20230208-171028-marostegui.json
  • 17:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 17:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43847 and previous config saved to /var/cache/conftool/dbconfig/20230208-171006-marostegui.json
  • 17:09 jynus: disable bacula job backup1002.eqiad.wmnet-Weekly-Thu-EsRwCodfw-mysql-srv-backups-dumps-latest
  • 17:08 oblivian@cumin2002: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) status all services in codfw: maintenance
  • 17:08 oblivian@cumin2002: START - Cookbook sre.discovery.datacenter status all services in codfw: maintenance
  • 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P43844 and previous config saved to /var/cache/conftool/dbconfig/20230208-165500-marostegui.json
  • 16:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 16:45 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 16:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 16:44 sukhe: [done] rolling restart of haproxy and trafficserver in A:cp
  • 16:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P43842 and previous config saved to /var/cache/conftool/dbconfig/20230208-163954-marostegui.json
  • 16:27 sukhe: rolling restart of haproxy and trafficserver in A:cp
  • 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43841 and previous config saved to /var/cache/conftool/dbconfig/20230208-162447-marostegui.json
  • 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T328817)', diff saved to https://phabricator.wikimedia.org/P43840 and previous config saved to /var/cache/conftool/dbconfig/20230208-161828-marostegui.json
  • 16:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 16:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T328817)', diff saved to https://phabricator.wikimedia.org/P43839 and previous config saved to /var/cache/conftool/dbconfig/20230208-161807-marostegui.json
  • 16:14 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on rpki1001.eqiad.wmnet with reason: Restarting to increase VM RAM allocation
  • 16:14 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on rpki1001.eqiad.wmnet with reason: Restarting to increase VM RAM allocation
  • 16:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P43838 and previous config saved to /var/cache/conftool/dbconfig/20230208-160301-marostegui.json
  • 15:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2420.codfw.wmnet with OS buster
  • 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P43837 and previous config saved to /var/cache/conftool/dbconfig/20230208-154754-marostegui.json
  • 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T328817)', diff saved to https://phabricator.wikimedia.org/P43836 and previous config saved to /var/cache/conftool/dbconfig/20230208-154420-marostegui.json
  • 15:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new test VM in drmrs - jmm@cumin2002"
  • 15:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T328817)', diff saved to https://phabricator.wikimedia.org/P43835 and previous config saved to /var/cache/conftool/dbconfig/20230208-153248-marostegui.json
  • 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T328817)', diff saved to https://phabricator.wikimedia.org/P43834 and previous config saved to /var/cache/conftool/dbconfig/20230208-153022-marostegui.json
  • 15:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T328817)', diff saved to https://phabricator.wikimedia.org/P43833 and previous config saved to /var/cache/conftool/dbconfig/20230208-152956-marostegui.json
  • 15:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P43832 and previous config saved to /var/cache/conftool/dbconfig/20230208-152913-marostegui.json
  • 15:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P43831 and previous config saved to /var/cache/conftool/dbconfig/20230208-151450-marostegui.json
  • 15:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P43830 and previous config saved to /var/cache/conftool/dbconfig/20230208-151407-marostegui.json
  • 15:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:10 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:02 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2420.codfw.wmnet with OS buster
  • 15:00 jforrester@deploy1002: Finished scap: Backport for Add a wordmark to itwiktionary (T329168) (duration: 08m 26s)
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P43829 and previous config saved to /var/cache/conftool/dbconfig/20230208-145944-marostegui.json
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T328817)', diff saved to https://phabricator.wikimedia.org/P43828 and previous config saved to /var/cache/conftool/dbconfig/20230208-145901-marostegui.json
  • 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T328817)', diff saved to https://phabricator.wikimedia.org/P43826 and previous config saved to /var/cache/conftool/dbconfig/20230208-145651-marostegui.json
  • 14:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 14:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 14:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 14:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T328817)', diff saved to https://phabricator.wikimedia.org/P43825 and previous config saved to /var/cache/conftool/dbconfig/20230208-145625-marostegui.json
  • 14:53 jforrester@deploy1002: superpes and jforrester: Backport for Add a wordmark to itwiktionary (T329168) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:51 jforrester@deploy1002: Started scap: Backport for Add a wordmark to itwiktionary (T329168)
  • 14:50 jforrester@deploy1002: Finished scap: Backport for Replace trwiki temporary legacy logo with one including the wordmark (T329047) (duration: 07m 52s)
  • 14:50 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new test VM in drmrs - jmm@cumin2002"
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T328817)', diff saved to https://phabricator.wikimedia.org/P43824 and previous config saved to /var/cache/conftool/dbconfig/20230208-144437-marostegui.json
  • 14:44 jforrester@deploy1002: jforrester and superpes: Backport for Replace trwiki temporary legacy logo with one including the wordmark (T329047) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:42 jforrester@deploy1002: Started scap: Backport for Replace trwiki temporary legacy logo with one including the wordmark (T329047)
  • 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P43823 and previous config saved to /var/cache/conftool/dbconfig/20230208-144119-marostegui.json
  • 14:41 jforrester@deploy1002: Finished scap: Backport for Move non-variant wgMFUseWikibase to CommonSettings (duration: 07m 37s)
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T328817)', diff saved to https://phabricator.wikimedia.org/P43822 and previous config saved to /var/cache/conftool/dbconfig/20230208-143756-marostegui.json
  • 14:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 14:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T328817)', diff saved to https://phabricator.wikimedia.org/P43821 and previous config saved to /var/cache/conftool/dbconfig/20230208-143735-marostegui.json
  • 14:35 jforrester@deploy1002: jforrester: Backport for Move non-variant wgMFUseWikibase to CommonSettings synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:33 jforrester@deploy1002: Started scap: Backport for Move non-variant wgMFUseWikibase to CommonSettings
  • 14:31 jforrester@deploy1002: Finished scap: Backport for Move non-variant wgMFNearby to CommonSettings (duration: 08m 52s)
  • 14:29 moritzm: test install of testvm6001 T327867
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P43820 and previous config saved to /var/cache/conftool/dbconfig/20230208-142613-marostegui.json
  • 14:24 jforrester@deploy1002: jforrester: Backport for Move non-variant wgMFNearby to CommonSettings synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 14:22 jforrester@deploy1002: Started scap: Backport for Move non-variant wgMFNearby to CommonSettings
  • 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P43819 and previous config saved to /var/cache/conftool/dbconfig/20230208-142229-marostegui.json
  • 14:20 jforrester@deploy1002: Finished scap: Backport for Replace wgBetaFeaturesWhitelist with wgBetaFeaturesAllowList, Part II (duration: 07m 48s)
  • 14:14 jforrester@deploy1002: jforrester: Backport for Replace wgBetaFeaturesWhitelist with wgBetaFeaturesAllowList, Part II synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 14:12 jforrester@deploy1002: Started scap: Backport for Replace wgBetaFeaturesWhitelist with wgBetaFeaturesAllowList, Part II
  • 14:11 jforrester@deploy1002: Finished scap: Backport for Replace wgBetaFeaturesWhitelist with wgBetaFeaturesAllowList, Part I (duration: 08m 12s)
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T328817)', diff saved to https://phabricator.wikimedia.org/P43818 and previous config saved to /var/cache/conftool/dbconfig/20230208-141106-marostegui.json
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T328817)', diff saved to https://phabricator.wikimedia.org/P43817 and previous config saved to /var/cache/conftool/dbconfig/20230208-140859-marostegui.json
  • 14:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 14:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T328817)', diff saved to https://phabricator.wikimedia.org/P43816 and previous config saved to /var/cache/conftool/dbconfig/20230208-140837-marostegui.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P43815 and previous config saved to /var/cache/conftool/dbconfig/20230208-140722-marostegui.json
  • 14:05 jforrester@deploy1002: jforrester: Backport for Replace wgBetaFeaturesWhitelist with wgBetaFeaturesAllowList, Part I synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 14:03 jforrester@deploy1002: Started scap: Backport for Replace wgBetaFeaturesWhitelist with wgBetaFeaturesAllowList, Part I
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P43813 and previous config saved to /var/cache/conftool/dbconfig/20230208-135331-marostegui.json
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T328817)', diff saved to https://phabricator.wikimedia.org/P43812 and previous config saved to /var/cache/conftool/dbconfig/20230208-135216-marostegui.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T328817)', diff saved to https://phabricator.wikimedia.org/P43811 and previous config saved to /var/cache/conftool/dbconfig/20230208-134950-marostegui.json
  • 13:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 13:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 13:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P43810 and previous config saved to /var/cache/conftool/dbconfig/20230208-133825-marostegui.json
  • 13:33 jbond: (correction) send puppet.esams.wmnet to eqiad and puppet.eqsin.wmnet to codfw
  • 13:33 jbond: send puppet.esams.wmnet to eqiad and puppet.esams.wmnet to codfw
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T328817)', diff saved to https://phabricator.wikimedia.org/P43809 and previous config saved to /var/cache/conftool/dbconfig/20230208-132318-marostegui.json
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T328817)', diff saved to https://phabricator.wikimedia.org/P43808 and previous config saved to /var/cache/conftool/dbconfig/20230208-131409-marostegui.json
  • 13:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 13:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T328817)', diff saved to https://phabricator.wikimedia.org/P43807 and previous config saved to /var/cache/conftool/dbconfig/20230208-131348-marostegui.json
  • 13:03 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner1002.eqiad.wmnet with OS bullseye
  • 13:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13150
  • 12:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13150
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P43805 and previous config saved to /var/cache/conftool/dbconfig/20230208-125841-marostegui.json
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P43804 and previous config saved to /var/cache/conftool/dbconfig/20230208-124335-marostegui.json
  • 12:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1096.eqiad.wmnet
  • 12:37 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:37 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1096.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 12:36 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1096.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 12:34 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 12:29 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1096.eqiad.wmnet
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T328817)', diff saved to https://phabricator.wikimedia.org/P43803 and previous config saved to /var/cache/conftool/dbconfig/20230208-122829-marostegui.json
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T328817)', diff saved to https://phabricator.wikimedia.org/P43802 and previous config saved to /var/cache/conftool/dbconfig/20230208-122620-marostegui.json
  • 12:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T328817)', diff saved to https://phabricator.wikimedia.org/P43801 and previous config saved to /var/cache/conftool/dbconfig/20230208-122559-marostegui.json
  • 12:21 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas
  • 12:19 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas
  • 12:18 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner1002.eqiad.wmnet with reason: host reimage
  • 12:15 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner1002.eqiad.wmnet with reason: host reimage
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: kafka-stretch2002.codfw.wmnet
  • 12:13 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: kafka-stretch2002.codfw.wmnet
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P43800 and previous config saved to /var/cache/conftool/dbconfig/20230208-121053-marostegui.json
  • 12:03 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab-runner1002.eqiad.wmnet with OS bullseye
  • 11:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dse-k8s-worker1001.eqiad.wmnet with reason: Attempting to move some GPUs
  • 11:59 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dse-k8s-worker1001.eqiad.wmnet with reason: Attempting to move some GPUs
  • 11:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-worker1097.eqiad.wmnet with reason: Attempting to move some GPUs
  • 11:57 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on an-worker1097.eqiad.wmnet with reason: Attempting to move some GPUs
  • 11:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-worker1096.eqiad.wmnet with reason: Attempting to move some GPUs
  • 11:57 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on an-worker1096.eqiad.wmnet with reason: Attempting to move some GPUs
  • 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: moss-be1001.eqiad.wmnet
  • 11:56 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: moss-be1001.eqiad.wmnet
  • 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P43799 and previous config saved to /var/cache/conftool/dbconfig/20230208-115546-marostegui.json
  • 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: flowspec1001.eqiad.wmnet
  • 11:53 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: flowspec1001.eqiad.wmnet
  • 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T328817)', diff saved to https://phabricator.wikimedia.org/P43798 and previous config saved to /var/cache/conftool/dbconfig/20230208-114040-marostegui.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T328817)', diff saved to https://phabricator.wikimedia.org/P43797 and previous config saved to /var/cache/conftool/dbconfig/20230208-113832-marostegui.json
  • 11:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 11:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui: Stop mysql on db1096 (s5,s6) T329147
  • 11:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 11:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43796 and previous config saved to /var/cache/conftool/dbconfig/20230208-110507-marostegui.json
  • 10:57 zabe@deploy1002: Finished scap: Backport for Remove cul_reason comment table migration code (T233004 T329151) (duration: 08m 05s)
  • 10:51 zabe@deploy1002: zabe: Backport for Remove cul_reason comment table migration code (T233004 T329151) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P43793 and previous config saved to /var/cache/conftool/dbconfig/20230208-105001-marostegui.json
  • 10:49 zabe@deploy1002: Started scap: Backport for Remove cul_reason comment table migration code (T233004 T329151)
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 10:35 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P43791 and previous config saved to /var/cache/conftool/dbconfig/20230208-103455-marostegui.json
  • 10:33 volans: deploying python3-wmflib_1.2.1 to the fleet
  • 10:28 zabe@deploy1002: Finished scap: Backport for Revert "slwiki: Raise AF emergency disable treshold+count" (T328366) (duration: 08m 49s)
  • 10:26 marostegui: Failover m2-master from dbproxy1013 to dbproxy1015 T329073
  • 10:21 zabe@deploy1002: zabe: Backport for Revert "slwiki: Raise AF emergency disable treshold+count" (T328366) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 10:19 zabe@deploy1002: Started scap: Backport for Revert "slwiki: Raise AF emergency disable treshold+count" (T328366)
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43790 and previous config saved to /var/cache/conftool/dbconfig/20230208-101948-marostegui.json
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43789 and previous config saved to /var/cache/conftool/dbconfig/20230208-101534-marostegui.json
  • 10:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 10:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43788 and previous config saved to /var/cache/conftool/dbconfig/20230208-101512-marostegui.json
  • 10:08 phedenskog@deploy1002: Finished deploy [performance/navtiming@079891a]: (no justification provided) (duration: 00m 08s)
  • 10:08 phedenskog@deploy1002: Started deploy [performance/navtiming@079891a]: (no justification provided)
  • 10:07 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Test Upgrade GitLab Replica gitlab1003 with invalid version
  • 10:07 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Test Upgrade GitLab Replica gitlab1003 with invalid version
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P43787 and previous config saved to /var/cache/conftool/dbconfig/20230208-100006-marostegui.json
  • 09:59 moritzm: installing openssl security updates on bullseye
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1096 (s5,s6) from dbctl T329147', diff saved to https://phabricator.wikimedia.org/P43786 and previous config saved to /var/cache/conftool/dbconfig/20230208-095207-marostegui.json
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P43785 and previous config saved to /var/cache/conftool/dbconfig/20230208-094500-marostegui.json
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43783 and previous config saved to /var/cache/conftool/dbconfig/20230208-092954-marostegui.json
  • 09:14 godog: purge user_auth table on grafana1002 - T328784
  • 08:54 moritzm: installing imagemagick security updates
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43782 and previous config saved to /var/cache/conftool/dbconfig/20230208-082938-marostegui.json
  • 08:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 08:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43781 and previous config saved to /var/cache/conftool/dbconfig/20230208-082916-marostegui.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P43780 and previous config saved to /var/cache/conftool/dbconfig/20230208-081410-marostegui.json
  • 08:02 marostegui: dbmaint deploy schema change on s3 eqiad (with replication) T328807 T328828
  • 07:59 marostegui: dbmaint deploy schema change on s1 eqiad (with replication) T328807 T328828
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P43779 and previous config saved to /var/cache/conftool/dbconfig/20230208-075903-marostegui.json
  • 07:56 marostegui: dbmaint deploy schema change on s7 eqiad (with replication) T328807 T328828
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43778 and previous config saved to /var/cache/conftool/dbconfig/20230208-074357-marostegui.json
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43777 and previous config saved to /var/cache/conftool/dbconfig/20230208-073837-marostegui.json
  • 07:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 07:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T328817)', diff saved to https://phabricator.wikimedia.org/P43776 and previous config saved to /var/cache/conftool/dbconfig/20230208-073816-marostegui.json
  • 07:38 marostegui: dbmaint deploy schema change on s2 eqiad (with replication) T328807 T328828
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P43775 and previous config saved to /var/cache/conftool/dbconfig/20230208-072310-marostegui.json
  • 07:19 marostegui: dbmaint deploy schema change on s5 eqiad (with replication) T328807 T328828
  • 07:18 marostegui: dbmaint deploy schema change on s4 eqiad (with replication) T328807 T328828
  • 07:18 marostegui: dbmaint deploy schema change on s8 eqiad (with replication) T328807 T328828
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P43774 and previous config saved to /var/cache/conftool/dbconfig/20230208-070803-marostegui.json
  • 07:07 marostegui: Install 10.6.12 on pc2014 T329011
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T328817)', diff saved to https://phabricator.wikimedia.org/P43773 and previous config saved to /var/cache/conftool/dbconfig/20230208-065257-marostegui.json
  • 06:52 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Promote pc2011 back to pc1 master (duration: 16m 01s)
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T328817)', diff saved to https://phabricator.wikimedia.org/P43772 and previous config saved to /var/cache/conftool/dbconfig/20230208-065149-marostegui.json
  • 06:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 06:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: Maintenance
  • 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: Maintenance
  • 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43771 and previous config saved to /var/cache/conftool/dbconfig/20230208-064405-root.json
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T328817)', diff saved to https://phabricator.wikimedia.org/P43770 and previous config saved to /var/cache/conftool/dbconfig/20230208-064134-marostegui.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T328817)', diff saved to https://phabricator.wikimedia.org/P43769 and previous config saved to /var/cache/conftool/dbconfig/20230208-064027-marostegui.json
  • 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: Maintenance
  • 06:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: Maintenance
  • 06:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:38 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Promote pc2011 back to pc1 master synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 06:36 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Promote pc2011 back to pc1 master
  • 03:58 AndyRussG: payments-wiki upgraded from 53d1a58d to 61ea310d, config revision changed from 5e707565 to 4d90c211
  • 02:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2420.codfw.wmnet with OS buster
  • 01:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mw2420.codfw.wmnet with OS buster
  • 01:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2435']
  • 01:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2434']
  • 01:00 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2435']
  • 01:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2433']
  • 01:00 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2434']
  • 00:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2432']
  • 00:52 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2433']
  • 00:52 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2432']
  • 00:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2431']
  • 00:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2430']
  • 00:43 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2431']
  • 00:43 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2430']
  • 00:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2429']
  • 00:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2428']
  • 00:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2429']
  • 00:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2427']
  • 00:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2428']
  • 00:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2426']
  • 00:22 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2427']
  • 00:17 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2426']
  • 00:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mw2424']
  • 00:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mw2425']

2023-02-07

  • 23:56 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2425']
  • 23:56 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2424']
  • 23:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['mw2423']
  • 23:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['mw2422']
  • 23:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2423']
  • 23:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2422']
  • 23:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['mw2421']
  • 23:30 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['mw2420']
  • 23:23 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2421']
  • 23:22 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2420']
  • 23:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2434.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2435.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2435.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2434.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2432.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2433.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2433.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:45 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2432.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes in B8 - pt1979@cumin2002"
  • 22:43 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes in B8 - pt1979@cumin2002"
  • 22:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2430.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:41 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2431.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:31 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2431.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:31 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2430.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2429.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2428.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2429.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2428.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:15 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:15 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes in B6 - pt1979@cumin2002"
  • 22:14 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes in B6 - pt1979@cumin2002"
  • 22:12 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:10 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "provision new Ganeti VM an-airflow1005 - bking@cumin1001 - T327970"
  • 22:08 urbanecm@deploy1002: Finished scap: Backport for Allow AbuseFilter to block IPs and users on itwikiversity (T328194) (duration: 08m 23s)
  • 22:07 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "provision new Ganeti VM an-airflow1005 - bking@cumin1001 - T327970"
  • 22:02 urbanecm@deploy1002: urbanecm and superpes: Backport for Allow AbuseFilter to block IPs and users on itwikiversity (T328194) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 22:00 urbanecm@deploy1002: Started scap: Backport for Allow AbuseFilter to block IPs and users on itwikiversity (T328194)
  • 21:59 urbanecm@deploy1002: Finished scap: Backport for Change the trwiki logo with a temporary one (old vector) (T329047) (duration: 10m 20s)
  • 21:51 urbanecm@deploy1002: superpes and urbanecm: Backport for Change the trwiki logo with a temporary one (old vector) (T329047) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:49 urbanecm@deploy1002: Started scap: Backport for Change the trwiki logo with a temporary one (old vector) (T329047)
  • 21:48 urbanecm@deploy1002: Finished scap: Backport for Install WikiLove extension on bnwikiquote (T328834) (duration: 15m 32s)
  • 21:35 urbanecm@deploy1002: superpes and urbanecm: Backport for Install WikiLove extension on bnwikiquote (T328834) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2051.codfw.wmnet with OS bullseye
  • 21:33 urbanecm: Create extension tables for Wikilove on bnwikiquote (T328834)
  • 21:33 urbanecm@deploy1002: Started scap: Backport for Install WikiLove extension on bnwikiquote (T328834)
  • 21:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2426.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:31 urbanecm@deploy1002: Finished scap: Backport for Disable languages on history page (T328996), Remove button styling from log in link (T289212), [followup] mediawiki.feedlink: Atom's link icon overlaps the link (T327717) (duration: 11m 10s)
  • 21:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1053.eqiad.wmnet with OS bullseye
  • 21:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2427.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2427.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2427.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:22 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for Disable languages on history page (T328996), Remove button styling from log in link (T289212), [followup] mediawiki.feedlink: Atom's link icon overlaps the link (T327717) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 21:21 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2426.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2426.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:20 urbanecm@deploy1002: Started scap: Backport for Disable languages on history page (T328996), Remove button styling from log in link (T289212), [followup] mediawiki.feedlink: Atom's link icon overlaps the link (T327717)
  • 21:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2051.codfw.wmnet with reason: host reimage
  • 21:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2427.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1053.eqiad.wmnet with reason: host reimage
  • 21:14 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2051.codfw.wmnet with reason: host reimage
  • 21:12 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1053.eqiad.wmnet with reason: host reimage
  • 21:12 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2426.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:02 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - Fix android session schema path (duration: 07m 26s)
  • 21:01 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1053.eqiad.wmnet with OS bullseye
  • 20:58 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2051.codfw.wmnet with OS bullseye
  • 20:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2050.codfw.wmnet with OS bullseye
  • 20:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1051.eqiad.wmnet with OS bullseye
  • 20:44 bking@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1005.eqiad.wmnet
  • 20:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2050.codfw.wmnet with reason: host reimage
  • 20:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2050.codfw.wmnet with reason: host reimage
  • 20:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1051.eqiad.wmnet with reason: host reimage
  • 20:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1051.eqiad.wmnet with reason: host reimage
  • 20:21 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2050.codfw.wmnet with OS bullseye
  • 20:21 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1051.eqiad.wmnet with OS bullseye
  • 20:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2425.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2424.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:08 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2425.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2424.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:59 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2424.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2425.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:57 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-airflow1005.eqiad.wmnet on all recursors
  • 19:57 bking@cumin1001: START - Cookbook sre.dns.wipe-cache an-airflow1005.eqiad.wmnet on all recursors
  • 19:57 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:57 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-airflow1005.eqiad.wmnet - bking@cumin1001"
  • 19:56 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-airflow1005.eqiad.wmnet - bking@cumin1001"
  • 19:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2424.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:55 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.22 refs T325585
  • 19:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2424.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:53 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 19:53 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1005.eqiad.wmnet
  • 19:48 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2425.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:47 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2424.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2423.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2422.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2423.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:45 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2422.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2423.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2422.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2049.codfw.wmnet with OS bullseye
  • 19:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1049.eqiad.wmnet with OS bullseye
  • 19:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2049.codfw.wmnet with reason: host reimage
  • 19:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2049.codfw.wmnet with reason: host reimage
  • 19:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1049.eqiad.wmnet with reason: host reimage
  • 19:15 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1049.eqiad.wmnet with reason: host reimage
  • 19:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1049.eqiad.wmnet with OS bullseye
  • 19:03 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2049.codfw.wmnet with OS bullseye
  • 19:03 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2423.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2422.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:00 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2423,25,26,27 DNS - pt1979@cumin2002"
  • 19:00 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2423,25,26,27 DNS - pt1979@cumin2002"
  • 18:57 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2048.codfw.wmnet with OS bullseye
  • 18:47 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1047.eqiad.wmnet with OS bullseye
  • 18:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2048.codfw.wmnet with reason: host reimage
  • 18:34 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2048.codfw.wmnet with reason: host reimage
  • 18:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1047.eqiad.wmnet with reason: host reimage
  • 18:29 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1047.eqiad.wmnet with reason: host reimage
  • 18:18 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2048.codfw.wmnet with OS bullseye
  • 18:17 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1047.eqiad.wmnet with OS bullseye
  • 18:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 13 hosts
  • 18:02 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for 13 hosts
  • 17:55 inflatador: bking@cumin1001 repooling elastic and wdqs hosts post-maintenance T327925
  • 17:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2047.codfw.wmnet with OS bullseye
  • 17:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1046.eqiad.wmnet with OS bullseye
  • 17:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2047.codfw.wmnet with reason: host reimage
  • 17:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2047.codfw.wmnet with reason: host reimage
  • 17:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1046.eqiad.wmnet with reason: host reimage
  • 17:34 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1046.eqiad.wmnet with reason: host reimage
  • 17:22 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1046.eqiad.wmnet with OS bullseye
  • 17:21 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2047.codfw.wmnet with OS bullseye
  • 16:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2046.codfw.wmnet with OS bullseye
  • 16:48 urbanecm@deploy1002: Finished scap: 58f4d877: Finalize mediawiki/page/change schema, produce at rc1.mediawiki.page_change (T308017), 854ff4ac: Finalize mediawiki/page/change schema at 1.0.0 (T308017) (duration: 07m 32s)
  • 16:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1045.eqiad.wmnet with OS bullseye
  • 16:41 urbanecm@deploy1002: Started scap: 58f4d877: Finalize mediawiki/page/change schema, produce at rc1.mediawiki.page_change (T308017), 854ff4ac: Finalize mediawiki/page/change schema at 1.0.0 (T308017)
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43765 and previous config saved to /var/cache/conftool/dbconfig/20230207-163902-root.json
  • 16:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2046.codfw.wmnet with reason: host reimage
  • 16:31 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2046.codfw.wmnet with reason: host reimage
  • 16:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1045.eqiad.wmnet with reason: host reimage
  • 16:26 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1045.eqiad.wmnet with reason: host reimage
  • 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43764 and previous config saved to /var/cache/conftool/dbconfig/20230207-162357-root.json
  • 16:18 urbanecm@deploy1002: Finished scap: Backport for Restore mediawiki.page-undelete hook (T329064), Restore mediawiki.page-undelete hook (T329064) (duration: 17m 44s)
  • 16:15 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2046.codfw.wmnet with OS bullseye
  • 16:14 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1045.eqiad.wmnet with OS bullseye
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43763 and previous config saved to /var/cache/conftool/dbconfig/20230207-160852-root.json
  • 16:02 urbanecm@deploy1002: urbanecm: Backport for Restore mediawiki.page-undelete hook (T329064), Restore mediawiki.page-undelete hook (T329064) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 16:00 urbanecm@deploy1002: Started scap: Backport for Restore mediawiki.page-undelete hook (T329064), Restore mediawiki.page-undelete hook (T329064)
  • 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43762 and previous config saved to /var/cache/conftool/dbconfig/20230207-155347-root.json
  • 15:53 moritzm: installing tiff security updates
  • 15:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2045.codfw.wmnet with OS bullseye
  • 15:47 urbanecm@deploy1002: Finished scap: 20a79c5: Don't add custom attributes in unwrapParsoidSections() (T328268) (duration: 07m 34s)
  • 15:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1043.eqiad.wmnet with OS bullseye
  • 15:39 urbanecm@deploy1002: Started scap: 20a79c5: Don't add custom attributes in unwrapParsoidSections() (T328268)
  • 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43761 and previous config saved to /var/cache/conftool/dbconfig/20230207-153842-root.json
  • 15:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2045.codfw.wmnet with reason: host reimage
  • 15:29 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2045.codfw.wmnet with reason: host reimage
  • 15:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1043.eqiad.wmnet with reason: host reimage
  • 15:26 urbanecm@deploy1002: Finished scap: Backport for Add "Page Frame" to DiscussionTools beta feature on enwiki (T327456) (duration: 10m 39s)
  • 15:25 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1043.eqiad.wmnet with reason: host reimage
  • 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43760 and previous config saved to /var/cache/conftool/dbconfig/20230207-152337-root.json
  • 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
  • 15:17 urbanecm@deploy1002: matmarex and urbanecm: Backport for Add "Page Frame" to DiscussionTools beta feature on enwiki (T327456) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 15:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
  • 15:15 urbanecm@deploy1002: Started scap: Backport for Add "Page Frame" to DiscussionTools beta feature on enwiki (T327456)
  • 15:14 volans@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in eqiad: T327925
  • 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 15:13 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1043.eqiad.wmnet with OS bullseye
  • 15:13 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2045.codfw.wmnet with OS bullseye
  • 15:12 vgutierrez: repool codfw edge site - T327925
  • 15:09 volans@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) restbase-async.discovery.wmnet on all recursors
  • 15:09 volans@cumin2002: START - Cookbook sre.dns.wipe-cache restbase-async.discovery.wmnet on all recursors
  • 15:09 volans@cumin2002: START - Cookbook sre.discovery.service-route depool restbase-async in eqiad: T327925
  • 15:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 15:07 volans@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter-route (exit_code=0) pool all active/active services in codfw: T327925
  • 15:05 marostegui: dbmaint deploy schema change on s8 T328807 T328828
  • 15:04 vgutierrez: restart pybal in lvs2010 - T327925
  • 15:01 marostegui: dbmaint deploy schema change on s6 T328807
  • 15:00 vgutierrez: restart pybal in lvs2009 - T327925
  • 14:59 marostegui: dbmaint deploy schema change on s6 T328828
  • 14:53 moritzm: adding nfraison to pwstore T328915
  • 14:46 volans@cumin2002: START - Cookbook sre.discovery.datacenter-route pool all active/active services in codfw: T327925
  • 14:40 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet,service=thanos-web
  • 14:40 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thanos-fe2001.codfw.wmnet,service=thanos-web
  • 14:36 claime: repooled appserver, api_appserver, jobrunner, parsoid - T327925
  • 14:36 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 14:36 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=api_appserver
  • 14:35 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=jobrunner
  • 14:35 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=appserver
  • 14:35 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid
  • 14:32 Emperor: pool ms-fe2009 (codfw as a whole still depooled) T327925
  • 14:28 jbond: enable puppet in codfw, ulsfo, esams post switch upgrade T327925
  • 14:26 claime: depooled appserver, api_appserver, jobrunner, parsoid - T327925
  • 14:25 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 14:21 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid
  • 14:19 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=appserver
  • 14:19 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=jobrunner
  • 14:18 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=api_appserver
  • 14:13 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thanos-fe2002.codfw.wmnet,service=thanos-web
  • 14:13 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=thanos-fe2001.codfw.wmnet,service=thanos-web
  • 14:08 jbond: disable puppet in codfw, ulsfo, esams for switch upgrade T327925
  • 14:07 lucaswerkmeister-wmde@deploy1002: backport aborted: (duration: 17m 46s)
  • 14:06 XioNoX: asw-a-codfw> request system reboot all-members - T327925
  • 13:59 XioNoX: disable puppet in ulsfo/esams/codfw for codfw row A switch upgrade - T327925
  • 13:56 Emperor: depool ms-fe2009 T327925
  • 13:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2422 and 24 DNS - pt1979@cumin2002"
  • 13:54 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2422 and 24 DNS - pt1979@cumin2002"
  • 13:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 199 hosts with reason: codfw row A upgrade
  • 13:32 oblivian@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter-route (exit_code=0) depool all active/active services in codfw: T327925
  • 13:31 vgutierrez: depool codfw edge site - T327925
  • 13:31 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 199 hosts with reason: codfw row A upgrade
  • 13:13 jbond: enable puppet in codfw, ulsfo and esams to allow depools post switch upgrade T327925
  • 13:11 oblivian@cumin2002: START - Cookbook sre.discovery.datacenter-route depool all active/active services in codfw: T327925
  • 13:05 jbond: disable puppet in codfw, ulsfo and esams for switch upgrade T327925
  • 12:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm6001.drmrs.wmnet
  • 12:28 vgutierrez: depooling authdns2001 - T327925
  • 12:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on doh2001.wikimedia.org with reason: depooled; T327925
  • 12:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on doh2001.wikimedia.org with reason: depooled; T327925
  • 12:20 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm6001.drmrs.wmnet on all recursors
  • 12:20 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache testvm6001.drmrs.wmnet on all recursors
  • 12:20 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm6001.drmrs.wmnet - jmm@cumin2002"
  • 12:19 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm6001.drmrs.wmnet - jmm@cumin2002"
  • 12:17 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:17 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm6001.drmrs.wmnet
  • 12:00 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1041.eqiad.wmnet with OS bullseye
  • 11:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2044.codfw.wmnet with OS bullseye
  • 11:56 marostegui: Install 10.4.28 on db1152 T329011
  • 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
  • 11:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1041.eqiad.wmnet with reason: host reimage
  • 11:41 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1041.eqiad.wmnet with reason: host reimage
  • 11:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2044.codfw.wmnet with reason: host reimage
  • 11:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2044.codfw.wmnet with reason: host reimage
  • 11:33 moritzm: installing imagemagick security updates on buster
  • 11:29 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1041.eqiad.wmnet with OS bullseye
  • 11:21 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2044.codfw.wmnet with OS bullseye
  • 10:51 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
  • 10:49 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
  • 10:19 oblivian@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter-route (exit_code=0) pool all active/active services in eqiad: Pooling eqiad for codfw depool today
  • 10:19 oblivian@cumin2002: START - Cookbook sre.discovery.datacenter-route pool all active/active services in eqiad: Pooling eqiad for codfw depool today
  • 10:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast1003.wikimedia.org with OS bullseye
  • 10:13 oblivian@cumin2002: END (FAIL) - Cookbook sre.discovery.datacenter-route (exit_code=93) pool all active/active services in eqiad: Pooling eqiad for codfw depool today
  • 10:12 oblivian@cumin2002: START - Cookbook sre.discovery.datacenter-route pool all active/active services in eqiad: Pooling eqiad for codfw depool today
  • 10:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast1003.wikimedia.org with reason: host reimage
  • 09:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast1003.wikimedia.org with reason: host reimage
  • 09:44 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast1003.wikimedia.org with OS bullseye
  • 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast2002.wikimedia.org with OS bullseye
  • 09:24 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 09:23 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 09:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast2002.wikimedia.org with reason: host reimage
  • 09:20 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 09:20 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 09:20 akosiaris: add wiktionary to mobile-sections rerenders. T226931
  • 09:19 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast2002.wikimedia.org with reason: host reimage
  • 09:19 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 09:19 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 09:08 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
  • 09:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast2002.wikimedia.org with OS bullseye
  • 08:50 vgutierrez: rolling upgrade to HAProxy 2.4.21 in cp nodes
  • 08:48 kostajh: UTC morning deploys done
  • 08:48 kharlan@deploy1002: Finished scap: Backport for [Growth] Remove mentor list variables (T321501), Remove GEMentorProvider (T321501) (duration: 12m 48s)
  • 08:37 kharlan@deploy1002: urbanecm and kharlan: Backport for [Growth] Remove mentor list variables (T321501), Remove GEMentorProvider (T321501) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:35 kharlan@deploy1002: Started scap: Backport for [Growth] Remove mentor list variables (T321501), Remove GEMentorProvider (T321501)
  • 08:30 moritzm: installing imagemagick security updates on Thumbor T328901
  • 08:28 kharlan@deploy1002: Finished scap: Backport for GrowthExperiments: Disable leveling up features in production (T328757) (duration: 12m 11s)
  • 08:18 kharlan@deploy1002: kharlan: Backport for GrowthExperiments: Disable leveling up features in production (T328757) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 08:16 kharlan@deploy1002: Started scap: Backport for GrowthExperiments: Disable leveling up features in production (T328757)
  • 08:14 kharlan@deploy1002: backport aborted: (duration: 00m 07s)
  • 07:00 marostegui: Failover m3 from db1159 to db1164 - T328404
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2110 in API', diff saved to https://phabricator.wikimedia.org/P43758 and previous config saved to /var/cache/conftool/dbconfig/20230207-063147-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1187', diff saved to https://phabricator.wikimedia.org/P43757 and previous config saved to /var/cache/conftool/dbconfig/20230207-062826-root.json
  • 04:58 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.20 (duration: 02m 20s)
  • 04:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.22 refs T325585 (duration: 53m 11s)
  • 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.22 refs T325585

2023-02-06

  • 23:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2421.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2421.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:55 ryankemper: T327925 Depooled codfw wdqs hosts: `ryankemper@cumin2002:~$ sudo -E cumin -b 3 'wdqs[2003-2004,2009]*' 'sudo depool'`
  • 22:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 13 hosts with reason: switch upgrade
  • 22:51 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 13 hosts with reason: switch upgrade
  • 22:48 ryankemper: T327925 Banned `elastic[2037-2040,2055-2056,2061-2062,2069,2073-2076]` on codfw elastic
  • 22:42 inflatador: bking@cumin2002 banning Elastic nodes from cluster in preparation for T327925
  • 22:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2421.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2421.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:08 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mw2421
  • 22:07 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mw2421
  • 22:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:06 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2421 DNS - pt1979@cumin2002"
  • 22:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2421 DNS - pt1979@cumin2002"
  • 22:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2420.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2420.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2420.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:32 zabe@deploy1002: say aborted: (duration: 00m 39s)
  • 19:30 zabe@deploy1002: backport aborted: (duration: 00m 00s)
  • 19:29 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript resetAuthenticationThrottle.php --wiki=metawiki --signup --ip 92.62.231.190 # T328929
  • 19:27 zabe@deploy1002: backport aborted: (duration: 00m 23s)
  • 19:25 urbanecm@deploy1002: Finished scap: Backport for Add a new throttle rule (T328929) (duration: 07m 43s)
  • 19:18 urbanecm@deploy1002: Started scap: Backport for Add a new throttle rule (T328929)
  • 19:17 urbanecm@deploy1002: backport aborted: (duration: 00m 01s)
  • 18:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2420.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:52 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:52 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2420 DNS - pt1979@cumin2002"
  • 18:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2420 DNS - pt1979@cumin2002"
  • 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mw2420
  • 18:50 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mw2420
  • 18:48 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:48 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 18:48 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:10 vgutierrez: rolling upgrade to HAProxy 2.4.21 in ulsfo cp nodes
  • 14:37 moritzm: installing imagemagick security updates on buster
  • 14:13 vgutierrez: testing HAProxy 2.4.21 in cp4052 and cp4044
  • 14:11 urbanecm@deploy1002: Finished scap: Backport for New config entries for migrated android schemas (T324167) (duration: 09m 19s)
  • 14:09 vgutierrez: fetch HAProxy 2.4.21 for buster and bullseye (apt.wm.o)
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43754 and previous config saved to /var/cache/conftool/dbconfig/20230206-140753-root.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2176 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43753 and previous config saved to /var/cache/conftool/dbconfig/20230206-140627-root.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43752 and previous config saved to /var/cache/conftool/dbconfig/20230206-140623-root.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43751 and previous config saved to /var/cache/conftool/dbconfig/20230206-140606-root.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43750 and previous config saved to /var/cache/conftool/dbconfig/20230206-140602-root.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43749 and previous config saved to /var/cache/conftool/dbconfig/20230206-140554-root.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2155 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43748 and previous config saved to /var/cache/conftool/dbconfig/20230206-140549-root.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2154 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43747 and previous config saved to /var/cache/conftool/dbconfig/20230206-140541-root.json
  • 14:05 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@b798462] (releasing): (no justification provided) (duration: 00m 33s)
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2153 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43746 and previous config saved to /var/cache/conftool/dbconfig/20230206-140501-root.json
  • 14:05 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@b798462] (releasing): (no justification provided)
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43745 and previous config saved to /var/cache/conftool/dbconfig/20230206-140449-root.json
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2145 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43744 and previous config saved to /var/cache/conftool/dbconfig/20230206-140433-root.json
  • 14:04 urbanecm@deploy1002: urbanecm and sharvaniharan: Backport for New config entries for migrated android schemas (T324167) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43743 and previous config saved to /var/cache/conftool/dbconfig/20230206-140405-root.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43742 and previous config saved to /var/cache/conftool/dbconfig/20230206-140338-root.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43741 and previous config saved to /var/cache/conftool/dbconfig/20230206-140333-root.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43740 and previous config saved to /var/cache/conftool/dbconfig/20230206-140316-root.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43739 and previous config saved to /var/cache/conftool/dbconfig/20230206-140310-root.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43738 and previous config saved to /var/cache/conftool/dbconfig/20230206-140257-root.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43737 and previous config saved to /var/cache/conftool/dbconfig/20230206-140249-root.json
  • 14:02 urbanecm@deploy1002: Started scap: Backport for New config entries for migrated android schemas (T324167)
  • 13:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3300
  • 13:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3300
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43736 and previous config saved to /var/cache/conftool/dbconfig/20230206-135248-root.json
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2176 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43735 and previous config saved to /var/cache/conftool/dbconfig/20230206-135122-root.json
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43734 and previous config saved to /var/cache/conftool/dbconfig/20230206-135118-root.json
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43733 and previous config saved to /var/cache/conftool/dbconfig/20230206-135101-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43732 and previous config saved to /var/cache/conftool/dbconfig/20230206-135057-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43731 and previous config saved to /var/cache/conftool/dbconfig/20230206-135049-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2155 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43730 and previous config saved to /var/cache/conftool/dbconfig/20230206-135044-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2154 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43729 and previous config saved to /var/cache/conftool/dbconfig/20230206-135036-root.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2153 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43728 and previous config saved to /var/cache/conftool/dbconfig/20230206-134956-root.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43727 and previous config saved to /var/cache/conftool/dbconfig/20230206-134944-root.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2145 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43726 and previous config saved to /var/cache/conftool/dbconfig/20230206-134928-root.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43725 and previous config saved to /var/cache/conftool/dbconfig/20230206-134901-root.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43724 and previous config saved to /var/cache/conftool/dbconfig/20230206-134833-root.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43723 and previous config saved to /var/cache/conftool/dbconfig/20230206-134828-root.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43722 and previous config saved to /var/cache/conftool/dbconfig/20230206-134811-root.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43721 and previous config saved to /var/cache/conftool/dbconfig/20230206-134805-root.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43720 and previous config saved to /var/cache/conftool/dbconfig/20230206-134752-root.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43719 and previous config saved to /var/cache/conftool/dbconfig/20230206-134744-root.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43718 and previous config saved to /var/cache/conftool/dbconfig/20230206-133743-root.json
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2176 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43717 and previous config saved to /var/cache/conftool/dbconfig/20230206-133618-root.json
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43716 and previous config saved to /var/cache/conftool/dbconfig/20230206-133613-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43715 and previous config saved to /var/cache/conftool/dbconfig/20230206-133556-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43714 and previous config saved to /var/cache/conftool/dbconfig/20230206-133552-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43713 and previous config saved to /var/cache/conftool/dbconfig/20230206-133544-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2155 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43712 and previous config saved to /var/cache/conftool/dbconfig/20230206-133540-root.json
  • 13:35 jbond: add confd to bookworm repos
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2154 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43711 and previous config saved to /var/cache/conftool/dbconfig/20230206-133531-root.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2153 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43710 and previous config saved to /var/cache/conftool/dbconfig/20230206-133451-root.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43709 and previous config saved to /var/cache/conftool/dbconfig/20230206-133439-root.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2145 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43708 and previous config saved to /var/cache/conftool/dbconfig/20230206-133423-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43707 and previous config saved to /var/cache/conftool/dbconfig/20230206-133356-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43706 and previous config saved to /var/cache/conftool/dbconfig/20230206-133329-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43705 and previous config saved to /var/cache/conftool/dbconfig/20230206-133323-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43704 and previous config saved to /var/cache/conftool/dbconfig/20230206-133306-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43703 and previous config saved to /var/cache/conftool/dbconfig/20230206-133300-root.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43702 and previous config saved to /var/cache/conftool/dbconfig/20230206-133247-root.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43701 and previous config saved to /var/cache/conftool/dbconfig/20230206-133239-root.json
  • 13:26 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:26 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:23 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43700 and previous config saved to /var/cache/conftool/dbconfig/20230206-132238-root.json
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2176 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43699 and previous config saved to /var/cache/conftool/dbconfig/20230206-132113-root.json
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43698 and previous config saved to /var/cache/conftool/dbconfig/20230206-132108-root.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43697 and previous config saved to /var/cache/conftool/dbconfig/20230206-132051-root.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43696 and previous config saved to /var/cache/conftool/dbconfig/20230206-132047-root.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43695 and previous config saved to /var/cache/conftool/dbconfig/20230206-132039-root.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2155 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43694 and previous config saved to /var/cache/conftool/dbconfig/20230206-132035-root.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2154 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43693 and previous config saved to /var/cache/conftool/dbconfig/20230206-132026-root.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2153 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43692 and previous config saved to /var/cache/conftool/dbconfig/20230206-131947-root.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43691 and previous config saved to /var/cache/conftool/dbconfig/20230206-131934-root.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2145 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43690 and previous config saved to /var/cache/conftool/dbconfig/20230206-131918-root.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43689 and previous config saved to /var/cache/conftool/dbconfig/20230206-131851-root.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43688 and previous config saved to /var/cache/conftool/dbconfig/20230206-131824-root.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43687 and previous config saved to /var/cache/conftool/dbconfig/20230206-131818-root.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43686 and previous config saved to /var/cache/conftool/dbconfig/20230206-131801-root.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43685 and previous config saved to /var/cache/conftool/dbconfig/20230206-131755-root.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43684 and previous config saved to /var/cache/conftool/dbconfig/20230206-131740-root.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43683 and previous config saved to /var/cache/conftool/dbconfig/20230206-131734-root.json
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43682 and previous config saved to /var/cache/conftool/dbconfig/20230206-130733-root.json
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2176 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43681 and previous config saved to /var/cache/conftool/dbconfig/20230206-130608-root.json
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43680 and previous config saved to /var/cache/conftool/dbconfig/20230206-130603-root.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43679 and previous config saved to /var/cache/conftool/dbconfig/20230206-130547-root.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43678 and previous config saved to /var/cache/conftool/dbconfig/20230206-130542-root.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43677 and previous config saved to /var/cache/conftool/dbconfig/20230206-130534-root.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2155 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43676 and previous config saved to /var/cache/conftool/dbconfig/20230206-130530-root.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2154 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43675 and previous config saved to /var/cache/conftool/dbconfig/20230206-130521-root.json
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2153 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43674 and previous config saved to /var/cache/conftool/dbconfig/20230206-130442-root.json
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43673 and previous config saved to /var/cache/conftool/dbconfig/20230206-130429-root.json
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2145 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43672 and previous config saved to /var/cache/conftool/dbconfig/20230206-130414-root.json
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43671 and previous config saved to /var/cache/conftool/dbconfig/20230206-130346-root.json
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43670 and previous config saved to /var/cache/conftool/dbconfig/20230206-130319-root.json
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43669 and previous config saved to /var/cache/conftool/dbconfig/20230206-130313-root.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43668 and previous config saved to /var/cache/conftool/dbconfig/20230206-130256-root.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43667 and previous config saved to /var/cache/conftool/dbconfig/20230206-130250-root.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43666 and previous config saved to /var/cache/conftool/dbconfig/20230206-130235-root.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43665 and previous config saved to /var/cache/conftool/dbconfig/20230206-130230-root.json
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43664 and previous config saved to /var/cache/conftool/dbconfig/20230206-125228-root.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2176 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43663 and previous config saved to /var/cache/conftool/dbconfig/20230206-125103-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43662 and previous config saved to /var/cache/conftool/dbconfig/20230206-125059-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43661 and previous config saved to /var/cache/conftool/dbconfig/20230206-125042-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43660 and previous config saved to /var/cache/conftool/dbconfig/20230206-125037-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43659 and previous config saved to /var/cache/conftool/dbconfig/20230206-125029-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2155 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43658 and previous config saved to /var/cache/conftool/dbconfig/20230206-125025-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2154 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43657 and previous config saved to /var/cache/conftool/dbconfig/20230206-125017-root.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2153 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43656 and previous config saved to /var/cache/conftool/dbconfig/20230206-124937-root.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43655 and previous config saved to /var/cache/conftool/dbconfig/20230206-124924-root.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2145 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43654 and previous config saved to /var/cache/conftool/dbconfig/20230206-124909-root.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43653 and previous config saved to /var/cache/conftool/dbconfig/20230206-124841-root.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43652 and previous config saved to /var/cache/conftool/dbconfig/20230206-124814-root.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43651 and previous config saved to /var/cache/conftool/dbconfig/20230206-124808-root.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43650 and previous config saved to /var/cache/conftool/dbconfig/20230206-124751-root.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43649 and previous config saved to /var/cache/conftool/dbconfig/20230206-124745-root.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43648 and previous config saved to /var/cache/conftool/dbconfig/20230206-124730-root.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43647 and previous config saved to /var/cache/conftool/dbconfig/20230206-124725-root.json
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43646 and previous config saved to /var/cache/conftool/dbconfig/20230206-124629-root.json
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43645 and previous config saved to /var/cache/conftool/dbconfig/20230206-124617-root.json
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43644 and previous config saved to /var/cache/conftool/dbconfig/20230206-124513-root.json
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43643 and previous config saved to /var/cache/conftool/dbconfig/20230206-124506-root.json
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43642 and previous config saved to /var/cache/conftool/dbconfig/20230206-123124-root.json
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43641 and previous config saved to /var/cache/conftool/dbconfig/20230206-123112-root.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43640 and previous config saved to /var/cache/conftool/dbconfig/20230206-123007-root.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43639 and previous config saved to /var/cache/conftool/dbconfig/20230206-123001-root.json
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43638 and previous config saved to /var/cache/conftool/dbconfig/20230206-121619-root.json
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43637 and previous config saved to /var/cache/conftool/dbconfig/20230206-121608-root.json
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43636 and previous config saved to /var/cache/conftool/dbconfig/20230206-121503-root.json
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43635 and previous config saved to /var/cache/conftool/dbconfig/20230206-121456-root.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43634 and previous config saved to /var/cache/conftool/dbconfig/20230206-120114-root.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43633 and previous config saved to /var/cache/conftool/dbconfig/20230206-120103-root.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43631 and previous config saved to /var/cache/conftool/dbconfig/20230206-115958-root.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43630 and previous config saved to /var/cache/conftool/dbconfig/20230206-115951-root.json
  • 11:58 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1108.eqiad.wmnet
  • 11:47 jbond: puppetmaster[12]002 reintroduced to services
  • 11:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host db1108.eqiad.wmnet
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43629 and previous config saved to /var/cache/conftool/dbconfig/20230206-114609-root.json
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43628 and previous config saved to /var/cache/conftool/dbconfig/20230206-114558-root.json
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43627 and previous config saved to /var/cache/conftool/dbconfig/20230206-114453-root.json
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43626 and previous config saved to /var/cache/conftool/dbconfig/20230206-114446-root.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2156 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43625 and previous config saved to /var/cache/conftool/dbconfig/20230206-113104-root.json
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43624 and previous config saved to /var/cache/conftool/dbconfig/20230206-113053-root.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43623 and previous config saved to /var/cache/conftool/dbconfig/20230206-112948-root.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43622 and previous config saved to /var/cache/conftool/dbconfig/20230206-112942-root.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43621 and previous config saved to /var/cache/conftool/dbconfig/20230206-112900-root.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43620 and previous config saved to /var/cache/conftool/dbconfig/20230206-112856-root.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43619 and previous config saved to /var/cache/conftool/dbconfig/20230206-112839-root.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43618 and previous config saved to /var/cache/conftool/dbconfig/20230206-112832-root.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43617 and previous config saved to /var/cache/conftool/dbconfig/20230206-112825-root.json
  • 11:28 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on puppetmaster2002.codfw.wmnet,puppetmaster1002.eqiad.wmnet with reason: Decom
  • 11:27 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on puppetmaster2002.codfw.wmnet,puppetmaster1002.eqiad.wmnet with reason: Decom
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43616 and previous config saved to /var/cache/conftool/dbconfig/20230206-111356-root.json
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43615 and previous config saved to /var/cache/conftool/dbconfig/20230206-111351-root.json
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43614 and previous config saved to /var/cache/conftool/dbconfig/20230206-111334-root.json
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43613 and previous config saved to /var/cache/conftool/dbconfig/20230206-111327-root.json
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43612 and previous config saved to /var/cache/conftool/dbconfig/20230206-111320-root.json
  • 11:03 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 11:03 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 11:03 akosiaris: deploy changeprop 0.10.19, adding wikivoyage to list of domains the mobile-sections get rerendered for. T226931
  • 11:03 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 11:02 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 11:01 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 11:01 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:59 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:58 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:58 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43610 and previous config saved to /var/cache/conftool/dbconfig/20230206-105851-root.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43609 and previous config saved to /var/cache/conftool/dbconfig/20230206-105846-root.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43608 and previous config saved to /var/cache/conftool/dbconfig/20230206-105829-root.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43607 and previous config saved to /var/cache/conftool/dbconfig/20230206-105822-root.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43606 and previous config saved to /var/cache/conftool/dbconfig/20230206-105815-root.json
  • 10:56 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43605 and previous config saved to /var/cache/conftool/dbconfig/20230206-104346-root.json
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43604 and previous config saved to /var/cache/conftool/dbconfig/20230206-104341-root.json
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43603 and previous config saved to /var/cache/conftool/dbconfig/20230206-104324-root.json
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43602 and previous config saved to /var/cache/conftool/dbconfig/20230206-104317-root.json
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43601 and previous config saved to /var/cache/conftool/dbconfig/20230206-104310-root.json
  • 10:36 marostegui: Upgrade db1115 (db_inventory master) to 10.6. T328408
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43600 and previous config saved to /var/cache/conftool/dbconfig/20230206-102841-root.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43599 and previous config saved to /var/cache/conftool/dbconfig/20230206-102837-root.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43598 and previous config saved to /var/cache/conftool/dbconfig/20230206-102820-root.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43597 and previous config saved to /var/cache/conftool/dbconfig/20230206-102812-root.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43596 and previous config saved to /var/cache/conftool/dbconfig/20230206-102806-root.json
  • 10:27 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:27 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudsw1-b1-codfw mgmt IP. - cmooney@cumin1001"
  • 10:26 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudsw1-b1-codfw mgmt IP. - cmooney@cumin1001"
  • 10:23 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43595 and previous config saved to /var/cache/conftool/dbconfig/20230206-101336-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43594 and previous config saved to /var/cache/conftool/dbconfig/20230206-101332-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43593 and previous config saved to /var/cache/conftool/dbconfig/20230206-101315-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2024 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43592 and previous config saved to /var/cache/conftool/dbconfig/20230206-101308-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2020 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43591 and previous config saved to /var/cache/conftool/dbconfig/20230206-101301-root.json
  • 10:10 hashar@deploy1002: Finished deploy [releng/jenkins-deploy@b798462] (releasing): (no justification provided) (duration: 00m 38s)
  • 10:09 hashar@deploy1002: Started deploy [releng/jenkins-deploy@b798462] (releasing): (no justification provided)
  • 09:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:05 urbanecm@deploy1002: Finished scap: Backport for Fix and add mising parser test for maplink with suppressed text="" (T328739) (duration: 18m 56s)
  • 09:05 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:04 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:56 urbanecm@deploy1002: wmde-fisch and urbanecm: Backport for Fix and add mising parser test for maplink with suppressed text="" (T328739) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:46 urbanecm@deploy1002: Started scap: Backport for Fix and add mising parser test for maplink with suppressed text="" (T328739)
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2094 db2097 db2103 db2104 db2105 db2106 db2121 db2122 db2132 db2133 db2136 db2142 db2145 db2146 db2153 db2154 db2155 db2156 db2157 db2158 db2175 db2176 db2183 T327925', diff saved to https://phabricator.wikimedia.org/P43587 and previous config saved to /var/cache/conftool/dbconfig/20230206-073015-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2020 es2024 es2026 es2027 es2028 T327925', diff saved to https://phabricator.wikimedia.org/P43586 and previous config saved to /var/cache/conftool/dbconfig/20230206-071913-root.json
  • 07:17 hashar: Restarted Gerrit for deployment
  • 07:14 hashar@deploy1002: Finished deploy [gerrit/gerrit@e09efc0]: remove plugins/.eslintrc.json (duration: 00m 05s)
  • 07:14 hashar@deploy1002: Started deploy [gerrit/gerrit@e09efc0]: remove plugins/.eslintrc.json
  • 07:07 hashar@deploy1002: Finished deploy [gerrit/gerrit@e09efc0]: remove plugins/.eslintrc.json | T328134 (duration: 00m 10s)
  • 07:06 hashar@deploy1002: Started deploy [gerrit/gerrit@e09efc0]: remove plugins/.eslintrc.json | T328134

2023-02-05

  • 22:28 topranks: Re-enabling peering to Seabone/Telecom Italia AS 6762 on cr2-esams at AMS-IX
  • 14:39 cdanis: silenced NELHigh alert for 20 hours: Telecom Italy issues; alertmanager silence id 3fb3b999-9756-44af-a1e8-fd1faae8b9bf
  • 11:49 topranks: Manually deactivating peering to Telecom Italia / Seabone at AMS-IX on cr2-esams as they are having issues

2023-02-03

  • 21:05 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:04 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudsw1-b1-codfw mgmt IP. - cmooney@cumin1001"
  • 21:02 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudsw1-b1-codfw mgmt IP. - cmooney@cumin1001"
  • 21:00 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 20:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:49 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 19:44 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1090.eqiad.wmnet
  • 19:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1090.eqiad.wmnet with OS bullseye
  • 19:00 dzahn@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "test what is not synced - dzahn@cumin2002"
  • 18:59 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test what is not synced - dzahn@cumin2002"
  • 18:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1090.eqiad.wmnet with reason: host reimage
  • 18:49 topranks: Enabling 4x10G channelization for pic 0 QSFP 4 on cr1-codfw
  • 18:45 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1090.eqiad.wmnet with reason: host reimage
  • 18:23 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1090.eqiad.wmnet with OS bullseye
  • 18:23 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1088.eqiad.wmnet
  • 18:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1088.eqiad.wmnet with OS bullseye
  • 17:57 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp1088.eqiad.wmnet with reason: host reimage
  • 17:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1088.eqiad.wmnet with reason: host reimage
  • 17:39 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet
  • 17:36 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1089.eqiad.wmnet with OS bullseye
  • 17:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1088.eqiad.wmnet with OS bullseye
  • 17:34 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1086.eqiad.wmnet
  • 17:34 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1086.eqiad.wmnet with OS bullseye
  • 17:14 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1089.eqiad.wmnet with reason: host reimage
  • 17:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1086.eqiad.wmnet with reason: host reimage
  • 17:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1089.eqiad.wmnet with reason: host reimage
  • 17:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1086.eqiad.wmnet with reason: host reimage
  • 16:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1086.eqiad.wmnet with OS bullseye
  • 16:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1089.eqiad.wmnet with OS bullseye
  • 16:45 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:45 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudsw1-b1-codfw mgmt IP. - cmooney@cumin1001"
  • 16:44 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudsw1-b1-codfw mgmt IP. - cmooney@cumin1001"
  • 16:41 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2012.codfw.wmnet
  • 16:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2012.codfw.wmnet
  • 15:51 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@598ff3c] (releasing): test (duration: 00m 26s)
  • 15:51 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@598ff3c] (releasing): test
  • 15:23 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@ec3e0de]: Hotfix disabling skein log collection (duration: 00m 15s)
  • 15:22 milimetric@deploy1002: Started deploy [airflow-dags/analytics@ec3e0de]: Hotfix disabling skein log collection
  • 14:31 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@598ff3c] (releasing): (no justification provided) (duration: 00m 09s)
  • 14:31 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@598ff3c] (releasing): (no justification provided)
  • 14:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2011.codfw.wmnet
  • 14:19 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@598ff3c] (releasing): (no justification provided) (duration: 00m 23s)
  • 14:18 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@598ff3c] (releasing): (no justification provided)
  • 14:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2011.codfw.wmnet
  • 13:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet,service=ats-be
  • 13:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet,service=cdn
  • 13:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1087.eqiad.wmnet with OS bullseye
  • 13:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
  • 13:25 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
  • 13:05 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1087.eqiad.wmnet with OS bullseye
  • 12:09 moritzm: installing node-moment security updates
  • 12:01 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@598ff3c] (releasing): (no justification provided) (duration: 00m 13s)
  • 12:00 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@598ff3c] (releasing): (no justification provided)
  • 11:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2010.codfw.wmnet
  • 11:58 moritzm: installing node-qs security updates
  • 11:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2010.codfw.wmnet
  • 11:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2009.codfw.wmnet
  • 11:28 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2009.codfw.wmnet
  • 10:44 moritzm: updating perf on buster hosts
  • 10:24 stevemunene@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 10:11 stevemunene@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 10:09 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2008.codfw.wmnet
  • 10:07 stevemunene@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 10:06 stevemunene@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 10:03 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2008.codfw.wmnet
  • 09:51 moritzm: installing ruby-rack security updates
  • 09:31 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:31 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:24 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:24 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:23 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:23 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:19 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1001.eqiad.wmnet
  • 09:14 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1001.eqiad.wmnet
  • 09:13 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:13 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:07 moritzm: installing modsecurity-crs security updates
  • 09:02 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:02 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 05:16 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1085.eqiad.wmnet
  • 05:16 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1084.eqiad.wmnet
  • 05:15 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1084.eqiad.wmnet with OS bullseye
  • 05:13 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1085.eqiad.wmnet with OS bullseye
  • 04:50 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1085.eqiad.wmnet with reason: host reimage
  • 04:47 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp1084.eqiad.wmnet with reason: host reimage
  • 04:47 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1084.eqiad.wmnet with reason: host reimage
  • 04:47 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1085.eqiad.wmnet with reason: host reimage
  • 04:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1084.eqiad.wmnet with OS bullseye
  • 04:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1085.eqiad.wmnet with OS bullseye
  • 04:24 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1083.eqiad.wmnet
  • 04:24 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1082.eqiad.wmnet
  • 04:11 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1083.eqiad.wmnet with OS bullseye
  • 04:11 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1082.eqiad.wmnet with OS bullseye
  • 03:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1082.eqiad.wmnet with reason: host reimage
  • 03:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1083.eqiad.wmnet with reason: host reimage
  • 03:43 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1082.eqiad.wmnet with reason: host reimage
  • 03:43 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1083.eqiad.wmnet with reason: host reimage
  • 03:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1082.eqiad.wmnet with OS bullseye
  • 03:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1083.eqiad.wmnet with OS bullseye
  • 03:20 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1080.eqiad.wmnet
  • 03:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1080.eqiad.wmnet with OS bullseye
  • 02:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1080.eqiad.wmnet with reason: host reimage
  • 02:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1080.eqiad.wmnet with reason: host reimage
  • 02:28 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1081.eqiad.wmnet,service=ats-be
  • 02:28 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1081.eqiad.wmnet,service=cdn
  • 02:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1081.eqiad.wmnet with OS bullseye
  • 02:23 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS bullseye
  • 02:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1081.eqiad.wmnet with reason: host reimage
  • 02:00 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1081.eqiad.wmnet with reason: host reimage
  • 01:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1081.eqiad.wmnet with OS bullseye
  • 01:31 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1080.eqiad.wmnet with OS bullseye
  • 00:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS bullseye

2023-02-02

  • 22:58 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1080.eqiad.wmnet with OS bullseye
  • 22:15 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1079.eqiad.wmnet
  • 22:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1079.eqiad.wmnet with OS bullseye
  • 22:01 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS bullseye
  • 22:00 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1078.eqiad.wmnet
  • 21:58 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_comment everywhere (T233004) (duration: 07m 58s)
  • 21:52 zabe@deploy1002: zabe: Backport for Stop writing to cuc_comment everywhere (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 21:50 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_comment everywhere (T233004)
  • 21:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1078.eqiad.wmnet with OS bullseye
  • 21:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
  • 21:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
  • 21:30 brennen: end of utc late backport & config window
  • 21:30 brennen@deploy1002: Finished scap: Backport for Enable client preferences everywhere (T327979) (duration: 11m 14s)
  • 21:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1078.eqiad.wmnet with reason: host reimage
  • 21:22 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1079.eqiad.wmnet with OS bullseye
  • 21:22 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1077.eqiad.wmnet
  • 21:21 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1077.eqiad.wmnet with OS bullseye
  • 21:21 brennen@deploy1002: brennen and nray: Backport for Enable client preferences everywhere (T327979) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:20 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1078.eqiad.wmnet with reason: host reimage
  • 21:19 brennen@deploy1002: Started scap: Backport for Enable client preferences everywhere (T327979)
  • 21:18 brennen@deploy1002: Finished scap: Backport for Disable write old for CheckUserLog reason everywhere (T233004) (duration: 12m 02s)
  • 21:07 brennen@deploy1002: brennen and dreamyjazz: Backport for Disable write old for CheckUserLog reason everywhere (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:06 brennen@deploy1002: Started scap: Backport for Disable write old for CheckUserLog reason everywhere (T233004)
  • 20:59 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1078.eqiad.wmnet with OS bullseye
  • 20:59 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1078.eqiad.wmnet with OS bullseye
  • 20:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1077.eqiad.wmnet with reason: host reimage
  • 20:49 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1077.eqiad.wmnet with reason: host reimage
  • 20:28 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1078.eqiad.wmnet with OS bullseye
  • 20:28 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1077.eqiad.wmnet with OS bullseye
  • 20:23 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include bullseye-wikimedia /home/rzl/httpbb/bullseye/httpbb_0.0.3-1+deb11u1_amd64.changes # T328280
  • 20:21 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/httpbb/buster/httpbb_0.0.3-1_amd64.changes # T328280
  • 20:11 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_user and cuc_user_text everywhere (T233004) (duration: 09m 39s)
  • 20:03 zabe@deploy1002: zabe: Backport for Stop writing to cuc_user and cuc_user_text everywhere (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:02 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic2037.codfw.wmnet
  • 20:01 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_user and cuc_user_text everywhere (T233004)
  • 19:55 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic2037.codfw.wmnet
  • 19:54 ryankemper: T328674 [Elastic] With puppet disabled on elastic* fleet, `ryankemper@elastic2037:~$ sudo run-puppet-agent --force` to verify changes in https://gerrit.wikimedia.org/r/886055
  • 19:30 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.21 refs T325584
  • 19:28 zabe@deploy1002: say aborted: (duration: 00m 03s)
  • 18:42 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_comment in group1 wikis (T233004) (duration: 08m 19s)
  • 18:36 zabe@deploy1002: zabe: Backport for Stop writing to cuc_comment in group1 wikis (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 18:34 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_comment in group1 wikis (T233004)
  • 18:08 aokoth@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Production (gitlab1004) to 15.7.6-ce.0
  • 18:08 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:08 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2043.codfw.wmnet with OS bullseye
  • 18:07 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:06 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:05 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:05 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 18:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1037.eqiad.wmnet with OS bullseye
  • 17:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2043.codfw.wmnet with reason: host reimage
  • 17:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2043.codfw.wmnet with reason: host reimage
  • 17:47 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1037.eqiad.wmnet with reason: host reimage
  • 17:45 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1037.eqiad.wmnet with reason: host reimage
  • 17:33 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2043.codfw.wmnet with OS bullseye
  • 17:32 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1037.eqiad.wmnet with OS bullseye
  • 17:29 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Production (gitlab1004) to 15.7.6-ce.0
  • 17:12 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 17:12 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 16:53 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 16:52 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 16:51 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 16:50 dancy@deploy1002: Installation of scap version "4.34.0" completed for 561 hosts
  • 16:50 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 16:50 dancy@deploy1002: Installing scap version "4.34.0" for 561 hosts
  • 16:50 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 16:49 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 16:48 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 16:48 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 16:47 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 16:46 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 16:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2007.codfw.wmnet
  • 16:18 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 16:17 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 16:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2007.codfw.wmnet
  • 16:17 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 16:16 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 16:16 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 16:15 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 16:10 volans: uploaded python3-wmflib_1.2.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica gitlab2002 to 15.7.6-ce.0
  • 15:40 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@e38efa6] (releasing): (no justification provided) (duration: 07m 01s)
  • 15:38 aokoth@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 15:37 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 15:35 aokoth@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 15:35 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 15:34 dzahn@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica gitlab2002 to 15.7.6-ce.0
  • 15:33 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@e38efa6] (releasing): (no justification provided)
  • 15:24 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti3004
  • 15:17 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti3004
  • 15:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2006.codfw.wmnet
  • 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3004 was renamed as ganeti4004 - jmm@cumin2002"
  • 15:02 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3004 was renamed as ganeti4004 - jmm@cumin2002"
  • 15:00 vgutierrez: rolling restart of varnish in cache::text - T315676
  • 14:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2006.codfw.wmnet
  • 14:55 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 14:45 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 14:39 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 14:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2005.codfw.wmnet
  • 14:29 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 14:25 moritzm: installing containerd security updates on codfw k8s nodes
  • 14:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2005.codfw.wmnet
  • 13:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=ats-be
  • 13:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=cdn
  • 13:10 kharlan: Deployed security patch for T328643
  • 13:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1076.eqiad.wmnet with OS bullseye
  • 13:04 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:03 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:03 kharlan: Deployed security patch for T328643
  • 13:02 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2004.codfw.wmnet
  • 13:00 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 12:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2004.codfw.wmnet
  • 12:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1076.eqiad.wmnet with reason: host reimage
  • 12:47 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:46 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1076.eqiad.wmnet with reason: host reimage
  • 12:42 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:42 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:39 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 12:39 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 12:29 btullis@deploy1002: Finished deploy [analytics/superset/deploy@5175ad7]: Production deployment for numpy downgrade (duration: 00m 42s)
  • 12:29 claime: Work ongoing on m2 and m3
  • 12:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2003.codfw.wmnet
  • 12:29 btullis@deploy1002: Started deploy [analytics/superset/deploy@5175ad7]: Production deployment for numpy downgrade
  • 12:23 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1076.eqiad.wmnet with OS bullseye
  • 12:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2003.codfw.wmnet
  • 12:08 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 12:08 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 11:46 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:42 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:42 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:41 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:41 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:40 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:39 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:38 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:37 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:37 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php shnwikibooks --fix | tee T328634-namespaceDupes-4.out # T328634 – made some progress then errored out again
  • 11:32 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php shnwikibooks --fix --add-prefix=T328634/ | tee T328634-namespaceDupes-3.out # T328634 – seemed to finish the first 20 pages and then go into an infinite loop, I Ctrl+Ced it
  • 11:28 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php shnwikibooks --fix --add-prefix=T328634/ | tee T328634-namespaceDupes-2.out # T328634 – another error but made more progress
  • 11:23 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php shnwikibooks --fix | tee T328634-namespaceDupes.out # T328634 – failed quickly, details in task
  • 11:22 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 11:22 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:02 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2002.codfw.wmnet
  • 10:19 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2002.codfw.wmnet
  • 10:17 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:11 moritzm: restarting FPM on mw canaries to pick up tiff security updates
  • 10:04 moritzm: installing tiff security updates
  • 09:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2001.codfw.wmnet
  • 09:55 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 09:54 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 09:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2001.codfw.wmnet
  • 09:40 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 09:40 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 09:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 398143
  • 09:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 398143
  • 09:16 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica gitlab1004 to 15.7.6
  • 09:13 apergos: UTC morning backport and config training window done
  • 09:13 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 09:12 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 09:11 elukey: roll restart of eventgate-main pods in wikikube eqiad/codfw to pick up new stream configs - T328576
  • 08:57 ariel@deploy1002: Finished scap: Backport for Enable wgMinervaEnableSiteNotice for bnwiktionary (T328630) (duration: 10m 56s)
  • 08:48 ariel@deploy1002: ariel and aishik: Backport for Enable wgMinervaEnableSiteNotice for bnwiktionary (T328630) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:46 ariel@deploy1002: Started scap: Backport for Enable wgMinervaEnableSiteNotice for bnwiktionary (T328630)
  • 08:39 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica gitlab1004 to 15.7.6
  • 08:37 tgr@deploy1002: Finished scap: Backport for campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370), campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370) (duration: 14m 26s)
  • 08:27 tgr@deploy1002: tgr: Backport for campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370), campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:23 tgr@deploy1002: Started scap: Backport for campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370), campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370)
  • 06:17 kart_: Updated cxserver to 2023-02-02-004918-production (T129470, T172035, T327842)
  • 06:16 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:15 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:13 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:12 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:09 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:09 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 04:00 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp5024.eqsin.wmnet
  • 03:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5024.eqsin.wmnet with OS bullseye
  • 03:21 ejegg: payments-wiki upgraded from f20a2208 to 53d1a58d
  • 02:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5024.eqsin.wmnet with reason: host reimage
  • 02:46 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5024.eqsin.wmnet with reason: host reimage
  • 02:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS bullseye
  • 02:14 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5024.eqsin.wmnet with OS bullseye
  • 01:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS bullseye
  • 01:55 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet
  • 01:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5023.eqsin.wmnet with OS bullseye
  • 01:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
  • 01:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=cdn
  • 01:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1075.eqiad.wmnet with OS bullseye
  • 01:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1075.eqiad.wmnet with reason: host reimage
  • 01:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1075.eqiad.wmnet with reason: host reimage
  • 01:21 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5023.eqsin.wmnet with reason: host reimage
  • 01:18 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5023.eqsin.wmnet with reason: host reimage
  • 01:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1075.eqiad.wmnet with OS bullseye
  • 00:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5023.eqsin.wmnet with OS bullseye
  • 00:06 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp5022.eqsin.wmnet
  • 00:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5022.eqsin.wmnet with OS bullseye

2023-02-01

  • 23:45 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_user and cuc_user_text in group1 wikis (T233004) (duration: 08m 07s)
  • 23:39 zabe@deploy1002: zabe: Backport for Stop writing to cuc_user and cuc_user_text in group1 wikis (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 23:37 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_user and cuc_user_text in group1 wikis (T233004)
  • 23:31 rzl@cumin2002: dbctl commit (dc=all): 'Depool db2181', diff saved to https://phabricator.wikimedia.org/P43574 and previous config saved to /var/cache/conftool/dbconfig/20230201-233140-rzl.json
  • 23:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5022.eqsin.wmnet with reason: host reimage
  • 23:27 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5022.eqsin.wmnet with reason: host reimage
  • 23:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: security release
  • 23:17 dancy@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.21 refs T325584 (duration: 06m 57s)
  • 23:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.21 refs T325584
  • 23:01 zabe@deploy1002: Finished scap: Backport for CachingKartographerEmbeddingHandler: Fall back to Special:BlankPage title (T328601) (duration: 07m 45s)
  • 22:55 zabe@deploy1002: zabe: Backport for CachingKartographerEmbeddingHandler: Fall back to Special:BlankPage title (T328601) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 22:54 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5022.eqsin.wmnet with OS bullseye
  • 22:53 zabe@deploy1002: Started scap: Backport for CachingKartographerEmbeddingHandler: Fall back to Special:BlankPage title (T328601)
  • 22:49 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_comment_id in group0 wikis (T233004) (duration: 13m 03s)
  • 22:47 dzahn@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release
  • 22:40 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5022.eqsin.wmnet with OS bullseye
  • 22:38 zabe@deploy1002: zabe: Backport for Stop writing to cuc_comment_id in group0 wikis (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 22:36 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_comment_id in group0 wikis (T233004)
  • 22:32 kindrobot: close UTC late backport window
  • 22:31 kindrobot@deploy1002: Finished scap: Backport for Enable client preferences for group1 (T327979) (duration: 10m 37s)
  • 22:22 kindrobot@deploy1002: nray and kindrobot: Backport for Enable client preferences for group1 (T327979) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:21 kindrobot@deploy1002: Started scap: Backport for Enable client preferences for group1 (T327979)
  • 22:14 kindrobot@deploy1002: Finished scap: Backport for Enable Linter write namespace, tag and template for all wikis (T299612) (duration: 18m 14s)
  • 21:57 kindrobot@deploy1002: kindrobot and sbailey: Backport for Enable Linter write namespace, tag and template for all wikis (T299612) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:57 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore100*: Applying new TLS certificates — T327675 - eevans@cumin1001
  • 21:56 kindrobot@deploy1002: Started scap: Backport for Enable Linter write namespace, tag and template for all wikis (T299612)
  • 21:53 aokoth@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 21:52 kindrobot@deploy1002: Finished scap: Backport for Disable write old for CheckUserLog reason on group 0 (T233004) (duration: 14m 53s)
  • 21:43 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5022.eqsin.wmnet with OS bullseye
  • 21:39 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore100*: Applying new TLS certificates — T327675 - eevans@cumin1001
  • 21:39 kindrobot@deploy1002: dreamyjazz and kindrobot: Backport for Disable write old for CheckUserLog reason on group 0 (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:37 kindrobot@deploy1002: Started scap: Backport for Disable write old for CheckUserLog reason on group 0 (T233004)
  • 21:32 kindrobot@deploy1002: Finished scap: Backport for Disable wgParserEnableLegacyMediaDOM on group1 wikis (T314318) (duration: 13m 56s)
  • 21:26 eevans@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
  • 21:26 eevans@puppetmaster1001: conftool action : get/pooled=true; selector: dnsdisc=sessionstore,name=codfw
  • 21:26 eevans@puppetmaster1001: conftool action : get/pooled=true; selector: dnsdisc=sessionstore,name=codfw
  • 21:24 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 21:20 kindrobot@deploy1002: arlolra and kindrobot: Backport for Disable wgParserEnableLegacyMediaDOM on group1 wikis (T314318) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 21:19 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore200*: Applying new TLS certificates — T327675 - eevans@cumin1001
  • 21:18 kindrobot@deploy1002: Started scap: Backport for Disable wgParserEnableLegacyMediaDOM on group1 wikis (T314318)
  • 21:14 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3065.esams.wmnet
  • 21:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3065.esams.wmnet with OS bullseye
  • 21:03 kindrobot: start UTC late backport deployment window
  • 21:02 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore200*: Applying new TLS certificates — T327675 - eevans@cumin1001
  • 20:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3065.esams.wmnet with reason: host reimage
  • 20:44 eevans@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
  • 20:43 urandom: depooling sessionstore —codfw— in preparation for Cassandra restarts — T327675
  • 20:42 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3065.esams.wmnet with reason: host reimage
  • 20:40 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3064.esams.wmnet
  • 20:38 eevans@puppetmaster1001: conftool action : get/pooled; selector: dnsdisc=$SERVICE,name=$DC
  • 20:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3064.esams.wmnet with OS bullseye
  • 20:22 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3065.esams.wmnet with OS bullseye
  • 20:21 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3063.esams.wmnet
  • 20:11 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3064.esams.wmnet with reason: host reimage
  • 20:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3063.esams.wmnet with OS bullseye
  • 20:08 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3064.esams.wmnet with reason: host reimage
  • 20:03 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5031.eqsin.wmnet,service=ats-be
  • 20:03 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5031.eqsin.wmnet,service=cdn
  • 20:00 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5031.eqsin.wmnet with OS bullseye
  • 19:53 dancy: The train is blocked on T328601
  • 19:49 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3064.esams.wmnet with OS bullseye
  • 19:49 dancy@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.20 refs T325584 (duration: 06m 36s)
  • 19:49 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet
  • 19:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3062.esams.wmnet with OS bullseye
  • 19:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3063.esams.wmnet with reason: host reimage
  • 19:45 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3063.esams.wmnet with reason: host reimage
  • 19:42 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.20 refs T325584
  • 19:41 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet,service=ats-be
  • 19:41 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet,service=cdn
  • 19:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5021.eqsin.wmnet with OS bullseye
  • 19:33 dancy@deploy1002: deploy-promote aborted: (duration: 11m 58s)
  • 19:33 dancy@deploy1002: sync-file aborted: group1 wikis to 1.40.0-wmf.21 refs T325584 (duration: 03m 38s)
  • 19:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5031.eqsin.wmnet with reason: host reimage
  • 19:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.21 refs T325584
  • 19:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5031.eqsin.wmnet with reason: host reimage
  • 19:26 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
  • 19:24 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3063.esams.wmnet with OS bullseye
  • 19:24 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3061.esams.wmnet
  • 19:24 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
  • 19:17 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3061.esams.wmnet with OS bullseye
  • 19:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
  • 19:03 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3062.esams.wmnet with OS bullseye
  • 19:02 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3060.esams.wmnet
  • 19:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3060.esams.wmnet with OS bullseye
  • 19:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
  • 18:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3061.esams.wmnet with reason: host reimage
  • 18:55 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5031.eqsin.wmnet with OS bullseye
  • 18:55 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5031.eqsin.wmnet with OS bullseye
  • 18:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3061.esams.wmnet with reason: host reimage
  • 18:47 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5031.eqsin.wmnet with OS bullseye
  • 18:46 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5031.eqsin.wmnet with OS bullseye
  • 18:39 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts puppetmaster2003.codfw.wmnet
  • 18:38 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3060.esams.wmnet with reason: host reimage
  • 18:37 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5031.eqsin.wmnet with OS bullseye
  • 18:35 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3060.esams.wmnet with reason: host reimage
  • 18:32 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3061.esams.wmnet with OS bullseye
  • 18:31 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3059.esams.wmnet
  • 18:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3059.esams.wmnet with OS bullseye
  • 18:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
  • 18:29 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts puppetmaster2003.codfw.wmnet
  • 18:29 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5021.eqsin.wmnet with OS bullseye
  • 18:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
  • 18:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp1075.eqiad.wmnet with reason: downtimed for idrac firmware testing
  • 18:20 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp1075.eqiad.wmnet with reason: downtimed for idrac firmware testing
  • 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5030.eqsin.wmnet,service=ats-be
  • 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5030.eqsin.wmnet,service=cdn
  • 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet,service=ats-be
  • 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet,service=cdn
  • 18:13 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3060.esams.wmnet with OS bullseye
  • 18:13 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3058.esams.wmnet
  • 18:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3058.esams.wmnet with OS bullseye
  • 18:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5030.eqsin.wmnet with OS bullseye
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43573 and previous config saved to /var/cache/conftool/dbconfig/20230201-181036-root.json
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43572 and previous config saved to /var/cache/conftool/dbconfig/20230201-181031-root.json
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43571 and previous config saved to /var/cache/conftool/dbconfig/20230201-181024-root.json
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43570 and previous config saved to /var/cache/conftool/dbconfig/20230201-181016-root.json
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43569 and previous config saved to /var/cache/conftool/dbconfig/20230201-181011-root.json
  • 18:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3059.esams.wmnet with reason: host reimage
  • 18:03 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3059.esams.wmnet with reason: host reimage
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43568 and previous config saved to /var/cache/conftool/dbconfig/20230201-175531-root.json
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43567 and previous config saved to /var/cache/conftool/dbconfig/20230201-175526-root.json
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43566 and previous config saved to /var/cache/conftool/dbconfig/20230201-175519-root.json
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43565 and previous config saved to /var/cache/conftool/dbconfig/20230201-175511-root.json
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43564 and previous config saved to /var/cache/conftool/dbconfig/20230201-175506-root.json
  • 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43563 and previous config saved to /var/cache/conftool/dbconfig/20230201-175446-root.json
  • 17:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3058.esams.wmnet with reason: host reimage
  • 17:45 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3058.esams.wmnet with reason: host reimage
  • 17:41 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3059.esams.wmnet with OS bullseye
  • 17:40 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3057.esams.wmnet
  • 17:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3057.esams.wmnet with OS bullseye
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43562 and previous config saved to /var/cache/conftool/dbconfig/20230201-174026-root.json
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43561 and previous config saved to /var/cache/conftool/dbconfig/20230201-174021-root.json
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43560 and previous config saved to /var/cache/conftool/dbconfig/20230201-174015-root.json
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43559 and previous config saved to /var/cache/conftool/dbconfig/20230201-174007-root.json
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43558 and previous config saved to /var/cache/conftool/dbconfig/20230201-174001-root.json
  • 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43557 and previous config saved to /var/cache/conftool/dbconfig/20230201-173941-root.json
  • 17:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5030.eqsin.wmnet with reason: host reimage
  • 17:36 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5030.eqsin.wmnet with reason: host reimage
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43555 and previous config saved to /var/cache/conftool/dbconfig/20230201-172521-root.json
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43554 and previous config saved to /var/cache/conftool/dbconfig/20230201-172516-root.json
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43553 and previous config saved to /var/cache/conftool/dbconfig/20230201-172510-root.json
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43552 and previous config saved to /var/cache/conftool/dbconfig/20230201-172502-root.json
  • 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43551 and previous config saved to /var/cache/conftool/dbconfig/20230201-172456-root.json
  • 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43550 and previous config saved to /var/cache/conftool/dbconfig/20230201-172436-root.json
  • 17:23 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3058.esams.wmnet with OS bullseye
  • 17:22 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3056.esams.wmnet
  • 17:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3056.esams.wmnet with OS bullseye
  • 17:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3057.esams.wmnet with reason: host reimage
  • 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5019.eqsin.wmnet with OS bullseye
  • 17:15 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3057.esams.wmnet with reason: host reimage
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43549 and previous config saved to /var/cache/conftool/dbconfig/20230201-171016-root.json
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43548 and previous config saved to /var/cache/conftool/dbconfig/20230201-171011-root.json
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43547 and previous config saved to /var/cache/conftool/dbconfig/20230201-171005-root.json
  • 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43546 and previous config saved to /var/cache/conftool/dbconfig/20230201-170957-root.json
  • 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43545 and previous config saved to /var/cache/conftool/dbconfig/20230201-170951-root.json
  • 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43544 and previous config saved to /var/cache/conftool/dbconfig/20230201-170931-root.json
  • 16:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5030.eqsin.wmnet with OS bullseye
  • 16:57 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5030.eqsin.wmnet with OS bullseye
  • 16:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
  • 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43543 and previous config saved to /var/cache/conftool/dbconfig/20230201-165512-root.json
  • 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43542 and previous config saved to /var/cache/conftool/dbconfig/20230201-165506-root.json
  • 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43541 and previous config saved to /var/cache/conftool/dbconfig/20230201-165500-root.json
  • 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43540 and previous config saved to /var/cache/conftool/dbconfig/20230201-165452-root.json
  • 16:54 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
  • 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43539 and previous config saved to /var/cache/conftool/dbconfig/20230201-165446-root.json
  • 16:54 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3057.esams.wmnet with OS bullseye
  • 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43538 and previous config saved to /var/cache/conftool/dbconfig/20230201-165426-root.json
  • 16:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5030.eqsin.wmnet with OS bullseye
  • 16:42 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5030.eqsin.wmnet with OS bullseye
  • 16:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43536 and previous config saved to /var/cache/conftool/dbconfig/20230201-164007-root.json
  • 16:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43535 and previous config saved to /var/cache/conftool/dbconfig/20230201-164002-root.json
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43534 and previous config saved to /var/cache/conftool/dbconfig/20230201-163955-root.json
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43533 and previous config saved to /var/cache/conftool/dbconfig/20230201-163947-root.json
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43532 and previous config saved to /var/cache/conftool/dbconfig/20230201-163941-root.json
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43531 and previous config saved to /var/cache/conftool/dbconfig/20230201-163921-root.json
  • 16:33 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5030.eqsin.wmnet with OS bullseye
  • 16:33 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3056.esams.wmnet with OS bullseye
  • 16:31 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5030.eqsin.wmnet with OS bullseye
  • 16:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5019.eqsin.wmnet with reason: host reimage
  • 16:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5019.eqsin.wmnet with reason: host reimage
  • 16:25 jynus: reloaded apache on mailman
  • 16:25 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5030.eqsin.wmnet with OS bullseye
  • 16:23 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 16:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 16:15 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 16:14 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 16:14 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 16:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS bullseye
  • 15:51 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5019.eqsin.wmnet with OS bullseye
  • 15:31 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS bullseye
  • 14:56 sukhe: depooled cp1075.eqiad.wmnet for idrac firmware upgrade testing
  • 14:55 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=ats-be
  • 14:55 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=cdn
  • 14:52 awight: EU deployment window complete
  • 14:48 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:48 awight@deploy1002: Finished scap: Backport for wmf-config: add new revision-score streams for EventGate main (T317768) (duration: 08m 25s)
  • 14:47 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:41 awight@deploy1002: elukey and awight: Backport for wmf-config: add new revision-score streams for EventGate main (T317768) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2136 db2158 db2157 es2026 db2106 db2146 T327404', diff saved to https://phabricator.wikimedia.org/P43530 and previous config saved to /var/cache/conftool/dbconfig/20230201-144152-root.json
  • 14:40 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:40 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:40 awight@deploy1002: Started scap: Backport for wmf-config: add new revision-score streams for EventGate main (T317768)
  • 14:39 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:39 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:37 awight@deploy1002: Finished scap: Backport for Add cswiki to desktop-improvements group. (T328154) (duration: 09m 22s)
  • 14:29 awight@deploy1002: jdrewniak and awight: Backport for Add cswiki to desktop-improvements group. (T328154) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:28 awight@deploy1002: Started scap: Backport for Add cswiki to desktop-improvements group. (T328154)
  • 14:26 awight@deploy1002: Finished scap: Backport for Squashed diff to catch up to master (duration: 09m 07s)
  • 14:19 awight@deploy1002: awight and mlitn: Backport for Squashed diff to catch up to master synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 14:17 awight@deploy1002: Started scap: Backport for Squashed diff to catch up to master
  • 14:11 awight@deploy1002: backport aborted: (duration: 06m 09s)
  • 14:11 awight@deploy1002: sync-world aborted: Backport for Squashed diff to catch up to master (duration: 03m 36s)
  • 14:09 awight@deploy1002: mlitn and awight: Backport for Squashed diff to catch up to master synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 14:07 awight@deploy1002: Started scap: Backport for Squashed diff to catch up to master
  • 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast3005.wikimedia.org
  • 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3005.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 14:06 moritzm: updating perf on Bullseye hosts
  • 14:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3005.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:51 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast3005.wikimedia.org
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast5002.wikimedia.org
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:47 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:36 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast5002.wikimedia.org
  • 13:21 moritzm: installing curl security updates on bullseye
  • 13:00 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 12:59 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 12:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2003.codfw.wmnet
  • 12:41 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:41 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 12:40 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 12:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:27 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2003.codfw.wmnet
  • 12:16 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for testvm2002.codfw.wmnet: Renew puppet certificate - jmm@cumin2002
  • 12:15 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for testvm2002.codfw.wmnet: Renew puppet certificate - jmm@cumin2002
  • 11:29 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move CirrusSearch settings from IS.php to ext-CirrusSearch.php, part III (T308932) (duration: 06m 43s)
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:22 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@e1ca693] (codfw): Allow stylesheets through CSP (duration: 01m 45s)
  • 11:21 ladsgroup@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Move CirrusSearch settings from IS.php to ext-CirrusSearch.php, part II (T308932) (duration: 07m 04s)
  • 11:21 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:20 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@e1ca693] (codfw): Allow stylesheets through CSP
  • 11:17 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
  • 11:17 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@e1ca693] (eqiad): Allow stylesheets through CSP (duration: 00m 51s)
  • 11:16 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@e1ca693] (eqiad): Allow stylesheets through CSP
  • 11:14 ladsgroup@deploy1002: Synchronized wmf-config/ext-CirrusSearch.php: Move CirrusSearch settings from IS.php to ext-CirrusSearch.php, part I (T308932) (duration: 07m 04s)
  • 11:01 stevemunene@deploy1002: Finished deploy [analytics/refinery@a8840b0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@a8840b0] (duration: 01m 18s)
  • 11:00 stevemunene@deploy1002: Started deploy [analytics/refinery@a8840b0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@a8840b0]
  • 10:59 stevemunene@deploy1002: Finished deploy [analytics/refinery@a8840b0] (thin): Regular analytics weekly train THIN [analytics/refinery@a8840b0] (duration: 00m 05s)
  • 10:59 stevemunene@deploy1002: Started deploy [analytics/refinery@a8840b0] (thin): Regular analytics weekly train THIN [analytics/refinery@a8840b0]
  • 10:58 stevemunene@deploy1002: Finished deploy [analytics/refinery@a8840b0]: Regular analytics weekly train [analytics/refinery@a8840b0] (duration: 04m 29s)
  • 10:54 stevemunene@deploy1002: Started deploy [analytics/refinery@a8840b0]: Regular analytics weekly train [analytics/refinery@a8840b0]
  • 10:52 steve_munene: Deploying refinery for ops week
  • 10:42 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:42 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:42 zabe: start running migrateRevisionCommentTemp in remaining sections (for now except s3) in screens # T275246
  • 10:42 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 10:42 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 10:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host krb2002.codfw.wmnet with OS bullseye
  • 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb2002.codfw.wmnet with reason: host reimage
  • 10:05 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on krb2002.codfw.wmnet with reason: host reimage
  • 10:01 godog: upgrade grafana to 8.5.20 on cloudmetrics* - T328405
  • 09:57 godog: upgrade grafana to 8.5.20 on grafana1002 - T328405
  • 09:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host krb2002.codfw.wmnet with OS bullseye
  • 09:47 godog: upgrade grafana to 8.5.20 on grafana2001 - T328405
  • 09:15 urbanecm: Clean sign up throttle for IP 195.113.145.2 (via resetAuthenticationThrottle.php; T328521)
  • 09:14 urbanecm@deploy1002: Finished scap: Backport for Add new throttle rule (T328521) (duration: 07m 24s)
  • 09:07 urbanecm@deploy1002: Started scap: Backport for Add new throttle rule (T328521)
  • 09:06 urbanecm@deploy1002: backport aborted: (duration: 00m 01s)
  • 09:05 ladsgroup@deploy1002: Finished scap: Backport for Create additional namespaces on shn.wikibooks (T327850) (duration: 15m 06s)
  • 08:54 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
  • 08:54 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 08:52 ladsgroup@deploy1002: superpes and ladsgroup: Backport for Create additional namespaces on shn.wikibooks (T327850) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 08:50 ladsgroup@deploy1002: Started scap: Backport for Create additional namespaces on shn.wikibooks (T327850)
  • 08:49 ladsgroup@deploy1002: Finished scap: Backport for Add a wordmark to trwiktionary (T328499) (duration: 08m 05s)
  • 08:45 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=k8s-ingress-staging
  • 08:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=k8s-ingress-staging
  • 08:42 ladsgroup@deploy1002: superpes and ladsgroup: Backport for Add a wordmark to trwiktionary (T328499) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:41 ladsgroup@deploy1002: Started scap: Backport for Add a wordmark to trwiktionary (T328499)
  • 08:40 ladsgroup@deploy1002: Finished scap: Backport for Add mobile wordmark to cswiktionary (T328357) (duration: 12m 26s)
  • 08:29 ladsgroup@deploy1002: superpes and ladsgroup: Backport for Add mobile wordmark to cswiktionary (T328357) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 08:27 ladsgroup@deploy1002: Started scap: Backport for Add mobile wordmark to cswiktionary (T328357)
  • 08:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:27 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:27 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 08:27 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:27 ladsgroup@deploy1002: Finished scap: Backport for Remove former EventLogging streams for navtiming (T281103 T286703 T308621 T323623) (duration: 09m 42s)
  • 08:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
  • 08:19 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for 6 hosts
  • 08:19 ladsgroup@deploy1002: ladsgroup and krinkle: Backport for Remove former EventLogging streams for navtiming (T281103 T286703 T308621 T323623) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:17 ladsgroup@deploy1002: Started scap: Backport for Remove former EventLogging streams for navtiming (T281103 T286703 T308621 T323623)
  • 08:14 ladsgroup@deploy1002: Finished scap: Backport for Remove unused eventlogging_RUMSpeedIndex stream (T286700) (duration: 10m 15s)
  • 08:06 ladsgroup@deploy1002: phedenskog and ladsgroup: Backport for Remove unused eventlogging_RUMSpeedIndex stream (T286700) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:05 moritzm: installing libarchive security updates
  • 08:04 ladsgroup@deploy1002: Started scap: Backport for Remove unused eventlogging_RUMSpeedIndex stream (T286700)
  • 08:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 55821
  • 07:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 55821
  • 07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T310011)', diff saved to https://phabricator.wikimedia.org/P43524 and previous config saved to /var/cache/conftool/dbconfig/20230201-073348-ladsgroup.json
  • 07:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P43523 and previous config saved to /var/cache/conftool/dbconfig/20230201-071841-ladsgroup.json
  • 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P43522 and previous config saved to /var/cache/conftool/dbconfig/20230201-070335-ladsgroup.json
  • 06:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T310011)', diff saved to https://phabricator.wikimedia.org/P43521 and previous config saved to /var/cache/conftool/dbconfig/20230201-064828-ladsgroup.json
  • 06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T310011)', diff saved to https://phabricator.wikimedia.org/P43520 and previous config saved to /var/cache/conftool/dbconfig/20230201-064311-ladsgroup.json
  • 06:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 06:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 00:38 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3055.esams.wmnet
  • 00:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3055.esams.wmnet with OS bullseye
  • 00:15 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3055.esams.wmnet with reason: host reimage
  • 00:12 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3055.esams.wmnet with reason: host reimage
  • 00:02 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3054.esams.wmnet
  • 00:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3054.esams.wmnet with OS bullseye
