Server Admin Log/Archive 66

From Wikitech

2023-05-31

  • 21:00 zabe@deploy1002: Finished scap: Backport for Start reading from rev_comment_id everywhere (T299954) (duration: 07m 44s)
  • 20:57 Amir1: foreachwikiindblist group1 extensions/AbuseFilter/maintenance/MigrateActorsAF.php (T336224)
  • 20:55 Amir1: foreachwikiindblist group0 extensions/AbuseFilter/maintenance/MigrateActorsAF.php (T336224)
  • 20:54 zabe@deploy1002: zabe: Backport for Start reading from rev_comment_id everywhere (T299954) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:52 zabe@deploy1002: Started scap: Backport for Start reading from rev_comment_id everywhere (T299954)
  • 20:40 urbanecm@deploy1002: Finished scap: Backport for Fix description link icon positioning (T329364) (duration: 12m 51s)
  • 20:30 mforns@deploy1002: Finished deploy [analytics/refinery@04c11e6] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@04c11e6] (duration: 01m 29s)
  • 20:28 urbanecm@deploy1002: arlolra and urbanecm: Backport for Fix description link icon positioning (T329364) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:28 mforns@deploy1002: Started deploy [analytics/refinery@04c11e6] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@04c11e6]
  • 20:28 mforns@deploy1002: Finished deploy [analytics/refinery@04c11e6] (thin): Regular analytics weekly train THIN [analytics/refinery@04c11e6] (duration: 00m 04s)
  • 20:28 mforns@deploy1002: Started deploy [analytics/refinery@04c11e6] (thin): Regular analytics weekly train THIN [analytics/refinery@04c11e6]
  • 20:27 mforns@deploy1002: Finished deploy [analytics/refinery@04c11e6]: Regular analytics weekly train [analytics/refinery@04c11e6] (duration: 05m 53s)
  • 20:27 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns1004']
  • 20:27 urbanecm@deploy1002: Started scap: Backport for Fix description link icon positioning (T329364)
  • 20:26 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1004']
  • 20:26 urbanecm@deploy1002: Finished scap: Backport for Enable EditInSequence for beta-testing on napwikisource (T337472) (duration: 10m 09s)
  • 20:22 mforns@deploy1002: Started deploy [analytics/refinery@04c11e6]: Regular analytics weekly train [analytics/refinery@04c11e6]
  • 20:18 urbanecm@deploy1002: soda and urbanecm: Backport for Enable EditInSequence for beta-testing on napwikisource (T337472) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:17 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns1006']
  • 20:16 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns1005']
  • 20:16 urbanecm@deploy1002: Started scap: Backport for Enable EditInSequence for beta-testing on napwikisource (T337472)
  • 20:15 urbanecm@deploy1002: Finished scap: Backport for Enables ab test for multiple languages (T336969) (duration: 11m 56s)
  • 20:10 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns1004']
  • 20:09 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1004']
  • 20:07 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1006']
  • 20:07 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1005']
  • 20:07 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns1004']
  • 20:05 urbanecm@deploy1002: ksarabia and urbanecm: Backport for Enables ab test for multiple languages (T336969) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:03 urbanecm@deploy1002: Started scap: Backport for Enables ab test for multiple languages (T336969)
  • 19:57 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1004']
  • 19:55 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns1004']
  • 19:55 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1004']
  • 19:55 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns1004']
  • 19:54 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1004']
  • 19:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:20 JustHann_: we started swapping in dumpsdata1006 as primary nfs dumps server (replacing dumpsdata1005) at 16:55 UTC and completed at 19:09 UTC
  • 19:19 robh@cumin1001: START - Cookbook sre.hosts.provision for host dns1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:18 robh@cumin1001: START - Cookbook sre.hosts.provision for host dns1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:17 robh@cumin1001: START - Cookbook sre.hosts.provision for host dns1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:16 robh@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['dns1004']
  • 19:16 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1004']
  • 19:14 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns1006']
  • 19:14 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns1005']
  • 19:14 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns1004']
  • 19:13 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1006']
  • 19:13 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1005']
  • 19:13 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1004']
  • 19:12 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns1006
  • 19:11 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dns1006
  • 19:11 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns1005
  • 19:10 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dns1005
  • 19:09 JustHann_: swapping in dumpsdata1006 as primary nfs dumps server, replacing dumpsdata1005 now completed!
  • 18:59 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns1004
  • 18:58 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dns1004
  • 18:57 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:57 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new dns100[345] - robh@cumin1001"
  • 18:56 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new dns100[345] - robh@cumin1001"
  • 18:53 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:40 dduvall@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.11 refs T337525 (duration: 06m 02s)
  • 18:38 mforns@deploy1002: Finished deploy [airflow-dags/analytics_product@b3eb622]: (no justification provided) (duration: 00m 07s)
  • 18:38 mforns@deploy1002: Started deploy [airflow-dags/analytics_product@b3eb622]: (no justification provided)
  • 18:34 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.11 refs T337525
  • 18:01 ran `wikiadmin2023@10.64.32.139(huwiki)> UPDATE thread SET thread_signature = '<span title="bétaverzió"> <!--<font style="text-decoration: blink;">--><font color="red">♥</font><font color="white">♥</font><font color="green">♥</font> </font> [[User:Gubbubu|<font color="green" face="Lucida calligraphy">Γουββος Θιλο' WHERE thread_id = 1288;` (with `BEGIN`/`COMMIT`) for T337700
  • 17:31 ladsgroup@deploy1002: Backport cancelled.
  • 17:30 ladsgroup@deploy1002: Finished scap: Backport for Remove legacy encoding option from dawiktionary (T128155) (duration: 12m 54s)
  • 17:18 ladsgroup@deploy1002: ladsgroup: Backport for Remove legacy encoding option from dawiktionary (T128155) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 17:17 ladsgroup@deploy1002: Started scap: Backport for Remove legacy encoding option from dawiktionary (T128155)
  • 17:10 brett@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T322937 (duration: 11m 07s)
  • 17:10 brett: Maglev LVS scheduler rollout in codfw finished - T263797
  • 17:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on P{cp[2037,2039,2041].codfw.wmnet} and A:cp
  • 16:59 brett@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T322937
  • 16:37 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@5379d83]: (no justification provided) (duration: 00m 34s)
  • 16:37 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@5379d83]: (no justification provided)
  • 16:22 elukey: `systemctl reset-failed session-c6111.scope session-c7230.scope` on stat1005 to clear old alerts
  • 16:20 vgutierrez@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on P{cp[2037,2039,2041].codfw.wmnet} and A:cp
  • 16:13 vgutierrez: repool cp2035 - T337247 T323557
  • 16:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp2035.codfw.wmnet
  • 16:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp2035.codfw.wmnet
  • 16:10 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:10 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:08 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:08 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:04 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:04 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:51 Emperor: swift delete virtual machines from "swift" WMCS project
  • 15:51 brett@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T322937
  • 15:50 brett@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937 (duration: 02m 24s)
  • 15:48 brett@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937
  • 15:47 Emperor: delete virtual machines from "swift" WMCS project
  • 15:45 vgutierrez: cp2035 depooled as puppet is unable to run due to ipmi issues - T337247
  • 15:42 brett: Maglev LVS scheduler rollout began: IN PROGRESS, not finished - T263797
  • 15:42 brett: Maglev LVS scheduler rollout finished in codfw - T263797
  • 15:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: ipmi/mgmt console issues
  • 15:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: ipmi/mgmt console issues
  • 15:39 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=1) rolling custom on A:cp-text_codfw
  • 14:55 klausman@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:55 klausman@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:54 klausman@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 14:54 klausman@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 14:50 klausman@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 14:50 klausman@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 14:44 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:43 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:41 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 14:27 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:27 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:25 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:25 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:25 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:14 urbanecm@deploy1002: Finished scap: Backport for NewImpact: Cache empty user impact on account creation (T337320), Personalized praise: Fix first-ever notifications (T322452) (duration: 07m 26s)
  • 14:08 urbanecm@deploy1002: urbanecm: Backport for NewImpact: Cache empty user impact on account creation (T337320), Personalized praise: Fix first-ever notifications (T322452) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:07 urbanecm@deploy1002: Started scap: Backport for NewImpact: Cache empty user impact on account creation (T337320), Personalized praise: Fix first-ever notifications (T322452)
  • 14:02 urbanecm@deploy1002: Finished scap: Backport for Personalized praise: Fix first-ever notifications (T322452), DeleteAction: Replace remaining OOUI fields (T337809) (duration: 11m 11s)
  • 14:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T329049)
  • 13:58 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:57 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:56 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:56 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:55 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:55 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:55 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:55 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:54 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:53 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:53 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:53 urbanecm@deploy1002: daimona and urbanecm: Backport for Personalized praise: Fix first-ever notifications (T322452), DeleteAction: Replace remaining OOUI fields (T337809) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 13:52 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:51 urbanecm@deploy1002: Started scap: Backport for Personalized praise: Fix first-ever notifications (T322452), DeleteAction: Replace remaining OOUI fields (T337809)
  • 13:46 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T329049)
  • 13:44 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:44 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:42 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:41 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:41 urbanecm@deploy1002: Finished scap: Backport for Enable wgMinervaEnableSiteNotice for bnwikiquote (T337683) (duration: 10m 01s)
  • 13:39 ottomata: destroy mw-page-content-change-enrich deployment in dse-k8s-eqiad in order to deploy in wikikube - T330507
  • 13:35 godog: rm cadvisor.service symlink/alias and restart kubelet on affected hosts - T337836
  • 13:33 urbanecm@deploy1002: mdsshakil and urbanecm: Backport for Enable wgMinervaEnableSiteNotice for bnwikiquote (T337683) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:31 urbanecm@deploy1002: Started scap: Backport for Enable wgMinervaEnableSiteNotice for bnwikiquote (T337683)
  • 13:29 vgutierrez@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-text_codfw
  • 13:28 urbanecm@deploy1002: Finished scap: Backport for [Growth] Enable new Impact for 10 additional wikis (T336203) (duration: 08m 13s)
  • 13:21 urbanecm@deploy1002: urbanecm: Backport for [Growth] Enable new Impact for 10 additional wikis (T336203) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 13:20 urbanecm@deploy1002: Started scap: Backport for [Growth] Enable new Impact for 10 additional wikis (T336203)
  • 13:19 urbanecm@deploy1002: Finished scap: Backport for Revert "Revert "Switch VisualEditor to not use RESTbase on small and medium wikis"" (duration: 15m 10s)
  • 13:17 volans: uploaded spicerack_7.2.0 to apt.wikimedia.org bullseye-wikimedia
  • 13:15 mforns@deploy1002: Finished deploy [airflow-dags/analytics_product@5a38fbf]: (no justification provided) (duration: 00m 06s)
  • 13:15 mforns@deploy1002: Started deploy [airflow-dags/analytics_product@5a38fbf]: (no justification provided)
  • 13:06 urbanecm@deploy1002: urbanecm and d3r1ck01: Backport for Revert "Revert "Switch VisualEditor to not use RESTbase on small and medium wikis"" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:04 urbanecm@deploy1002: Started scap: Backport for Revert "Revert "Switch VisualEditor to not use RESTbase on small and medium wikis""
  • 12:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on P{cp[2028,2030,2032,2034,2036,2038,2040].codfw.wmnet} and A:cp
  • 12:03 jayme: re-enabling puppet on all kubernetes hosts
  • 12:00 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:59 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:54 jayme: disabled puppet on all kubernetes hosts apart from staging-codfw for https://gerrit.wikimedia.org/r/c/operations/puppet/+/924905
  • 11:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T329049)
  • 11:20 apergos: rebooted dumpsdata1006 manually after several timeouts trying to use the cookbook; in the end, forced to powercycle the host via mgmt console
  • 11:18 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T329049)
  • 11:14 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doc2001.codfw.wmnet
  • 11:14 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:14 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 11:12 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 11:07 eoghan@cumin1001: START - Cookbook sre.dns.netbox
  • 10:34 ladsgroup@deploy1002: Finished scap: Backport for mwscript: Avoid prepending maintenance/ if >= 2 dots in argument (T336819) (duration: 08m 50s)
  • 10:34 eoghan@cumin1001: START - Cookbook sre.hosts.decommission for hosts doc2001.codfw.wmnet
  • 10:27 ladsgroup@deploy1002: ladsgroup: Backport for mwscript: Avoid prepending maintenance/ if >= 2 dots in argument (T336819) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 10:25 ladsgroup@deploy1002: Started scap: Backport for mwscript: Avoid prepending maintenance/ if >= 2 dots in argument (T336819)
  • 10:21 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dumpsdata1006.eqiad.wmnet
  • 10:20 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1006.eqiad.wmnet
  • 10:17 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dumpsdata1006.eqiad.wmnet
  • 10:16 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1006.eqiad.wmnet
  • 10:12 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dumpsdata1006.eqiad.wmnet
  • 10:12 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1006.eqiad.wmnet
  • 10:11 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doc1002.eqiad.wmnet
  • 10:11 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:11 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 10:08 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 10:07 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dumpsdata1006.eqiad.wmnet
  • 10:07 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1006.eqiad.wmnet
  • 10:01 vgutierrez@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on P{cp[2028,2030,2032,2034,2036,2038,2040].codfw.wmnet} and A:cp
  • 10:01 eoghan@cumin1001: START - Cookbook sre.dns.netbox
  • 09:57 eoghan@cumin1001: START - Cookbook sre.hosts.decommission for hosts doc1002.eqiad.wmnet
  • 09:56 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2004-dev.wikimedia.org
  • 09:56 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:56 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
  • 09:49 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48665 and previous config saved to /var/cache/conftool/dbconfig/20230531-093659-root.json
  • 09:36 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 09:35 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 09:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on P{cp2042.codfw.wmnet} and A:cp
  • 09:23 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2004-dev.wikimedia.org
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48663 and previous config saved to /var/cache/conftool/dbconfig/20230531-092154-root.json
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48662 and previous config saved to /var/cache/conftool/dbconfig/20230531-090649-root.json
  • 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: bast2002.wikimedia.org
  • 09:01 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: bast2002.wikimedia.org
  • 09:00 vgutierrez@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on P{cp2042.codfw.wmnet} and A:cp
  • 08:59 fabfur: Testing new cookbook to switch port 80 from Varnish to HAProxy on cp2042
  • 08:58 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1009.eqiad.wmnet
  • 08:58 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1009.eqiad.wmnet
  • 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: labstore1005.eqiad.wmnet
  • 08:55 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: labstore1005.eqiad.wmnet
  • 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: labstore1004.eqiad.wmnet
  • 08:55 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: labstore1004.eqiad.wmnet
  • 08:52 moritzm: manually ran puppet node clean/deactivate for labstore1004/1005 (which ran into a traceback in the decom script) T337269
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48661 and previous config saved to /var/cache/conftool/dbconfig/20230531-085145-root.json
  • 08:41 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1009.eqiad.wmnet with OS bullseye
  • 08:39 kostajh: UTC morning deploys done
  • 08:37 kharlan@deploy1002: Finished scap: Backport for NewImpact: Cache empty user impact on account creation (T337320) (duration: 13m 48s)
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48660 and previous config saved to /var/cache/conftool/dbconfig/20230531-083640-root.json
  • 08:25 kharlan@deploy1002: kharlan: Backport for NewImpact: Cache empty user impact on account creation (T337320) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:24 kharlan@deploy1002: Started scap: Backport for NewImpact: Cache empty user impact on account creation (T337320)
  • 08:22 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1009.eqiad.wmnet with reason: host reimage
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48659 and previous config saved to /var/cache/conftool/dbconfig/20230531-082135-root.json
  • 08:20 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1009.eqiad.wmnet with reason: host reimage
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48658 and previous config saved to /var/cache/conftool/dbconfig/20230531-080631-root.json
  • 08:04 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1009.eqiad.wmnet with OS bullseye
  • 07:58 legoktm@deploy1002: Finished scap: Backport for Remove GWToolset configuration (2/2) (T270911) (duration: 38m 58s)
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48657 and previous config saved to /var/cache/conftool/dbconfig/20230531-075126-root.json
  • 07:41 legoktm@deploy1002: legoktm: Backport for Remove GWToolset configuration (2/2) (T270911) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:35 godog: apache2 restarted on logstash1032 before I could get a backtrace to debug logstash lag
  • 07:19 legoktm@deploy1002: Started scap: Backport for Remove GWToolset configuration (2/2) (T270911)
  • 07:17 legoktm@deploy1002: Finished scap: Backport for Remove GWToolset configuration (1/2) (T270911) (duration: 09m 51s)
  • 07:09 legoktm@deploy1002: legoktm: Backport for Remove GWToolset configuration (1/2) (T270911) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 07:07 legoktm@deploy1002: Started scap: Backport for Remove GWToolset configuration (1/2) (T270911)
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48656 and previous config saved to /var/cache/conftool/dbconfig/20230531-070730-root.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48655 and previous config saved to /var/cache/conftool/dbconfig/20230531-065225-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48654 and previous config saved to /var/cache/conftool/dbconfig/20230531-064327-root.json
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48653 and previous config saved to /var/cache/conftool/dbconfig/20230531-063721-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48652 and previous config saved to /var/cache/conftool/dbconfig/20230531-062823-root.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48651 and previous config saved to /var/cache/conftool/dbconfig/20230531-062216-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48650 and previous config saved to /var/cache/conftool/dbconfig/20230531-061318-root.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48649 and previous config saved to /var/cache/conftool/dbconfig/20230531-060710-root.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48648 and previous config saved to /var/cache/conftool/dbconfig/20230531-055813-root.json
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48647 and previous config saved to /var/cache/conftool/dbconfig/20230531-055205-root.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48646 and previous config saved to /var/cache/conftool/dbconfig/20230531-054308-root.json
  • 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48645 and previous config saved to /var/cache/conftool/dbconfig/20230531-053700-root.json
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48644 and previous config saved to /var/cache/conftool/dbconfig/20230531-052804-root.json
  • 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48643 and previous config saved to /var/cache/conftool/dbconfig/20230531-052156-root.json
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48642 and previous config saved to /var/cache/conftool/dbconfig/20230531-051259-root.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1221 (sanitarium s4 master) T337446', diff saved to https://phabricator.wikimedia.org/P48640 and previous config saved to /var/cache/conftool/dbconfig/20230531-045927-root.json
  • 04:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48639 and previous config saved to /var/cache/conftool/dbconfig/20230531-045754-root.json
  • 02:19 eileen: civicrm upgraded from 5905a403 to 885208ca

2023-05-30

  • 23:38 zabe@deploy1002: Finished scap: Backport for Start reading from rev_comment_id in group1 wikis (T299954) (duration: 08m 00s)
  • 23:31 zabe@deploy1002: zabe: Backport for Start reading from rev_comment_id in group1 wikis (T299954) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 23:30 zabe@deploy1002: Started scap: Backport for Start reading from rev_comment_id in group1 wikis (T299954)
  • 22:22 ejegg: civicrm upgraded from 415aa7e5 to 5905a403
  • 21:56 samtar@deploy1002: Finished scap: Backport for linker: Check for null parser in Linker::makeThumbLink2 (T337794) (duration: 07m 48s)
  • 21:50 samtar@deploy1002: jforrester and samtar: Backport for linker: Check for null parser in Linker::makeThumbLink2 (T337794) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 21:48 samtar@deploy1002: Started scap: Backport for linker: Check for null parser in Linker::makeThumbLink2 (T337794)
  • 20:58 ladsgroup@deploy1002: ladsgroup: Backport for Add WANCache to ParserOutputPageProperties::finalize (T336698) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:57 ladsgroup@deploy1002: Started scap: Backport for Add WANCache to ParserOutputPageProperties::finalize (T336698)
  • 20:40 ladsgroup@deploy1002: Finished scap: Backport for Add WANCache to ParserOutputPageProperties::finalize (T336698) (duration: 09m 27s)
  • 20:32 ladsgroup@deploy1002: ladsgroup: Backport for Add WANCache to ParserOutputPageProperties::finalize (T336698) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:30 ladsgroup@deploy1002: Started scap: Backport for Add WANCache to ParserOutputPageProperties::finalize (T336698)
  • 20:12 inflatador: bking@wdqs2009 depool wdqs2009 until it catches up with lag
  • 20:10 samtar@deploy1002: Finished scap: Backport for Turn on A/B Test Hebrew (T336969) (duration: 08m 46s)
  • 20:03 samtar@deploy1002: ksarabia and samtar: Backport for Turn on A/B Test Hebrew (T336969) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:01 samtar@deploy1002: Started scap: Backport for Turn on A/B Test Hebrew (T336969)
  • 19:48 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@cd667c2]: Deploy Iceberg version of referrer_daily on analytics Airflow instance. T335305. (duration: 00m 09s)
  • 19:48 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@cd667c2]: Deploy Iceberg version of referrer_daily on analytics Airflow instance. T335305.
  • 19:36 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 04m 02s)
  • 19:32 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 19:29 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 54s)
  • 19:29 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 19:29 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 16m 36s)
  • 19:24 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*
  • 19:12 inflatador: [WDQS Deploy] Deploying version 0.3.124
  • 19:11 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 18:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.11 refs T337525
  • 17:45 mutante: re-enabling puppet on contint2001
  • 16:20 rzl: rzl@mwmaint1002:~$ sudo systemctl start mediawiki_job_growthexperiments-userImpactUpdateRecentlyEdited
  • 16:19 rzl: rzl@mwmaint1002:~$ sudo systemctl start mediawiki_job_growthexperiments-userImpactUpdateRecentlyRegistered
  • 16:14 urbanecm@deploy1002: Finished scap: Backport for [Growth] Enable user impact refresh on 10 more wikis (T336203) (duration: 07m 08s)
  • 16:07 urbanecm@deploy1002: Started scap: Backport for [Growth] Enable user impact refresh on 10 more wikis (T336203)
  • 16:00 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:00 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:58 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:58 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:57 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:56 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:56 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:55 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:54 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:54 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:54 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:53 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwlog2002.codfw.wmnet with OS bullseye
  • 15:51 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:51 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:49 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:49 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:15 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2005-dev.codfw.wmnet with OS bullseye
  • 15:15 aborrero@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
  • 15:14 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
  • 15:10 tgr_: UTC evening deploys done
  • 15:08 tgr@deploy1002: Finished scap: Backport for ve.ui.MWGalleryDialog: Fix showing the search panel (T337638), Hide 'editnotice-notext' message in VE (and mobile apps) (T337633), ve.ui.MWGalleryDialog: Fix showing the search panel (T337638) (duration: 08m 08s)
  • 15:05 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:03 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:02 tgr@deploy1002: tgr and matmarex: Backport for ve.ui.MWGalleryDialog: Fix showing the search panel (T337638), Hide 'editnotice-notext' message in VE (and mobile apps) (T337633), ve.ui.MWGalleryDialog: Fix showing the search panel (T337638) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 15:00 tgr@deploy1002: Started scap: Backport for ve.ui.MWGalleryDialog: Fix showing the search panel (T337638), Hide 'editnotice-notext' message in VE (and mobile apps) (T337633), ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)
  • 14:50 moritzm: installing texlive-bin security updates
  • 14:49 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwlog2002.codfw.wmnet with reason: host reimage
  • 14:46 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwlog2002.codfw.wmnet with reason: host reimage
  • 14:36 tgr@deploy1002: Finished scap: Backport for Hide 'editnotice-notext' message in VE (and mobile apps) (T337633) (duration: 08m 01s)
  • 14:29 tgr@deploy1002: matmarex and tgr: Backport for Hide 'editnotice-notext' message in VE (and mobile apps) (T337633) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 14:28 herron@cumin1001: START - Cookbook sre.hosts.reimage for host mwlog2002.codfw.wmnet with OS bullseye
  • 14:27 tgr@deploy1002: Started scap: Backport for Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)
  • 14:16 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mwlog2002.codfw.wmnet with OS bullseye
  • 14:16 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetdb1003.eqiad.wmnet with OS bookworm
  • 14:14 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:13 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:08 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:06 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:06 moritzm: installing libwebp security updates
  • 14:06 tgr@deploy1002: Finished scap: Backport for editpage: Change the order of hooks slightly for FlaggedRevs (T337637) (duration: 08m 14s)
  • 13:59 tgr@deploy1002: tgr and matmarex: Backport for editpage: Change the order of hooks slightly for FlaggedRevs (T337637) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:58 tgr@deploy1002: Started scap: Backport for editpage: Change the order of hooks slightly for FlaggedRevs (T337637)
  • 13:57 tgr@deploy1002: Finished scap: Backport for prod: Remove $wgCampaignEventsEnableMultipleOrganizers (T334088) (duration: 16m 13s)
  • 13:56 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2009.codfw.wmnet
  • 13:55 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2009.codfw.wmnet
  • 13:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2009.codfw.wmnet with OS bullseye
  • 13:42 tgr@deploy1002: tgr and daimona: Backport for prod: Remove $wgCampaignEventsEnableMultipleOrganizers (T334088) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:40 tgr@deploy1002: Started scap: Backport for prod: Remove $wgCampaignEventsEnableMultipleOrganizers (T334088)
  • 13:33 mlitn@deploy1002: Finished scap: Backport for Fix maxJobs default, Fix maxJobs default (duration: 07m 39s)
  • 13:27 mlitn@deploy1002: mlitn: Backport for Fix maxJobs default, Fix maxJobs default synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:25 mlitn@deploy1002: Started scap: Backport for Fix maxJobs default, Fix maxJobs default
  • 13:20 tgr@deploy1002: Finished scap: Backport for GrowthExperiments: Re-add $wgGERestbaseUrl (duration: 09m 26s)
  • 13:13 tgr@deploy1002: tgr: Backport for GrowthExperiments: Re-add $wgGERestbaseUrl synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:11 herron@cumin1001: START - Cookbook sre.hosts.reimage for host mwlog2002.codfw.wmnet with OS bullseye
  • 13:11 tgr@deploy1002: Started scap: Backport for GrowthExperiments: Re-add $wgGERestbaseUrl
  • 13:09 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 13:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2009.codfw.wmnet with reason: host reimage
  • 13:09 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 13:09 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 13:08 bblack: lvs1018: restart pybal for wikireplicas monitoring removal
  • 13:08 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 13:06 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 13:06 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2009.codfw.wmnet with reason: host reimage
  • 13:06 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 13:04 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 13:03 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
  • 13:00 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
  • 12:51 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:51 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:48 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2009.codfw.wmnet with OS bullseye
  • 12:39 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:39 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:29 volans: disabling puppet where cadvisor is present
  • 12:14 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS bullseye
  • 11:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2006.codfw.wmnet
  • 11:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:51 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:51 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for moved cloudcontrol2005-dev - cmooney@cumin1001"
  • 11:50 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for moved cloudcontrol2005-dev - cmooney@cumin1001"
  • 11:50 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 11:47 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 11:46 slyngshede@cumin1001: START - Cookbook sre.hosts.decommission for hosts testvm2006.codfw.wmnet
  • 11:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on puppetboard2003.codfw.wmnet,puppetboard1003.eqiad.wmnet with reason: building_systems
  • 11:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on puppetboard2003.codfw.wmnet,puppetboard1003.eqiad.wmnet with reason: building_systems
  • 11:41 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:41 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:14 hashar@deploy1002: Finished deploy [gerrit/gerrit@6deabc9]: wm-checks-api: add support for DUCT - T331651 (duration: 00m 08s)
  • 11:14 hashar@deploy1002: Started deploy [gerrit/gerrit@6deabc9]: wm-checks-api: add support for DUCT - T331651
  • 11:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2006.codfw.wmnet
  • 11:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2006.codfw.wmnet with OS bookworm
  • 11:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 10:57 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 10:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2006.codfw.wmnet with reason: host reimage
  • 10:53 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 10:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 10:50 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2006.codfw.wmnet with reason: host reimage
  • 10:41 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 10:41 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 10:11 jbond@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetboard2003.codfw.wmnet with OS bookworm
  • 10:11 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetboard1003.eqiad.wmnet with OS bookworm
  • 10:00 zabe@deploy1002: Finished scap: Backport for Start reading from rev_comment_id in group0 wikis (T299954) (duration: 08m 12s)
  • 09:59 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host testvm2006.codfw.wmnet with OS bookworm
  • 09:58 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage
  • 09:57 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage
  • 09:54 zabe@deploy1002: zabe: Backport for Start reading from rev_comment_id in group0 wikis (T299954) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 09:52 zabe@deploy1002: Started scap: Backport for Start reading from rev_comment_id in group0 wikis (T299954)
  • 09:52 zabe@deploy1002: Finished scap: Backport for Check for null when using ::getCheckUserHelperFieldset (T337599) (duration: 09m 52s)
  • 09:49 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetboard2003.codfw.wmnet with reason: host reimage
  • 09:46 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetboard2003.codfw.wmnet with reason: host reimage
  • 09:43 zabe@deploy1002: zabe: Backport for Check for null when using ::getCheckUserHelperFieldset (T337599) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 09:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 09:43 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 09:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:42 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host puppetdb1003.eqiad.wmnet with OS bookworm
  • 09:42 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:42 zabe@deploy1002: Started scap: Backport for Check for null when using ::getCheckUserHelperFieldset (T337599)
  • 09:40 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 09:40 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 09:37 zabe@deploy1002: Finished scap: Backport for Start reading from rev_comment_id in test wikis (T299954) (duration: 07m 48s)
  • 09:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetboard1003.eqiad.wmnet with reason: host reimage
  • 09:33 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
  • 09:33 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 09:33 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 09:33 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:33 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:32 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetboard1003.eqiad.wmnet with reason: host reimage
  • 09:30 zabe@deploy1002: zabe: Backport for Start reading from rev_comment_id in test wikis (T299954) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 09:30 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 09:30 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetdb2003.codfw.wmnet with OS bookworm
  • 09:29 zabe@deploy1002: Started scap: Backport for Start reading from rev_comment_id in test wikis (T299954)
  • 09:27 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:24 tgr@deploy1002: Finished scap: Backport for Improve handling of missing image recommendation (duration: 08m 57s)
  • 09:22 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetboard2003.codfw.wmnet with OS bookworm
  • 09:20 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetboard1003.eqiad.wmnet with OS bookworm
  • 09:19 arturo: run aborrero@cumin1001:~ 2s 98 $ sudo cumin "P{R:Profile::Mariadb::Section = 's7'} and P{P:wmcs::db::wikireplicas::mariadb_multiinstance}" "/usr/local/sbin/maintain-meta_p --all-databases --bootstrap"
  • 09:17 tgr@deploy1002: tgr: Backport for Improve handling of missing image recommendation synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 09:15 tgr@deploy1002: Started scap: Backport for Improve handling of missing image recommendation
  • 09:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 09:14 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 09:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:13 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:11 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 09:11 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 09:06 tgr@deploy1002: Finished scap: Backport for Section images: Do not treat unexpected kinds as production errors (duration: 14m 22s)
  • 09:00 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
  • 09:00 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 09:00 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 09:00 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:00 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:59 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:54 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 08:53 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 08:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:53 tgr@deploy1002: tgr: Backport for Section images: Do not treat unexpected kinds as production errors synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 08:52 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:51 tgr@deploy1002: Started scap: Backport for Section images: Do not treat unexpected kinds as production errors
  • 08:50 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:50 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 08:49 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
  • 08:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 08:49 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 08:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:48 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:44 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:44 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 08:44 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 08:44 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:44 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:43 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:41 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:41 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 08:39 tgr@deploy1002: Finished scap: Backport for Improve logging of invalid image recommendation kinds (duration: 10m 30s)
  • 08:39 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
  • 08:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 08:39 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 08:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:38 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:36 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:36 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:35 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:35 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 08:34 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 08:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:33 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:31 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:31 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 08:30 tgr@deploy1002: tgr: Backport for Improve logging of invalid image recommendation kinds synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:29 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
  • 08:29 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 08:29 tgr@deploy1002: Started scap: Backport for Improve logging of invalid image recommendation kinds
  • 08:29 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 08:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:27 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:27 jayme: re-enable puppet on P:kubernetes::node for https://gerrit.wikimedia.org/r/c/operations/puppet/+/909687
  • 08:25 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:20 jayme: disable puppet on P:kubernetes::node (apart from staging-codfw) for https://gerrit.wikimedia.org/r/c/operations/puppet/+/909687
  • 08:15 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 08:15 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 08:15 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:15 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:14 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:12 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:12 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetdb2003.codfw.wmnet with reason: host reimage
  • 08:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetdb2003.codfw.wmnet with reason: host reimage
  • 08:08 tgr@deploy1002: Finished scap: Backport for Section images: Accept more recommendation types (duration: 07m 51s)
  • 08:01 tgr@deploy1002: tgr: Backport for Section images: Accept more recommendation types synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:00 tgr@deploy1002: Started scap: Backport for Section images: Accept more recommendation types
  • 07:56 ladsgroup@deploy1002: Finished scap: Backport for Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially (T337634) (duration: 09m 17s)
  • 07:49 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host puppetdb2003.codfw.wmnet with OS bookworm
  • 07:48 ladsgroup@deploy1002: func and ladsgroup: Backport for Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially (T337634) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:46 ladsgroup@deploy1002: Started scap: Backport for Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially (T337634)
  • 07:45 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
  • 07:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 07:45 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 07:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48633 and previous config saved to /var/cache/conftool/dbconfig/20230530-074445-root.json
  • 07:44 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:42 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 07:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 07:41 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 07:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:40 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:38 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 07:38 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 07:31 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
  • 07:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 07:31 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 07:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:30 moritzm: move LDAP permissions for hghani from cn=nda to cn=wmf T322145
  • 07:30 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48632 and previous config saved to /var/cache/conftool/dbconfig/20230530-072941-root.json
  • 07:29 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation for 9 Wikipedia (T337290) (duration: 09m 38s)
  • 07:28 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 07:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:27 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:21 kartik@deploy1002: kartik: Backport for testwiki: Enable Section Translation for 9 Wikipedia (T337290) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:19 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation for 9 Wikipedia (T337290)
  • 07:17 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 07:17 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 07:17 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:17 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:16 kartik@deploy1002: Finished scap: Backport for Undeploy Special:Contribute from unsupported skins (T337366) (duration: 11m 49s)
  • 07:16 moritzm: update bookworm installer to rc4 T330495
  • 07:16 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48630 and previous config saved to /var/cache/conftool/dbconfig/20230530-071436-root.json
  • 07:10 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 07:10 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 07:10 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
  • 07:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 07:10 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 07:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:09 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:07 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 07:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 07:07 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 07:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:06 kartik@deploy1002: kartik: Backport for Undeploy Special:Contribute from unsupported skins (T337366) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 07:06 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:04 kartik@deploy1002: Started scap: Backport for Undeploy Special:Contribute from unsupported skins (T337366)
  • 07:04 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 07:03 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 07:02 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
  • 07:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 07:02 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 07:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:01 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48629 and previous config saved to /var/cache/conftool/dbconfig/20230530-065932-root.json
  • 06:58 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 06:58 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 06:57 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 06:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 06:51 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 06:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 06:50 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 06:48 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 06:48 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48628 and previous config saved to /var/cache/conftool/dbconfig/20230530-064427-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48625 and previous config saved to /var/cache/conftool/dbconfig/20230530-062922-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48624 and previous config saved to /var/cache/conftool/dbconfig/20230530-061417-root.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48623 and previous config saved to /var/cache/conftool/dbconfig/20230530-055913-root.json
  • 05:43 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 62597
  • 05:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 62597
  • 05:41 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nray out of all services on: 1255 hosts
  • 05:40 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Nray out of all services on: 1255 hosts
  • 05:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nray out of all services on: 784 hosts
  • 05:40 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Nray out of all services on: 784 hosts
  • 05:28 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hxi-ctr out of all services on: 784 hosts
  • 05:27 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Hxi-ctr out of all services on: 784 hosts
  • 05:26 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hxi-ctr out of all services on: 1255 hosts
  • 05:25 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Hxi-ctr out of all services on: 1255 hosts
  • 05:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 62597
  • 05:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 62597
  • 04:28 kart_: Updated cxserver to 2023-05-29-112644-production (T337657)
  • 04:28 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 04:27 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 04:24 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 04:24 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 04:21 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 04:20 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.9 (duration: 02m 10s)
  • 03:52 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.11 refs T337525 (duration: 49m 54s)
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.11 refs T337525

2023-05-29

  • 15:19 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: This is being worked on
  • 15:19 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: This is being worked on
  • 14:18 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
  • 14:18 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
  • 13:57 vgutierrez@puppetmaster1001: conftool action : set/weight=10; selector: name=dbproxy.*,dc=eqiad
  • 11:25 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:24 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48618 and previous config saved to /var/cache/conftool/dbconfig/20230529-112242-root.json
  • 11:13 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:13 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48617 and previous config saved to /var/cache/conftool/dbconfig/20230529-110737-root.json
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48616 and previous config saved to /var/cache/conftool/dbconfig/20230529-105233-root.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48615 and previous config saved to /var/cache/conftool/dbconfig/20230529-103728-root.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48614 and previous config saved to /var/cache/conftool/dbconfig/20230529-102223-root.json
  • 10:07 vgutierrez: restarting pybal on lvs1018
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48612 and previous config saved to /var/cache/conftool/dbconfig/20230529-100719-root.json
  • 10:05 oblivian@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:05 oblivian@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:05 oblivian@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 10:05 oblivian@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 10:04 oblivian@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:04 oblivian@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:03 oblivian@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 10:03 oblivian@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 10:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 10:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 10:00 vgutierrez: restarting pybal on lvs1020
  • 09:59 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 09:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 09:56 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 09:55 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48611 and previous config saved to /var/cache/conftool/dbconfig/20230529-095214-root.json
  • 09:52 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 09:51 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 09:50 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 09:49 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 09:45 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 09:45 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48610 and previous config saved to /var/cache/conftool/dbconfig/20230529-093709-root.json
  • 09:31 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 09:31 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 09:30 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:29 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:13 godog: start partial rollout of cadvisor to eqiad/codfw (~10%) T108027
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48609 and previous config saved to /var/cache/conftool/dbconfig/20230529-090216-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48608 and previous config saved to /var/cache/conftool/dbconfig/20230529-084711-root.json
  • 08:45 godog: delete old raw blocks from thanos - T337236
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48607 and previous config saved to /var/cache/conftool/dbconfig/20230529-083206-root.json
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48606 and previous config saved to /var/cache/conftool/dbconfig/20230529-081702-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48604 and previous config saved to /var/cache/conftool/dbconfig/20230529-080157-root.json
  • 07:57 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:56 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48603 and previous config saved to /var/cache/conftool/dbconfig/20230529-074653-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48602 and previous config saved to /var/cache/conftool/dbconfig/20230529-073148-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48601 and previous config saved to /var/cache/conftool/dbconfig/20230529-071643-root.json
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool sanitarium masters for s1, s2, s3, s5 T337446', diff saved to https://phabricator.wikimedia.org/P48598 and previous config saved to /var/cache/conftool/dbconfig/20230529-051043-root.json

2023-05-28

  • 13:19 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 13:17 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 13:16 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 13:16 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 06:12 marostegui: Change innodb_fast_shutdown to 0 on db1154 before downgrading T337446

2023-05-27

  • 21:40 Amir1: insert into templatelinks (tl_from, tl_from_namespace, tl_target_id) values (686, 0, 199); on db1154:3113 (T337446)
  • 17:42 godog: silence systemd state alert flapping on stat1009 until Monday
  • 00:03 tzatziki: removing 1 file for legal compliance

2023-05-26

  • 23:48 tzatziki: removing 2 files for legal compliance
  • 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:47 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:47 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:24 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:24 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:15 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:15 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:26 demon@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.10 refs T330216
  • 17:38 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.10 refs T330216 (duration: 06m 10s)
  • 17:31 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.10 refs T330216
  • 16:37 jbond@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetboard2003.codfw.wmnet with OS bookworm
  • 16:36 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetboard1003.eqiad.wmnet with OS bookworm
  • 15:54 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:54 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
  • 15:52 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
  • 15:50 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 15:41 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetboard2003.codfw.wmnet with OS bookworm
  • 15:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:40 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetboard1003.eqiad.wmnet with OS bookworm
  • 15:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:34 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 15:34 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 15:31 nskaggs@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
  • 15:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:08 nskaggs@cumin1001: START - Cookbook sre.wikireplicas.update-views
  • 14:26 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: cluster=videoscaler,dc=eqiad,name=parse.*
  • 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=parse.*
  • 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name="parse.*"
  • 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name="parse.*"
  • 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard1003.eqiad.wmnet
  • 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
  • 14:06 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
  • 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard1003.eqiad.wmnet on all recursors
  • 14:06 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard1003.eqiad.wmnet on all recursors
  • 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
  • 14:05 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
  • 14:03 jbond@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard2003.codfw.wmnet
  • 14:03 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
  • 14:03 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
  • 14:02 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 14:02 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host puppetboard1003.eqiad.wmnet
  • 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard2003.codfw.wmnet on all recursors
  • 14:02 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard2003.codfw.wmnet on all recursors
  • 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
  • 14:01 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
  • 13:58 jbond@cumin2002: START - Cookbook sre.dns.netbox
  • 13:58 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetboard2003.codfw.wmnet
  • 13:58 jbond@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb2003.codfw.wmnet
  • 13:58 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:56 jbond@cumin2002: START - Cookbook sre.dns.netbox
  • 13:56 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb2003.codfw.wmnet
  • 13:56 jbond@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb1003.eqiad.wmnet
  • 13:56 jbond@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:55 jbond@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb2003.codfw.wmnet
  • 13:55 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:52 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:51 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 13:46 jbond@cumin2002: START - Cookbook sre.dns.netbox
  • 13:46 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb2003.codfw.wmnet
  • 13:45 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 13:45 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host puppetdb1003.eqiad.wmnet
  • 13:13 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:13 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add the new pybal IPs at edge-only sites - bblack@cumin1001"
  • 13:12 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add the new pybal IPs at edge-only sites - bblack@cumin1001"
  • 13:06 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 12:47 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 12:43 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:43 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add rest of eqiad+codfw pybal IPs - bblack@cumin1001"
  • 12:41 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add rest of eqiad+codfw pybal IPs - bblack@cumin1001"
  • 12:39 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 12:21 hashar@deploy1002: Finished deploy [gerrit/gerrit@0932557]: wm-patch-demo: do not return runs when there are no wikis | T332474 (duration: 00m 08s)
  • 12:21 hashar@deploy1002: Started deploy [gerrit/gerrit@0932557]: wm-patch-demo: do not return runs when there are no wikis | T332474
  • 11:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 11:35 hashar@deploy1002: Finished deploy [gerrit/gerrit@c490ae6]: wm-patch-demo: link to other patches, use WARNING to prevent chipset collapsing | T332474 (duration: 00m 08s)
  • 11:35 hashar@deploy1002: Started deploy [gerrit/gerrit@c490ae6]: wm-patch-demo: link to other patches, use WARNING to prevent chipset collapsing | T332474
  • 10:54 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 10:54 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 10:38 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 10:27 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 09:54 effie: pool parse1013-parse1016 to the jobrunner cluster - T329366
  • 09:29 jbond: disable puppet fleet wide to deploy minor puppet change https://gerrit.wikimedia.org/r/c/operations/puppet/+/923353
  • 09:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1016.eqiad.wmnet with OS buster
  • 09:26 effie: parse1013-parse1016 have been depooled and removed from the parsoid-php service - T329366
  • 09:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1014.eqiad.wmnet with OS buster
  • 09:24 jnuche@deploy1002: Installation of scap version "4.52.3" completed for 596 hosts
  • 09:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1013.eqiad.wmnet with OS buster
  • 09:23 jnuche@deploy1002: Installing scap version "4.52.3" for 596 hosts
  • 09:13 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 09:13 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 09:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parse1015.eqiad.wmnet with OS buster
  • 08:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1016.eqiad.wmnet with reason: host reimage
  • 08:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1014.eqiad.wmnet with reason: host reimage
  • 08:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1013.eqiad.wmnet with reason: host reimage
  • 08:54 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on parse1015.eqiad.wmnet with reason: host reimage
  • 08:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1016.eqiad.wmnet with reason: host reimage
  • 08:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1015.eqiad.wmnet with reason: host reimage
  • 08:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1014.eqiad.wmnet with reason: host reimage
  • 08:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1013.eqiad.wmnet with reason: host reimage
  • 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1016.eqiad.wmnet with OS buster
  • 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1015.eqiad.wmnet with OS buster
  • 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1014.eqiad.wmnet with OS buster
  • 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1013.eqiad.wmnet with OS buster
  • 08:10 jiji@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=parse101[3-6].eqiad.wmnet
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48591 and previous config saved to /var/cache/conftool/dbconfig/20230526-075903-root.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48590 and previous config saved to /var/cache/conftool/dbconfig/20230526-075809-root.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48589 and previous config saved to /var/cache/conftool/dbconfig/20230526-074358-root.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48588 and previous config saved to /var/cache/conftool/dbconfig/20230526-074304-root.json
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48587 and previous config saved to /var/cache/conftool/dbconfig/20230526-072854-root.json
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48586 and previous config saved to /var/cache/conftool/dbconfig/20230526-072759-root.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48585 and previous config saved to /var/cache/conftool/dbconfig/20230526-071349-root.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48584 and previous config saved to /var/cache/conftool/dbconfig/20230526-071255-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48583 and previous config saved to /var/cache/conftool/dbconfig/20230526-065844-root.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48582 and previous config saved to /var/cache/conftool/dbconfig/20230526-065750-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48581 and previous config saved to /var/cache/conftool/dbconfig/20230526-064340-root.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48580 and previous config saved to /var/cache/conftool/dbconfig/20230526-064245-root.json
  • 06:42 elukey: `apt-get clean` on stat1008 to clean up some space in the root partition
  • 06:36 elukey: `truncate /var/log/kerberos/krb5kdc.log -s 10g` on krb1001 to keep the root partition from filling up
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48579 and previous config saved to /var/cache/conftool/dbconfig/20230526-062835-root.json
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48578 and previous config saved to /var/cache/conftool/dbconfig/20230526-062741-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48577 and previous config saved to /var/cache/conftool/dbconfig/20230526-061330-root.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48576 and previous config saved to /var/cache/conftool/dbconfig/20230526-061236-root.json
  • 03:51 fab@deploy1002: Finished deploy [airflow-dags/research@77cf676]: (no justification provided) (duration: 00m 17s)
  • 03:51 fab@deploy1002: Started deploy [airflow-dags/research@77cf676]: (no justification provided)

2023-05-25

  • 22:14 zabe@deploy1002: Finished scap: Backport for Replace deprecated Hooks::runWithoutAbort (T335536), BannerRenderer: Make sure the language variant is valid (T337427) (duration: 09m 14s)
  • 22:07 zabe@deploy1002: zabe and ladsgroup: Backport for Replace deprecated Hooks::runWithoutAbort (T335536), BannerRenderer: Make sure the language variant is valid (T337427) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:05 zabe@deploy1002: Started scap: Backport for Replace deprecated Hooks::runWithoutAbort (T335536), BannerRenderer: Make sure the language variant is valid (T337427)
  • 21:26 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@77cf676]: (no justification provided) (duration: 00m 08s)
  • 21:25 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@77cf676]: (no justification provided)
  • 20:47 TheresNoTime: close UTC late backport
  • 20:47 samtar@deploy1002: Finished scap: Backport for Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515) (duration: 08m 34s)
  • 20:40 samtar@deploy1002: samtar and matmarex: Backport for Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:38 samtar@deploy1002: Started scap: Backport for Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)
  • 20:32 samtar@deploy1002: Finished scap: Backport for Use document feature classes to extract A/B test state (T335972) (duration: 10m 58s)
  • 20:22 samtar@deploy1002: jdrewniak and samtar: Backport for Use document feature classes to extract A/B test state (T335972) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:21 samtar@deploy1002: Started scap: Backport for Use document feature classes to extract A/B test state (T335972)
  • 20:13 samtar@deploy1002: Finished scap: Backport for [prod] Configure logging for the CampaignEvents channel (T337365) (duration: 08m 31s)
  • 20:06 samtar@deploy1002: samtar and daimona: Backport for [prod] Configure logging for the CampaignEvents channel (T337365) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:05 samtar@deploy1002: Started scap: Backport for [prod] Configure logging for the CampaignEvents channel (T337365)
  • 19:32 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:32 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add pybal-low-traffic.svc.codfw.wmnet - bblack@cumin1001"
  • 19:31 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add pybal-low-traffic.svc.codfw.wmnet - bblack@cumin1001"
  • 19:29 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 19:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48575 and previous config saved to /var/cache/conftool/dbconfig/20230525-190946-root.json
  • 19:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48574 and previous config saved to /var/cache/conftool/dbconfig/20230525-190859-root.json
  • 18:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48573 and previous config saved to /var/cache/conftool/dbconfig/20230525-185441-root.json
  • 18:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48572 and previous config saved to /var/cache/conftool/dbconfig/20230525-185354-root.json
  • 18:43 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@6b27584]: (no justification provided) (duration: 00m 19s)
  • 18:43 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@6b27584]: (no justification provided)
  • 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48571 and previous config saved to /var/cache/conftool/dbconfig/20230525-183937-root.json
  • 18:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48570 and previous config saved to /var/cache/conftool/dbconfig/20230525-183849-root.json
  • 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48568 and previous config saved to /var/cache/conftool/dbconfig/20230525-182432-root.json
  • 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48567 and previous config saved to /var/cache/conftool/dbconfig/20230525-182345-root.json
  • 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48566 and previous config saved to /var/cache/conftool/dbconfig/20230525-180927-root.json
  • 18:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48565 and previous config saved to /var/cache/conftool/dbconfig/20230525-180840-root.json
  • 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48564 and previous config saved to /var/cache/conftool/dbconfig/20230525-175423-root.json
  • 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48563 and previous config saved to /var/cache/conftool/dbconfig/20230525-175335-root.json
  • 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48562 and previous config saved to /var/cache/conftool/dbconfig/20230525-173918-root.json
  • 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48561 and previous config saved to /var/cache/conftool/dbconfig/20230525-173831-root.json
  • 17:27 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:27 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for migration IPs eqiad row E F switches. - cmooney@cumin1001"
  • 17:26 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entries for migration IPs eqiad row E F switches. - cmooney@cumin1001"
  • 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48559 and previous config saved to /var/cache/conftool/dbconfig/20230525-172413-root.json
  • 17:23 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48558 and previous config saved to /var/cache/conftool/dbconfig/20230525-172326-root.json
  • 17:15 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:14 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:14 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:14 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:13 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 17:08 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 17:07 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 17:06 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 17:05 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 17:03 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 16:39 topranks: adding outbound shaper config on eqsin to codfw transport cct (T328313)
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T336886)', diff saved to https://phabricator.wikimedia.org/P48557 and previous config saved to /var/cache/conftool/dbconfig/20230525-163657-ladsgroup.json
  • 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P48556 and previous config saved to /var/cache/conftool/dbconfig/20230525-162151-ladsgroup.json
  • 16:18 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:18 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:14 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:14 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:11 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e[1,3]-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e3-eqiad uplinks to spine
  • 16:11 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e[1,3]-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e3-eqiad uplinks to spine
  • 16:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on gerrit2002.wikimedia.org with reason: maintenance
  • 16:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on gerrit2002.wikimedia.org with reason: maintenance
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P48555 and previous config saved to /var/cache/conftool/dbconfig/20230525-160645-ladsgroup.json
  • 16:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS bullseye
  • 15:57 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e2-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e2-eqiad uplink from lsw1-f1 to ssw1-f1
  • 15:56 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e2-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e2-eqiad uplink from lsw1-f1 to ssw1-f1
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T336886)', diff saved to https://phabricator.wikimedia.org/P48553 and previous config saved to /var/cache/conftool/dbconfig/20230525-155139-ladsgroup.json
  • 15:49 dancy@deploy1002: Finished deploy [integration/docroot@dac2b70]: Updated Scap URLs (duration: 00m 07s)
  • 15:49 dancy@deploy1002: Started deploy [integration/docroot@dac2b70]: Updated Scap URLs
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T336886)', diff saved to and previous config saved to /var/cache/conftool/dbconfig/20230525-154927-ladsgroup.json
  • 15:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T336886)', diff saved to and previous config saved to /var/cache/conftool/dbconfig/20230525-154906-ladsgroup.json
  • 15:44 dancy: dancy@deploy1002 Updated scap URLs on doc.wikimedia.org
  • 15:43 dancy@deploy1002: Finished deploy [integration/docroot@78e6f40]: (no justification provided) (duration: 00m 10s)
  • 15:43 dancy@deploy1002: Started deploy [integration/docroot@78e6f40]: (no justification provided)
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P48552 and previous config saved to /var/cache/conftool/dbconfig/20230525-153359-ladsgroup.json
  • 15:33 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e[1-2]-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
  • 15:33 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e[1-2]-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
  • 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
  • 15:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
  • 15:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 15:27 kartik@deploy1002: Finished scap: Backport for Show Contribute menu item in main menu when Special:Contribute is enabled (T336838) (duration: 07m 01s)
  • 15:22 kartik@deploy1002: kartik: Backport for Show Contribute menu item in main menu when Special:Contribute is enabled (T336838) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 15:21 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-eqiad,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr2-eqiad link to ssw1-e1-eqiad
  • 15:20 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-eqiad,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr2-eqiad link to ssw1-e1-eqiad
  • 15:20 kartik@deploy1002: Started scap: Backport for Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)
  • 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P48551 and previous config saved to /var/cache/conftool/dbconfig/20230525-151853-ladsgroup.json
  • 15:18 kartik@deploy1002: Finished scap: Backport for Show Contribute menu item in main menu when Special:Contribute is enabled (T336838), Show Contribute menu item in main menu when Special:Contribute is enabled (T336838) (duration: 68m 07s)
  • 15:14 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS bullseye
  • 15:10 topranks: Migrating cr1-eqiad downlink to row E/F from lsw1-e1-eqiad et-0/0/48 to ssw1-e1-eqiad et-0/0/31
  • 15:10 mutante: gerrit-replica.wikimedia.org - gerrit2002 - reimaging - scheduled maintenance
  • 15:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: maintenance
  • 15:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: maintenance
  • 15:04 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr1-eqiad,lsw1-e1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
  • 15:04 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr1-eqiad,lsw1-e1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
  • 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48550 and previous config saved to /var/cache/conftool/dbconfig/20230525-150347-ladsgroup.json
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48549 and previous config saved to /var/cache/conftool/dbconfig/20230525-145857-ladsgroup.json
  • 14:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 14:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48548 and previous config saved to /var/cache/conftool/dbconfig/20230525-145836-ladsgroup.json
  • 14:54 marostegui: Wikireplicas are lagging behind for the following sections: s1, s2, s5, s7 T337446
  • 14:54 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P48547 and previous config saved to /var/cache/conftool/dbconfig/20230525-144330-ladsgroup.json
  • 14:32 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 14:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['dbproxy1026']
  • 14:29 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1027']
  • 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1027']
  • 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1026']
  • 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1025']
  • 14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1024']
  • 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P48546 and previous config saved to /var/cache/conftool/dbconfig/20230525-142824-ladsgroup.json
  • 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1025']
  • 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1024']
  • 14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1023']
  • 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1022']
  • 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
  • 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1023']
  • 14:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1023']
  • 14:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
  • 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
  • 14:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1023']
  • 14:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
  • 14:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
  • 14:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
  • 14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
  • 14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1026']
  • 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler
  • 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=jobrunner
  • 14:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
  • 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver
  • 14:21 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:21 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=eqiad
  • 14:21 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=appserver,dc=eqiad
  • 14:20 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:14 bblack@cumin1001: conftool action : set/pooled=yes; selector: service=parsoid-php,dc=eqiad
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48545 and previous config saved to /var/cache/conftool/dbconfig/20230525-141318-ladsgroup.json
  • 14:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:11 kartik@deploy1002: kartik: Backport for Show Contribute menu item in main menu when Special:Contribute is enabled (T336838), Show Contribute menu item in main menu when Special:Contribute is enabled (T336838) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 14:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:10 kartik@deploy1002: Started scap: Backport for Show Contribute menu item in main menu when Special:Contribute is enabled (T336838), Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)
  • 14:09 volans@cumin1001: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling restart_daemons on P{puppetboard2002.codfw.wmnet} and (A:puppetboard)
  • 14:09 volans@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
  • 14:08 volans@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
  • 14:08 volans@cumin1001: START - Cookbook sre.puppetboard.restart-reboot rolling restart_daemons on P{puppetboard2002.codfw.wmnet} and (A:puppetboard)
  • 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48544 and previous config saved to /var/cache/conftool/dbconfig/20230525-140822-ladsgroup.json
  • 14:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:08 kartik@deploy1002: Finished scap: Backport for Show Contribute menu item in main menu when Special:Contribute is enabled (T336838), Show Contribute menu item in main menu when Special:Contribute is enabled (T336838) (duration: 15m 56s)
  • 13:53 kartik@deploy1002: kartik: Backport for Show Contribute menu item in main menu when Special:Contribute is enabled (T336838), Show Contribute menu item in main menu when Special:Contribute is enabled (T336838) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:52 kartik@deploy1002: Started scap: Backport for Show Contribute menu item in main menu when Special:Contribute is enabled (T336838), Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)
  • 13:46 urbanecm@deploy1002: Finished scap: Backport for Change maint script to do work via jobs (duration: 07m 42s)
  • 13:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:38 urbanecm@deploy1002: Started scap: Backport for Change maint script to do work via jobs
  • 13:28 urbanecm@deploy1002: Finished scap: Backport for Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436), Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436) (duration: 09m 06s)
  • 13:24 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:20 urbanecm@deploy1002: urbanecm and matmarex: Backport for Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436), Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:19 urbanecm@deploy1002: Started scap: Backport for Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436), Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool sanitarium masters for s1, s5, s2, s7', diff saved to https://phabricator.wikimedia.org/P48538 and previous config saved to /var/cache/conftool/dbconfig/20230525-121012-root.json
  • 11:56 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:56 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:54 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:49 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:43 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 11:43 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 11:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 11:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 11:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48537 and previous config saved to /var/cache/conftool/dbconfig/20230525-113914-root.json
  • 11:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:38 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 11:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:31 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:31 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:30 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:30 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:28 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 11:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:26 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 11:26 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:25 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:25 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 11:25 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:25 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48536 and previous config saved to /var/cache/conftool/dbconfig/20230525-112409-root.json
  • 11:22 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:22 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 11:21 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:20 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 11:15 jbond: update udplog on mwlog server
  • 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48535 and previous config saved to /var/cache/conftool/dbconfig/20230525-110948-root.json
  • 11:09 jbond: upload udplog_1.10_amd64.deb
  • 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48534 and previous config saved to /var/cache/conftool/dbconfig/20230525-110905-root.json
  • 11:05 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:04 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:03 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:03 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:54 klausman@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48533 and previous config saved to /var/cache/conftool/dbconfig/20230525-105443-root.json
  • 10:54 klausman@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 10:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
  • 10:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
  • 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48532 and previous config saved to /var/cache/conftool/dbconfig/20230525-105400-root.json
  • 10:53 klausman@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 10:52 klausman@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 10:49 klausman@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 10:49 klausman@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 10:48 klausman@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2005-dev.wikimedia.org
  • 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
  • 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48531 and previous config saved to /var/cache/conftool/dbconfig/20230525-103939-root.json
  • 10:39 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48530 and previous config saved to /var/cache/conftool/dbconfig/20230525-103855-root.json
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48529 and previous config saved to /var/cache/conftool/dbconfig/20230525-103445-root.json
  • 10:32 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 10:24 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2005-dev.wikimedia.org
  • 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48528 and previous config saved to /var/cache/conftool/dbconfig/20230525-102434-root.json
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48527 and previous config saved to /var/cache/conftool/dbconfig/20230525-102351-root.json
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48526 and previous config saved to /var/cache/conftool/dbconfig/20230525-101940-root.json
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48525 and previous config saved to /var/cache/conftool/dbconfig/20230525-100927-root.json
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48524 and previous config saved to /var/cache/conftool/dbconfig/20230525-100846-root.json
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48523 and previous config saved to /var/cache/conftool/dbconfig/20230525-100436-root.json
  • 10:00 kart_: Updated cxserver to 2023-05-25-093623-production (config: language pairs transform fix + T331201)
  • 09:57 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 09:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48522 and previous config saved to /var/cache/conftool/dbconfig/20230525-095423-root.json
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48521 and previous config saved to /var/cache/conftool/dbconfig/20230525-095341-root.json
  • 09:51 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 09:51 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48520 and previous config saved to /var/cache/conftool/dbconfig/20230525-094931-root.json
  • 09:48 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 09:48 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48519 and previous config saved to /var/cache/conftool/dbconfig/20230525-093918-root.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48518 and previous config saved to /var/cache/conftool/dbconfig/20230525-093426-root.json
  • 09:32 apergos: running from dumpsdata1004 via ariel login screen session, as root, rsync with bwlimit 100000 to dumpsdata1006, copying all public xml dumps data
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48517 and previous config saved to /var/cache/conftool/dbconfig/20230525-092413-root.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48516 and previous config saved to /var/cache/conftool/dbconfig/20230525-091922-root.json
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2179', diff saved to https://phabricator.wikimedia.org/P48515 and previous config saved to /var/cache/conftool/dbconfig/20230525-091132-root.json
  • 09:10 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48514 and previous config saved to /var/cache/conftool/dbconfig/20230525-090417-root.json
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48513 and previous config saved to /var/cache/conftool/dbconfig/20230525-084912-root.json
  • 08:32 elukey: revoke kafka_mirror_maker TLS cert (cergen based), remove old cergen certs from puppet private - T337248
  • 07:52 matthiasmullie: UTC morning backports done
  • 07:51 mlitn@deploy1002: Finished scap: Backport for Change maint script to do work via jobs (T322872) (duration: 16m 12s)
  • 07:37 mlitn@deploy1002: mlitn: Backport for Change maint script to do work via jobs (T322872) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 07:35 mlitn@deploy1002: Started scap: Backport for Change maint script to do work via jobs (T322872)
  • 07:18 mlitn@deploy1002: Finished scap: Backport for [WikibaseMediaInfo] Add 'main subject of' property (duration: 14m 02s)
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P48511 and previous config saved to /var/cache/conftool/dbconfig/20230525-071719-root.json
  • 07:10 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 07:06 mlitn@deploy1002: mlitn: Backport for [WikibaseMediaInfo] Add 'main subject of' property synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 07:04 mlitn@deploy1002: Started scap: Backport for [WikibaseMediaInfo] Add 'main subject of' property
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1196', diff saved to https://phabricator.wikimedia.org/P48509 and previous config saved to /var/cache/conftool/dbconfig/20230525-064418-root.json
  • 06:09 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1156', diff saved to https://phabricator.wikimedia.org/P48506 and previous config saved to /var/cache/conftool/dbconfig/20230525-055734-root.json
  • 05:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 9 hosts with reason: T337446
  • 05:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 9 hosts with reason: T337446
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161', diff saved to https://phabricator.wikimedia.org/P48504 and previous config saved to /var/cache/conftool/dbconfig/20230525-055236-root.json
  • 05:48 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:48 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:41 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:36 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:36 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110', diff saved to https://phabricator.wikimedia.org/P48503 and previous config saved to /var/cache/conftool/dbconfig/20230525-051923-root.json
  • 02:14 eileen: civicrm upgraded from b8cab6f6 to 415aa7e5

2023-05-24

  • 21:18 urbanecm@deploy1002: Finished scap: Backport for [Growth] Deploy Personalized praise to pilot wikis with notifications (T334630) (duration: 09m 40s)
  • 21:10 urbanecm@deploy1002: urbanecm: Backport for [Growth] Deploy Personalized praise to pilot wikis with notifications (T334630) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:08 urbanecm@deploy1002: Started scap: Backport for [Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)
  • 20:55 samtar@deploy1002: Finished scap: Backport for ipInfo.hooks: Use wgRelevantUserName (T337373) (duration: 08m 15s)
  • 20:48 samtar@deploy1002: samtar: Backport for ipInfo.hooks: Use wgRelevantUserName (T337373) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:47 samtar@deploy1002: Started scap: Backport for ipInfo.hooks: Use wgRelevantUserName (T337373)
  • 20:25 samtar@deploy1002: Finished scap: Backport for ipInfo.hooks: Use wgRelevantUserName (T337373) (duration: 08m 31s)
  • 20:18 samtar@deploy1002: samtar: Backport for ipInfo.hooks: Use wgRelevantUserName (T337373) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:16 samtar@deploy1002: Started scap: Backport for ipInfo.hooks: Use wgRelevantUserName (T337373)
  • 20:15 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:08 ayounsi@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:12 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.9 refs T330216 (duration: 06m 00s)
  • 19:06 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.9 refs T330216
  • 18:55 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.10 refs T330216 (duration: 06m 00s)
  • 18:49 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.10 refs T330216
  • 18:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:32 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:22 ejegg: civicrm upgraded from 4251dfa1 to b8cab6f6
  • 16:54 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@1603ecf]: Deploying T336800 on platform_eng Airflow instance (duration: 00m 09s)
  • 16:54 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@1603ecf]: Deploying T336800 on platform_eng Airflow instance
  • 16:05 elukey: move kafka mirror on kafka main brokers to PKI - T337248
  • 16:01 urbanecm@deploy1002: Finished scap: Backport for Personalized praise: Add instrumentation (T325117), Personalized praise: Add instrumentation (T325117) (duration: 08m 33s)
  • 15:56 elukey: move kafka mirror on kafka jumbo brokers to PKI - T337248
  • 15:54 urbanecm@deploy1002: urbanecm: Backport for Personalized praise: Add instrumentation (T325117), Personalized praise: Add instrumentation (T325117) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 15:52 urbanecm@deploy1002: Started scap: Backport for Personalized praise: Add instrumentation (T325117), Personalized praise: Add instrumentation (T325117)
  • 15:47 ejegg: payments-wiki upgraded from e02bc7c5 to c2f9f8b5
  • 15:39 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@24ff363] (duration: 01m 35s)
  • 15:38 ejegg: standalone SmashPig upgraded from 5460dbe2 to db23b998
  • 15:37 aqu@deploy1002: Started deploy [analytics/refinery@24ff363] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@24ff363]
  • 15:37 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363] (thin): Regular analytics weekly train THIN [analytics/refinery@24ff363] (duration: 00m 04s)
  • 15:37 aqu@deploy1002: Started deploy [analytics/refinery@24ff363] (thin): Regular analytics weekly train THIN [analytics/refinery@24ff363]
  • 15:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:32 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:31 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:31 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363]: Regular analytics weekly train [analytics/refinery@24ff363] (duration: 06m 13s)
  • 15:31 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:30 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:26 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:26 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:25 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:25 aqu@deploy1002: Started deploy [analytics/refinery@24ff363]: Regular analytics weekly train [analytics/refinery@24ff363]
  • 15:24 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:22 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:22 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:18 aqu: analytics-refinery, about to deploy
  • 15:09 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:30 volans@cumin2002: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling restart_daemons on P{puppetboard2002.codfw.wmnet} and (A:puppetboard)
  • 14:30 volans@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
  • 14:30 volans@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
  • 14:29 volans@cumin2002: START - Cookbook sre.puppetboard.restart-reboot rolling restart_daemons on P{puppetboard2002.codfw.wmnet} and (A:puppetboard)
  • 14:26 volans@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 14:26 volans@cumin2002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 14:19 urbanecm@deploy1002: Finished scap: Backport for Enable DiscussionTools newtopictool on fiwiki (T317375) (duration: 12m 11s)
  • 14:13 hashar@deploy1002: Finished deploy [gerrit/gerrit@2d719f3]: wm-patch-demo: initial implementation | T332474 (duration: 00m 07s)
  • 14:13 hashar@deploy1002: Started deploy [gerrit/gerrit@2d719f3]: wm-patch-demo: initial implementation | T332474
  • 14:08 urbanecm@deploy1002: urbanecm and matmarex: Backport for Enable DiscussionTools newtopictool on fiwiki (T317375) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:06 urbanecm@deploy1002: Started scap: Backport for Enable DiscussionTools newtopictool on fiwiki (T317375)
  • 14:06 urbanecm@deploy1002: Finished scap: Backport for MultiPaneDialog: remove attribute hidden instead of class (T337256), Add maint script to opt out active users from the new topic tool (T317375), Define $maintClass in maintenance script for compatibility (T317375), NewTopicOptOutActiveUsers: Skip bot users etc. (T317375) (duration: 09m 21s)
  • 13:58 urbanecm@deploy1002: matmarex and urbanecm and sgimeno: Backport for MultiPaneDialog: remove attribute hidden instead of class (T337256), Add maint script to opt out active users from the new topic tool (T317375), Define $maintClass in maintenance script for compatibility (T317375), NewTopicOptOutActiveUsers: Skip bot users etc. (T317375) synced t
  • 13:56 urbanecm@deploy1002: Started scap: Backport for MultiPaneDialog: remove attribute hidden instead of class (T337256), Add maint script to opt out active users from the new topic tool (T317375), Define $maintClass in maintenance script for compatibility (T317375), NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)
  • 13:55 urbanecm@deploy1002: Finished scap: Backport for [Growth] Add mediawiki.mentor_dashboard.interaction (T325117) (duration: 07m 06s)
  • 13:48 urbanecm@deploy1002: Started scap: Backport for [Growth] Add mediawiki.mentor_dashboard.interaction (T325117)
  • 13:36 samtar@deploy1002: Finished scap: Backport for Enable Kartographer Nearby on remaining wikis (T336834) (duration: 08m 04s)
  • 13:29 samtar@deploy1002: samtar and wmde-fisch: Backport for Enable Kartographer Nearby on remaining wikis (T336834) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:28 samtar@deploy1002: Started scap: Backport for Enable Kartographer Nearby on remaining wikis (T336834)
  • 13:26 samtar@deploy1002: Finished scap: Backport for [cirrus] Fix typo in config var (duration: 10m 15s)
  • 13:17 samtar@deploy1002: samtar and dcausse: Backport for [cirrus] Fix typo in config var synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:16 samtar@deploy1002: Started scap: Backport for [cirrus] Fix typo in config var
  • 13:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:14 samtar@deploy1002: Finished scap: Backport for arclamp: switch redis server to arclamp1001 (T327277) (duration: 07m 53s)
  • 13:07 samtar@deploy1002: herron and samtar: Backport for arclamp: switch redis server to arclamp1001 (T327277) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:07 xSavitar: tools.codesearch Deployed https://gerrit.wikimedia.org/r/c/labs/codesearch/+/909258 and also restarted tool instances, as the core search backend was dead.
  • 13:06 samtar@deploy1002: Started scap: Backport for arclamp: switch redis server to arclamp1001 (T327277)
  • 12:55 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript findBadBlobs --wiki nowiki --revisions 5227369 --mark T337392` T337392
  • 12:47 tgr_: running changeWikiConfig.php on Growth pilot wikis for T337348
  • 10:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-codfw cluster: Reboot kafka nodes
  • 09:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2448.codfw.wmnet
  • 09:42 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2448.codfw.wmnet
  • 09:04 dcausse@deploy1002: Finished deploy [airflow-dags/search@c08e884]: search: build and use a smaller cirrus index dataset (duration: 00m 17s)
  • 09:04 dcausse@deploy1002: Started deploy [airflow-dags/search@c08e884]: search: build and use a smaller cirrus index dataset
  • 08:52 claime: repooling mw2248.codfw.wmnet - T334429
  • 08:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:51 akosiaris@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-codfw cluster: Reboot kafka nodes
  • 08:50 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 08:49 marostegui: Stop mariadb on db1154 (sanitarium) there will be lag on clouddb* hosts
  • 08:36 urbanecm@deploy1002: Finished scap: Backport for Migrate GrowthExperiments config to its own file (T308932) (duration: 07m 20s)
  • 08:28 urbanecm@deploy1002: Started scap: Backport for Migrate GrowthExperiments config to its own file (T308932)
  • 07:42 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 07:42 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 07:41 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 07:40 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 07:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:02 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:02 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 05:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 136106
  • 05:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 136106
  • 01:19 mutante: contint2001 - jenkins started again
  • 01:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
  • 01:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
  • 00:45 mutante: short maintenance on main contint server (jenkins)
  • 00:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
  • 00:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
  • 00:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
  • 00:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
  • 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint2001.wikimedia.org with reason: maintenance
  • 00:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint2001.wikimedia.org with reason: maintenance
  • 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint2002.wikimedia.org with reason: maintenance
  • 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint2002.wikimedia.org with reason: maintenance
  • 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint1002.wikimedia.org with reason: maintenance
  • 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint1002.wikimedia.org with reason: maintenance

2023-05-23

  • 23:52 mutante: releases1002 - jenkins service running again, this is the active host behind releases-jenkins.wikimedia.org - maintenance for releases* done
  • 23:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance
  • 23:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance
  • 23:41 mutante: releases1002 (releases.wikimedia.org) stopping jenkins for maintenance
  • 23:30 mutante: contint*, releases* - maintenance - changing UID of jenkins user - jenkins will be stopped for a little bit, releases-jenkins is first though - T324659
  • 22:00 eileen: civicrm upgraded from 11538e23 to 4251dfa1
  • 21:26 ejegg: payments-wiki upgraded from a7567c6a to e02bc7c5
  • 21:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 21:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 21:02 TheresNoTime: close UTC late backport window
  • 21:01 samtar@deploy1002: Finished scap: Backport for Turn on the A/B test for testwiki (T336969) (duration: 11m 47s)
  • 21:01 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 21:01 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 21:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 21:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:51 samtar@deploy1002: ksarabia and samtar: Backport for Turn on the A/B test for testwiki (T336969) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:50 samtar@deploy1002: Started scap: Backport for Turn on the A/B test for testwiki (T336969)
  • 20:48 samtar@deploy1002: Finished scap: Backport for Remove centraluserid dependency in ABRequirement.php (T336969), Remove centraluserid dependency in ABRequirement.php (T336969) (duration: 11m 20s)
  • 20:38 samtar@deploy1002: samtar: Backport for Remove centraluserid dependency in ABRequirement.php (T336969), Remove centraluserid dependency in ABRequirement.php (T336969) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:37 ejegg: civicrm upgraded from efe25c9b to 11538e23
  • 20:37 samtar@deploy1002: Started scap: Backport for Remove centraluserid dependency in ABRequirement.php (T336969), Remove centraluserid dependency in ABRequirement.php (T336969)
  • 20:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:10 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:10 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:46 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:42 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:42 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:42 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:41 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:41 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy102{2..7} - jclark@cumin1001"
  • 19:39 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy102{2..7} - jclark@cumin1001"
  • 19:36 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1027
  • 19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1027
  • 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1026
  • 19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1026
  • 19:34 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1025
  • 19:33 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025
  • 19:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:31 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1025
  • 19:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025
  • 19:30 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1024
  • 19:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
  • 19:27 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1024
  • 19:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
  • 19:27 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1024
  • 19:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
  • 19:27 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
  • 19:25 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
  • 19:25 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1022
  • 19:25 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.10 refs T330216
  • 19:24 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1022
  • 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:18 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:18 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:10 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:09 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:29 inflatador: bking@cumin1001 rolling restart of codfw wdqs public hosts T337327
  • 18:26 ryankemper: [WDQS] T337327 Deployed new, hopefully-working rule after addressing previous syntax error (unescaped `"`). See `/srv/private` commit `6e2f5ab19427902994bb9d03d28277252f021474`
  • 18:16 ryankemper: [WDQS] Rolled back requestctl rule
  • 18:12 ryankemper: [WDQS] T337327 New rule in place to ban potential source of WDQS codfw outage. Rolling restart will be done in a couple minutes to [attempt to] restore service availability
  • 17:05 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:05 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:03 sbassett: Deployed updated security mitigation for T336027 and T333140
  • 17:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-eqiad cluster: Reboot kafka nodes
  • 16:58 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:58 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:50 sbassett: Deployed updated security mitigation for T336027, part 2
  • 16:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:49 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:43 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Homer Release v0.6.2 with updated wmf-plugin - cmooney@cumin1001
  • 16:43 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:43 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:42 sbassett: Deployed updated security mitigation for T336027
  • 16:41 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Homer Release v0.6.2 with updated wmf-plugin - cmooney@cumin1001
  • 16:31 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: EventStreamConfig - Rename page content change enrich error stream to match convention - T336656 (duration: 06m 58s)
  • 16:22 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937 (duration: 36m 02s)
  • 15:56 topranks: moving lvs1018 connection to rack E1 from lsw1-e1-eqiad to ssw1-e1-eqiad T322937
  • 15:46 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937
  • 15:46 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:45 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:45 sukhe: stop pybal on lvs1018: T322937
  • 15:38 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host releases2003.codfw.wmnet with OS bullseye
  • 15:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:24 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
  • 15:22 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:22 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:22 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:21 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:21 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 15:21 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:21 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 15:21 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:21 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 15:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:20 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
  • 15:20 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:19 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 15:16 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:14 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:14 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:03 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host releases2003.codfw.wmnet with OS bullseye
  • 15:02 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host releases1003.eqiad.wmnet with OS bullseye
  • 15:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:00 akosiaris@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-eqiad cluster: Reboot kafka nodes
  • 14:58 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:58 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:57 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:57 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:51 moritzm: removed imagemagick 8:6.9.10.23+dfsg-2.1+deb10u1+wmf1 from apt.wikimedia.org/buster-wikimedia now that the Thumbor spec tests have been upgraded to match latest patches
  • 14:49 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases1003.eqiad.wmnet with reason: host reimage
  • 14:46 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on releases1003.eqiad.wmnet with reason: host reimage
  • 14:36 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host releases1003.eqiad.wmnet with OS bullseye
  • 14:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:30 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 14:05 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts kafkamon2002.codfw.wmnet
  • 14:05 herron@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:05 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:05 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
  • 14:04 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host releases2003.codfw.wmnet
  • 14:04 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases2003.codfw.wmnet - eoghan@cumin1001"
  • 14:04 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
  • 14:03 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases2003.codfw.wmnet - eoghan@cumin1001"
  • 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) releases2003.codfw.wmnet on all recursors
  • 14:02 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache releases2003.codfw.wmnet on all recursors
  • 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases2003.codfw.wmnet - eoghan@cumin1001"
  • 14:01 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases2003.codfw.wmnet - eoghan@cumin1001"
  • 14:01 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 14:00 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:57 eoghan@cumin1001: START - Cookbook sre.dns.netbox
  • 13:57 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host releases2003.codfw.wmnet
  • 13:56 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon2002.codfw.wmnet
  • 13:56 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafkamon1002.eqiad.wmnet
  • 13:55 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:55 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafkamon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
  • 13:54 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafkamon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
  • 13:50 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 13:50 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host releases1003.eqiad.wmnet
  • 13:50 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases1003.eqiad.wmnet - eoghan@cumin1001"
  • 13:47 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases1003.eqiad.wmnet - eoghan@cumin1001"
  • 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) releases1003.eqiad.wmnet on all recursors
  • 13:46 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache releases1003.eqiad.wmnet on all recursors
  • 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases1003.eqiad.wmnet - eoghan@cumin1001"
  • 13:46 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon1002.eqiad.wmnet
  • 13:45 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases1003.eqiad.wmnet - eoghan@cumin1001"
  • 13:45 hoo@deploy1002: Finished scap: Backport for Restore targets declarations temporarily (T336956), Restore targets declarations temporarily (T336956) (duration: 12m 49s)
  • 13:44 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 13:44 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 13:43 eoghan@cumin1001: START - Cookbook sre.dns.netbox
  • 13:43 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host releases1003.eqiad.wmnet
  • 13:33 hoo@deploy1002: hoo: Backport for Restore targets declarations temporarily (T336956), Restore targets declarations temporarily (T336956) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:32 hoo@deploy1002: Started scap: Backport for Restore targets declarations temporarily (T336956), Restore targets declarations temporarily (T336956)
  • 13:11 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
  • 12:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:56 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 11:56 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 11:55 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 11:55 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 11:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:40 akosiaris@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
  • 10:29 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
  • 10:21 akosiaris: reboot rdb1011 for kernel upgrades. ORES in codfw will have a 5m downtime. Other things that might be impacted (but won't): changeprop/cpjobqueue/api-gateway/docker-registry/filebackend.php
  • 10:21 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
  • 10:13 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
  • 10:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1001.eqiad.wmnet
  • 10:07 akosiaris: reboot rdb2009 for kernel upgrades. ORES in codfw will have a 5m downtime. Other things that might be impacted (but won't): changeprop/cpjobqueue/api-gateway/docker-registry/filebackend.php
  • 10:05 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
  • 10:02 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1001.eqiad.wmnet
  • 09:59 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48493 and previous config saved to /var/cache/conftool/dbconfig/20230523-095720-root.json
  • 09:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:55 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 09:55 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 09:51 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
  • 09:50 stevemunene: reboot an-test-master1002.eqiad.wmnet December 2022 Buster reboots T325132
  • 09:49 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1003.eqiad.wmnet
  • 09:42 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1003.eqiad.wmnet
  • 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48492 and previous config saved to /var/cache/conftool/dbconfig/20230523-094216-root.json
  • 09:42 stevemunene: reboot an-test-worker1003.eqiad.wmnet December 2022 Buster reboots T325132
  • 09:41 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1001.eqiad.wmnet
  • 09:34 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-coord1001.eqiad.wmnet
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48491 and previous config saved to /var/cache/conftool/dbconfig/20230523-092711-root.json
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48490 and previous config saved to /var/cache/conftool/dbconfig/20230523-091207-root.json
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48489 and previous config saved to /var/cache/conftool/dbconfig/20230523-085702-root.json
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48488 and previous config saved to /var/cache/conftool/dbconfig/20230523-085246-root.json
  • 08:44 hashar@deploy1002: Finished deploy [gerrit/gerrit@69bc27c]: wm-zuul-status: show reload immediately | T214068 (duration: 00m 07s)
  • 08:44 hashar@deploy1002: Started deploy [gerrit/gerrit@69bc27c]: wm-zuul-status: show reload immediately | T214068
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48487 and previous config saved to /var/cache/conftool/dbconfig/20230523-084157-root.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48486 and previous config saved to /var/cache/conftool/dbconfig/20230523-083741-root.json
  • 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1122.eqiad.wmnet
  • 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1122.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 08:35 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1122.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 08:32 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 08:27 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1122.eqiad.wmnet
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48485 and previous config saved to /var/cache/conftool/dbconfig/20230523-082653-root.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48484 and previous config saved to /var/cache/conftool/dbconfig/20230523-082237-root.json
  • 08:14 kartik@deploy1002: Finished scap: Backport for Special:Contribute: Correct language code for Albanian (T327868) (duration: 08m 37s)
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1119 from dbctl T337206', diff saved to https://phabricator.wikimedia.org/P48483 and previous config saved to /var/cache/conftool/dbconfig/20230523-081342-marostegui.json
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48482 and previous config saved to /var/cache/conftool/dbconfig/20230523-081148-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48481 and previous config saved to /var/cache/conftool/dbconfig/20230523-080732-root.json
  • 08:07 kartik@deploy1002: kartik: Backport for Special:Contribute: Correct language code for Albanian (T327868) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:05 kartik@deploy1002: Started scap: Backport for Special:Contribute: Correct language code for Albanian (T327868)
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48480 and previous config saved to /var/cache/conftool/dbconfig/20230523-075227-root.json
  • 07:51 hashar@deploy1002: Finished deploy [gerrit/gerrit@d151775]: wm-zuul-status: offer to reload on CI completion | T214068 (duration: 00m 07s)
  • 07:51 hashar@deploy1002: Started deploy [gerrit/gerrit@d151775]: wm-zuul-status: offer to reload on CI completion | T214068
  • 07:47 marostegui@deploy1002: Finished scap: Backport for Revert "db-production.php: Disable writes in es5" (duration: 07m 19s)
  • 07:44 hashar@deploy1002: Finished deploy [gerrit/gerrit@e815301]: wm-zuul-status: offer to reload on CI completion | T214068 (duration: 00m 07s)
  • 07:44 hashar@deploy1002: Started deploy [gerrit/gerrit@e815301]: wm-zuul-status: offer to reload on CI completion | T214068
  • 07:41 marostegui@deploy1002: marostegui: Backport for Revert "db-production.php: Disable writes in es5" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:39 marostegui@deploy1002: Started scap: Backport for Revert "db-production.php: Disable writes in es5"
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1024 T337285', diff saved to https://phabricator.wikimedia.org/P48479 and previous config saved to /var/cache/conftool/dbconfig/20230523-073841-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48478 and previous config saved to /var/cache/conftool/dbconfig/20230523-073722-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1023 to es5 primary T337285', diff saved to https://phabricator.wikimedia.org/P48477 and previous config saved to /var/cache/conftool/dbconfig/20230523-073710-root.json
  • 07:36 marostegui: Starting es5 eqiad failover from es1024 to es1023 T337285
  • 07:25 marostegui@deploy1002: Finished scap: Backport for db-production.php: Disable writes in es5 (T337285) (duration: 07m 16s)
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48476 and previous config saved to /var/cache/conftool/dbconfig/20230523-072218-root.json
  • 07:19 marostegui@deploy1002: marostegui: Backport for db-production.php: Disable writes in es5 (T337285) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 T337285
  • 07:17 marostegui@deploy1002: Started scap: Backport for db-production.php: Disable writes in es5 (T337285)
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 T337285
  • 07:14 kartik@deploy1002: Finished scap: Backport for Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868) (duration: 09m 42s)
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48475 and previous config saved to /var/cache/conftool/dbconfig/20230523-070713-root.json
  • 07:06 kartik@deploy1002: kartik: Backport for Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48474 and previous config saved to /var/cache/conftool/dbconfig/20230523-070547-root.json
  • 07:04 kartik@deploy1002: Started scap: Backport for Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)
  • 07:00 marostegui@deploy1002: Finished scap: Backport for Revert "db-production: Disable es4 writes" (duration: 06m 58s)
  • 06:54 marostegui@deploy1002: marostegui: Backport for Revert "db-production: Disable es4 writes" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 06:53 marostegui@deploy1002: Started scap: Backport for Revert "db-production: Disable es4 writes"
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48473 and previous config saved to /var/cache/conftool/dbconfig/20230523-065042-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Change es1020 weight', diff saved to https://phabricator.wikimedia.org/P48472 and previous config saved to /var/cache/conftool/dbconfig/20230523-064850-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1021 T337283', diff saved to https://phabricator.wikimedia.org/P48471 and previous config saved to /var/cache/conftool/dbconfig/20230523-064820-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1020 to es4 primary T337283', diff saved to https://phabricator.wikimedia.org/P48470 and previous config saved to /var/cache/conftool/dbconfig/20230523-064729-root.json
  • 06:46 marostegui: Starting es4 eqiad failover from es1021 to es1020 - T337283
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1020 with weight 0 T337283', diff saved to https://phabricator.wikimedia.org/P48469 and previous config saved to /var/cache/conftool/dbconfig/20230523-063836-root.json
  • 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 T337283
  • 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 T337283
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48468 and previous config saved to /var/cache/conftool/dbconfig/20230523-063538-root.json
  • 06:26 marostegui@deploy1002: Finished scap: Backport for db-production: Disable es4 writes (T337283) (duration: 08m 21s)
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48467 and previous config saved to /var/cache/conftool/dbconfig/20230523-062033-root.json
  • 06:19 marostegui@deploy1002: marostegui: Backport for db-production: Disable es4 writes (T337283) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 06:18 marostegui@deploy1002: Started scap: Backport for db-production: Disable es4 writes (T337283)
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48466 and previous config saved to /var/cache/conftool/dbconfig/20230523-060528-root.json
  • 06:04 kart_: cxserver: Remove Flores MT service (T331505)
  • 06:03 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:02 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:00 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:00 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:56 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:56 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48465 and previous config saved to /var/cache/conftool/dbconfig/20230523-055024-root.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48464 and previous config saved to /var/cache/conftool/dbconfig/20230523-053519-root.json
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48463 and previous config saved to /var/cache/conftool/dbconfig/20230523-052014-root.json
  • 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.8 (duration: 02m 17s)
  • 03:51 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.10 refs T330216 (duration: 49m 04s)
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.10 refs T330216
  • 02:57 eileen: civicrm upgraded from 3329155a to 6642b602
  • 02:22 eileen: civicrm upgraded from 7eae24d5 to 3329155a

2023-05-22

  • 23:29 eileen: civicrm upgraded from cc9593d0 to 7eae24d5
  • 23:16 zabe@deploy1002: Finished scap: Backport for Enable VE on new wikis (duration: 06m 58s)
  • 23:11 zabe@deploy1002: zabe: Backport for Enable VE on new wikis synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 23:09 zabe@deploy1002: Started scap: Backport for Enable VE on new wikis
  • 21:38 sbassett: Deployed security mitigations for T333140 and T336027
  • 20:55 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1004.eqiad.wmnet
  • 20:55 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:54 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:53 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:51 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:45 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1004.eqiad.wmnet
  • 20:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1005.eqiad.wmnet
  • 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:43 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:40 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:33 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1005.eqiad.wmnet
  • 20:27 TheresNoTime: close UTC late backport window
  • 20:24 samtar@deploy1002: Finished scap: Backport for [kaawiki] Enable SandboxLink extension (T336648) (duration: 07m 47s)
  • 20:17 samtar@deploy1002: samtar and superpes: Backport for [kaawiki] Enable SandboxLink extension (T336648) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:16 samtar@deploy1002: Started scap: Backport for [kaawiki] Enable SandboxLink extension (T336648)
  • 20:14 samtar@deploy1002: Finished scap: Backport for [ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625) (duration: 08m 22s)
  • 20:11 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs[2010-2011].codfw.wmnet
  • 20:09 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs[2010-2011].codfw.wmnet
  • 20:08 samtar@deploy1002: superpes and samtar: Backport for [ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:06 samtar@deploy1002: Started scap: Backport for [ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)
  • 19:22 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:22 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:04 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@5ee7a62]: (no justification provided) (duration: 00m 17s)
  • 17:03 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@5ee7a62]: (no justification provided)
  • 16:58 XioNoX: push mgmt_junos to all L2 switches
  • 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2009.codfw.wmnet
  • 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2009.codfw.wmnet
  • 15:57 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2009.codfw.wmnet
  • 15:56 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2009.codfw.wmnet
  • 15:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 15:26 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 15:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
  • 15:25 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
  • 15:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "New debmonitor VMs - jmm@cumin2002 - T241049"
  • 15:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "New debmonitor VMs - jmm@cumin2002 - T241049"
  • 14:32 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:31 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:10 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:10 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host debmonitor2003.codfw.wmnet with OS bookworm
  • 12:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on debmonitor2003.codfw.wmnet with reason: host reimage
  • 12:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on debmonitor2003.codfw.wmnet with reason: host reimage
  • 12:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host debmonitor2003.codfw.wmnet with OS bookworm
  • 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host debmonitor1003.eqiad.wmnet with OS bookworm
  • 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on debmonitor1003.eqiad.wmnet with reason: host reimage
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2124', diff saved to https://phabricator.wikimedia.org/P48456 and previous config saved to /var/cache/conftool/dbconfig/20230522-115936-root.json
  • 11:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on debmonitor1003.eqiad.wmnet with reason: host reimage
  • 11:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host debmonitor1003.eqiad.wmnet with OS bookworm
  • 10:17 topranks: Un-draining transport circuit from eqsin to codfw, moving traffic back to default path T337220
  • 10:17 topranks: Un-draining transport circuit from eqsin to codfw, moving traffic back to default path
  • 10:06 hashar@deploy1002: Finished scap: Backport for Revert "[WikibaseMediaInfo] Add 'main subject of' property" (duration: 37m 00s)
  • 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host debmonitor2003.codfw.wmnet
  • 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
  • 10:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
  • 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) debmonitor2003.codfw.wmnet on all recursors
  • 10:04 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache debmonitor2003.codfw.wmnet on all recursors
  • 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
  • 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
  • 10:02 moritzm: installing updated usb.ids packages for Bullseye
  • 10:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host debmonitor2003.codfw.wmnet
  • 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host debmonitor1003.eqiad.wmnet
  • 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
  • 09:50 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
  • 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) debmonitor1003.eqiad.wmnet on all recursors
  • 09:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache debmonitor1003.eqiad.wmnet on all recursors
  • 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
  • 09:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
  • 09:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:43 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host debmonitor1003.eqiad.wmnet
  • 09:39 hashar@deploy1002: hashar: Backport for Revert "[WikibaseMediaInfo] Add 'main subject of' property" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 09:29 hashar@deploy1002: Started scap: Backport for Revert "[WikibaseMediaInfo] Add 'main subject of' property"
  • 08:46 marostegui: Stop mysql on db2160 (haproxy irc alerts will be generated)
  • 08:28 elukey: drain Arelion link between cr1-codfw and cr3-eqsin to mitigate packet loss eqiad <-> eqsin
  • 08:22 moritzm: installing systemd security updates
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48455 and previous config saved to /var/cache/conftool/dbconfig/20230522-081724-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48454 and previous config saved to /var/cache/conftool/dbconfig/20230522-080219-root.json
  • 07:59 elukey: restart purged on cp5017 as test to clear out consumer group timeouts and rejoin events
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48453 and previous config saved to /var/cache/conftool/dbconfig/20230522-075613-root.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48452 and previous config saved to /var/cache/conftool/dbconfig/20230522-074715-root.json
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48451 and previous config saved to /var/cache/conftool/dbconfig/20230522-074109-root.json
  • 07:37 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 07:32 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 07:32 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48450 and previous config saved to /var/cache/conftool/dbconfig/20230522-073210-root.json
  • 07:28 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48449 and previous config saved to /var/cache/conftool/dbconfig/20230522-072604-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48448 and previous config saved to /var/cache/conftool/dbconfig/20230522-071705-root.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48447 and previous config saved to /var/cache/conftool/dbconfig/20230522-071333-root.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48446 and previous config saved to /var/cache/conftool/dbconfig/20230522-071326-root.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48445 and previous config saved to /var/cache/conftool/dbconfig/20230522-071319-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48444 and previous config saved to /var/cache/conftool/dbconfig/20230522-071059-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48443 and previous config saved to /var/cache/conftool/dbconfig/20230522-070200-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48442 and previous config saved to /var/cache/conftool/dbconfig/20230522-065828-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48441 and previous config saved to /var/cache/conftool/dbconfig/20230522-065822-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48440 and previous config saved to /var/cache/conftool/dbconfig/20230522-065815-root.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48439 and previous config saved to /var/cache/conftool/dbconfig/20230522-065555-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48438 and previous config saved to /var/cache/conftool/dbconfig/20230522-064656-root.json
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 T337206', diff saved to https://phabricator.wikimedia.org/P48437 and previous config saved to /var/cache/conftool/dbconfig/20230522-064541-root.json
  • 06:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast2002
  • 06:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48436 and previous config saved to /var/cache/conftool/dbconfig/20230522-064323-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48435 and previous config saved to /var/cache/conftool/dbconfig/20230522-064317-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48434 and previous config saved to /var/cache/conftool/dbconfig/20230522-064310-root.json
  • 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1121.eqiad.wmnet
  • 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1121.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48433 and previous config saved to /var/cache/conftool/dbconfig/20230522-064050-root.json
  • 06:40 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1121.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 06:38 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast2002
  • 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1121.eqiad.wmnet
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48432 and previous config saved to /var/cache/conftool/dbconfig/20230522-063151-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48431 and previous config saved to /var/cache/conftool/dbconfig/20230522-062818-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48430 and previous config saved to /var/cache/conftool/dbconfig/20230522-062812-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48429 and previous config saved to /var/cache/conftool/dbconfig/20230522-062805-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48428 and previous config saved to /var/cache/conftool/dbconfig/20230522-062545-root.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Give weight to es2024', diff saved to https://phabricator.wikimedia.org/P48427 and previous config saved to /var/cache/conftool/dbconfig/20230522-061947-marostegui.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2023 T337204', diff saved to https://phabricator.wikimedia.org/P48426 and previous config saved to /var/cache/conftool/dbconfig/20230522-061925-root.json
  • 06:17 marostegui: Starting es5 codfw failover from es2023 to es2024 - T337204
  • 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 T337204
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2024 with weight 0 T337204', diff saved to https://phabricator.wikimedia.org/P48425 and previous config saved to /var/cache/conftool/dbconfig/20230522-061524-root.json
  • 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 T337204
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48424 and previous config saved to /var/cache/conftool/dbconfig/20230522-061314-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48423 and previous config saved to /var/cache/conftool/dbconfig/20230522-061307-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48422 and previous config saved to /var/cache/conftool/dbconfig/20230522-061300-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48421 and previous config saved to /var/cache/conftool/dbconfig/20230522-061040-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2021', diff saved to https://phabricator.wikimedia.org/P48420 and previous config saved to /var/cache/conftool/dbconfig/20230522-061033-marostegui.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48419 and previous config saved to /var/cache/conftool/dbconfig/20230522-055809-root.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48418 and previous config saved to /var/cache/conftool/dbconfig/20230522-055803-root.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48417 and previous config saved to /var/cache/conftool/dbconfig/20230522-055756-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48416 and previous config saved to /var/cache/conftool/dbconfig/20230522-055120-root.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48415 and previous config saved to /var/cache/conftool/dbconfig/20230522-054304-root.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48414 and previous config saved to /var/cache/conftool/dbconfig/20230522-054258-root.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48413 and previous config saved to /var/cache/conftool/dbconfig/20230522-054251-root.json
  • 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2021 T337203', diff saved to https://phabricator.wikimedia.org/P48412 and previous config saved to /var/cache/conftool/dbconfig/20230522-053705-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2020 to es4 codfw primary T337203', diff saved to https://phabricator.wikimedia.org/P48411 and previous config saved to /var/cache/conftool/dbconfig/20230522-053554-marostegui.json
  • 05:34 marostegui: Starting es4 codfw failover from es2021 to es2020 - T337203
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2020 with weight 0 T337203', diff saved to https://phabricator.wikimedia.org/P48410 and previous config saved to /var/cache/conftool/dbconfig/20230522-052938-root.json
  • 05:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 T337203
  • 05:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 T337203
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48409 and previous config saved to /var/cache/conftool/dbconfig/20230522-052800-root.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48408 and previous config saved to /var/cache/conftool/dbconfig/20230522-052753-root.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48407 and previous config saved to /var/cache/conftool/dbconfig/20230522-052746-root.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1029, es1030, es1031 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P48406 and previous config saved to /var/cache/conftool/dbconfig/20230522-051957-root.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Failover es1, es2 and es3 masters for kernel reboots', diff saved to https://phabricator.wikimedia.org/P48405 and previous config saved to /var/cache/conftool/dbconfig/20230522-051723-marostegui.json

2023-05-21

  • 07:45 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 07:44 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 07:43 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 07:42 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 07:41 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 07:40 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply

2023-05-20

  • 18:25 effie: restart varnish cp3061
  • 16:39 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=parse1018.eqiad.wmnet
  • 15:17 hoo@deploy1002: Finished scap: Backport for Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081) (duration: 08m 47s)
  • 15:10 hoo@deploy1002: hoo: Backport for Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 15:08 hoo@deploy1002: Started scap: Backport for Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)
  • 14:41 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=parse1018.eqiad.wmnet
  • 09:08 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:08 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
  • 09:07 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
  • 09:00 volans@cumin1001: START - Cookbook sre.dns.netbox

2023-05-19

  • 21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
  • 21:21 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
  • 21:19 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 20:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1495.eqiad.wmnet
  • 19:46 mutante: mw1469 - sudo pkill ffmpeg (per runbook)
  • 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1469.eqiad.wmnet
  • 19:45 mutante: depooled mw1469 from videoscaler, dedicating to just jobrunner
  • 19:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1469.eqiad.wmnet
  • 19:36 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@b34c529]: (no justification provided) (duration: 00m 09s)
  • 19:36 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@b34c529]: (no justification provided)
  • 16:55 mutante: mw2448 - scap pull - T2334429
  • 15:31 taavi@deploy1002: Finished scap: Backport for i18n: Add link to help page (T322717), Enable RealMe (T324535) (duration: 22m 02s)
  • 15:21 taavi@deploy1002: legoktm and taavi: Backport for i18n: Add link to help page (T322717), Enable RealMe (T324535) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 15:09 taavi@deploy1002: Started scap: Backport for i18n: Add link to help page (T322717), Enable RealMe (T324535)
  • 15:06 legoktm@deploy1002: Finished scap: Backport for Disable GWToolset from Commons (T270911) (duration: 09m 46s)
  • 15:06 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 14:59 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
  • 14:58 legoktm@deploy1002: legoktm: Backport for Disable GWToolset from Commons (T270911) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 14:57 legoktm@deploy1002: Started scap: Backport for Disable GWToolset from Commons (T270911)
  • 14:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 14:36 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
  • 14:36 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
  • 14:35 sukhe: enable puppet on A:lvs, finished rolling out change
  • 14:20 sukhe: disable puppet on A:lvs to roll out CR 910566
  • 14:17 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1014.eqiad.wmnet with reason: firmware update
  • 14:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1014.eqiad.wmnet with reason: firmware update
  • 13:35 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 10s)
  • 13:34 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs1020.eqiad.wmnet with reason: Move lvs1020 handoff port to row e/f from lsw1-f1 to ssw1-f1
  • 13:34 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
  • 13:34 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs1020.eqiad.wmnet with reason: Move lvs1020 handoff port to row e/f from lsw1-f1 to ssw1-f1
  • 13:26 topranks: Adding vlan config for row e/f vlans on ssw1-f1-eqiad (T322937)
  • 13:17 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9 refs T330215
  • 12:19 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
  • 11:27 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
  • 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2004.codfw.wmnet with OS bullseye
  • 10:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast2002
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast2002 decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
  • 10:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast2002 decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
  • 10:45 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
  • 10:44 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:38 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
  • 10:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast2002
  • 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2004.codfw.wmnet with OS bullseye
  • 10:07 moritzm: installing ncurses security updates
  • 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
  • 09:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:48 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 09:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 09:31 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bullseye
  • 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2040-2043].codfw.wmnet
  • 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2040-2043].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
  • 09:21 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
  • 09:18 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2040-2043].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
  • 09:15 mvernon@cumin2002: START - Cookbook sre.dns.netbox
  • 09:08 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
  • 09:02 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
  • 08:59 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2040-2043].codfw.wmnet
  • 08:58 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
  • 08:52 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
  • 08:45 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
  • 08:41 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
  • 08:38 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
  • 08:38 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 08:34 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
  • 08:31 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
  • 08:27 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
  • 08:18 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2003.codfw.wmnet
  • 08:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host netflow2003.codfw.wmnet with OS bookworm
  • 08:11 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2003.codfw.wmnet
  • 08:10 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2002.codfw.wmnet
  • 08:09 moritzm: copy samplicator from bullseye-wikimedia to bookworm-wikimedia T330884
  • 08:03 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
  • 07:58 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
  • 07:52 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48397 and previous config saved to /var/cache/conftool/dbconfig/20230519-074256-root.json
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48396 and previous config saved to /var/cache/conftool/dbconfig/20230519-074044-root.json
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48395 and previous config saved to /var/cache/conftool/dbconfig/20230519-073959-root.json
  • 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow2003.codfw.wmnet with reason: host reimage
  • 07:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow2003.codfw.wmnet with reason: host reimage
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48394 and previous config saved to /var/cache/conftool/dbconfig/20230519-072751-root.json
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48393 and previous config saved to /var/cache/conftool/dbconfig/20230519-072539-root.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48392 and previous config saved to /var/cache/conftool/dbconfig/20230519-072454-root.json
  • 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: prometheus4001.ulsfo.wmnet
  • 07:21 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: prometheus4001.ulsfo.wmnet
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48391 and previous config saved to /var/cache/conftool/dbconfig/20230519-071247-root.json
  • 07:11 moritzm: installing emacs security updates
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48390 and previous config saved to /var/cache/conftool/dbconfig/20230519-071034-root.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48389 and previous config saved to /var/cache/conftool/dbconfig/20230519-070949-root.json
  • 06:59 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48388 and previous config saved to /var/cache/conftool/dbconfig/20230519-065742-root.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48387 and previous config saved to /var/cache/conftool/dbconfig/20230519-065530-root.json
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48386 and previous config saved to /var/cache/conftool/dbconfig/20230519-065445-root.json
  • 06:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6002.wikimedia.org
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48385 and previous config saved to /var/cache/conftool/dbconfig/20230519-064237-root.json
  • 06:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6002.wikimedia.org
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48384 and previous config saved to /var/cache/conftool/dbconfig/20230519-064025-root.json
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48383 and previous config saved to /var/cache/conftool/dbconfig/20230519-063940-root.json
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48382 and previous config saved to /var/cache/conftool/dbconfig/20230519-062733-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48381 and previous config saved to /var/cache/conftool/dbconfig/20230519-062520-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48380 and previous config saved to /var/cache/conftool/dbconfig/20230519-062435-root.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48379 and previous config saved to /var/cache/conftool/dbconfig/20230519-061228-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48378 and previous config saved to /var/cache/conftool/dbconfig/20230519-061016-root.json
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48377 and previous config saved to /var/cache/conftool/dbconfig/20230519-060931-root.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48376 and previous config saved to /var/cache/conftool/dbconfig/20230519-055723-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48375 and previous config saved to /var/cache/conftool/dbconfig/20230519-055511-root.json
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48374 and previous config saved to /var/cache/conftool/dbconfig/20230519-055426-root.json
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2027', diff saved to https://phabricator.wikimedia.org/P48373 and previous config saved to /var/cache/conftool/dbconfig/20230519-054952-root.json
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2034 to es3 master', diff saved to https://phabricator.wikimedia.org/P48372 and previous config saved to /var/cache/conftool/dbconfig/20230519-054923-marostegui.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2031', diff saved to https://phabricator.wikimedia.org/P48371 and previous config saved to /var/cache/conftool/dbconfig/20230519-054758-root.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2033 to es2 master', diff saved to https://phabricator.wikimedia.org/P48370 and previous config saved to /var/cache/conftool/dbconfig/20230519-054737-marostegui.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2030', diff saved to https://phabricator.wikimedia.org/P48369 and previous config saved to /var/cache/conftool/dbconfig/20230519-054503-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2032 to es1 master', diff saved to https://phabricator.wikimedia.org/P48368 and previous config saved to /var/cache/conftool/dbconfig/20230519-054403-marostegui.json
  • 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1121 from dbctl T336725', diff saved to https://phabricator.wikimedia.org/P48367 and previous config saved to /var/cache/conftool/dbconfig/20230519-053719-marostegui.json

2023-05-18

  • 23:26 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.9 refs T330215
  • 22:59 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - bking@cumin1001 - T332355
  • 22:21 mutante: contint2001 - moving files owned by zuul to new UID/GID - in progress
  • 22:20 mutante: short down-time for zuul-merger on contint2001
  • 21:47 mutante: maintenance for zuul (CI) on contint servers
  • 21:31 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9 refs T330215
  • 21:13 brennen@deploy1002: Finished scap: Backport for cache: Do not throw on empty set in LinkBatch::constructSet (T336964) (duration: 09m 38s)
  • 21:05 brennen@deploy1002: brennen: Backport for cache: Do not throw on empty set in LinkBatch::constructSet (T336964) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:03 brennen@deploy1002: Started scap: Backport for cache: Do not throw on empty set in LinkBatch::constructSet (T336964)
  • 21:01 urbanecm@deploy1002: Finished scap: Backport for Silently ignore istype-depicts image suggestion type (T336962) (duration: 08m 09s)
  • 20:54 urbanecm@deploy1002: urbanecm: Backport for Silently ignore istype-depicts image suggestion type (T336962) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:53 urbanecm@deploy1002: Started scap: Backport for Silently ignore istype-depicts image suggestion type (T336962)
  • 20:36 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - bking@cumin1001 - T332355
  • 20:33 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade - bking@cumin1001 - T332355
  • 20:16 urbanecm@deploy1002: Finished scap: Backport for Reverts hewiki A/B test (T335309) (duration: 10m 25s)
  • 20:07 urbanecm@deploy1002: ksarabia and urbanecm: Backport for Reverts hewiki A/B test (T335309) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:06 urbanecm@deploy1002: Started scap: Backport for Reverts hewiki A/B test (T335309)
  • 18:57 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@502ddae]: T333001 (duration: 00m 35s)
  • 18:56 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@502ddae]: T333001
  • 18:55 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin1001 - T332355
  • 18:50 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.8 refs T330215
  • 18:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts gitlab-runner1003.eqiad.wmnet
  • 18:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:31 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
  • 18:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
  • 18:27 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:20 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
  • 18:19 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
  • 18:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9 refs T330215
  • 18:11 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade - bking@cumin1001 - T332355
  • 18:09 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin1001 - T332355
  • 18:07 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin1001 - T274204
  • 18:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:59 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin1001 - T274204
  • 17:38 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:37 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 17:36 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:35 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:29 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:29 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:27 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:26 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 17:26 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:26 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:26 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:26 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:55 XioNoX: push new pfw policies - T336896
  • 16:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:10 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bullseye
  • 15:58 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:58 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:57 inflatador: bking@cumin1001 starting rolling restart of wcqs for java updates T334470
  • 15:53 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 15:50 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 15:47 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@6e3358d]: (no justification provided) (duration: 00m 10s)
  • 15:47 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@6e3358d]: (no justification provided)
  • 15:37 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
  • 15:37 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 15:31 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
  • 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
  • 15:25 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 15:23 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
  • 15:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet
  • 15:19 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
  • 15:18 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:18 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:17 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet
  • 15:15 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:13 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet
  • 15:09 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet
  • 15:08 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet
  • 15:04 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet
  • 15:03 stevemunene@deploy1002: Finished deploy [airflow-dags/analytics_product@6e3358d]: (no justification provided) (duration: 00m 06s)
  • 15:02 stevemunene@deploy1002: Started deploy [airflow-dags/analytics_product@6e3358d]: (no justification provided)
  • 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:56 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts gitlab-runner1003.eqiad.wmnet
  • 14:34 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
  • 14:31 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 14:31 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:01 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-serve-worker-codfw
  • 13:59 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
  • 13:52 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
  • 13:50 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
  • 13:49 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
  • 13:47 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
  • 13:18 TheresNoTime: closing backport window
  • 13:14 samtar@deploy1002: Finished scap: Backport for InitialiseSettings: Set wgWatchersMaxAge=30days (T336250) (duration: 08m 45s)
  • 13:07 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 13:07 samtar@deploy1002: samtar and s-mukuti: Backport for InitialiseSettings: Set wgWatchersMaxAge=30days (T336250) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:06 samtar@deploy1002: Started scap: Backport for InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)
  • 13:02 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:59 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipeline - T332012 (duration: 06m 19s)
  • 12:57 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:56 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:54 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
  • 12:51 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
  • 12:51 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:51 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:46 otto@deploy1002: Synchronized wmf-config/ext-EventLogging.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipeline - T332012 (duration: 07m 00s)
  • 12:46 elukey: clean up old jupyterhub.service references (crash looping) on stat* nodes that had it
  • 12:44 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2002.codfw.wmnet
  • 12:35 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2002.codfw.wmnet
  • 12:35 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2001.codfw.wmnet
  • 12:35 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:34 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:28 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2001.codfw.wmnet
  • 12:24 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:24 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1003.eqiad.wmnet
  • 12:19 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:17 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1003.eqiad.wmnet
  • 12:15 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1002.eqiad.wmnet
  • 12:12 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:11 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1002.eqiad.wmnet
  • 12:06 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1001.eqiad.wmnet
  • 12:02 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1001.eqiad.wmnet
  • 11:56 topranks: reconfiguring DHCP relay function on eqiad core routers (T320508)
  • 11:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
  • 11:51 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
  • 11:36 kart_: MinT: Update to 2023-05-18-060931-production and Set CT2_INTRA_THREADS to 0 (T336483)
  • 11:34 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 11:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 11:23 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 11:20 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 11:11 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 11:09 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 11:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1003.eqiad.wmnet
  • 11:00 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1003.eqiad.wmnet
  • 10:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1002.eqiad.wmnet
  • 10:50 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1002.eqiad.wmnet
  • 10:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1001.eqiad.wmnet
  • 10:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-worker1110.eqiad.wmnet with reason: Troubleshooting failed disk
  • 10:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on an-worker1110.eqiad.wmnet with reason: Troubleshooting failed disk
  • 10:25 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet
  • 10:24 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ml-cache1001.eqiad.wmnet
  • 10:24 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet
  • 10:06 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 10:05 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 08:30 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 08:29 akosiaris: upgrade docker-registry to 2.8.2 on all registry hosts
  • 08:28 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 08:27 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 08:26 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=registry2003.codfw.wmnet
  • 08:24 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 08:24 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 08:19 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 08:19 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 08:00 akosiaris: upgrade registry on registry2003 to 2.8.2
  • 07:59 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=registry2003.codfw.wmnet
  • 07:25 apergos: UTC morning backport and config training window done
  • 07:15 kartik@deploy1002: Finished scap: Backport for Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868) (duration: 09m 18s)
  • 07:07 kartik@deploy1002: kartik: Backport for Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 07:06 kartik@deploy1002: Started scap: Backport for Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)
  • 06:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2134,2160].codfw.wmnet,db[1159,1217].eqiad.wmnet with reason: maintenance
  • 06:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2134,2160].codfw.wmnet,db[1159,1217].eqiad.wmnet with reason: maintenance
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1122 from dbctl T336833', diff saved to https://phabricator.wikimedia.org/P48362 and previous config saved to /var/cache/conftool/dbconfig/20230518-060734-marostegui.json
  • 04:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: maintenance
  • 04:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: maintenance

2023-05-17

  • 22:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:30 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove new openstack.codfw1dev.wikimediacloud.org name server A records. - cmooney@cumin1001"
  • 22:29 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove new openstack.codfw1dev.wikimediacloud.org name server A records. - cmooney@cumin1001"
  • 22:26 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 22:15 krinkle@deploy1002: Synchronized wmf-config/: T332012 (duration: 06m 51s)
  • 21:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2012.codfw.wmnet
  • 21:26 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 21:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 21:01 zabe: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Public policy" "Global Advocacy" "Zabe" --reason "per request T333842"
  • 20:59 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2012.codfw.wmnet
  • 20:32 urbanecm: UTC late B&C window done
  • 20:29 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: amend wrong wiki prefix for jbowiki (T308134), NewTopicOptOutActiveUsers: Skip bot users etc. (T317375), Enable zebra ab test in hewiki (T335972) (duration: 11m 36s)
  • 20:19 urbanecm@deploy1002: urbanecm and matmarex and ksarabia and sgimeno: Backport for GrowthExperiments: amend wrong wiki prefix for jbowiki (T308134), NewTopicOptOutActiveUsers: Skip bot users etc. (T317375), Enable zebra ab test in hewiki (T335972) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.
  • 20:17 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: amend wrong wiki prefix for jbowiki (T308134), NewTopicOptOutActiveUsers: Skip bot users etc. (T317375), Enable zebra ab test in hewiki (T335972)
  • 20:15 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: enable add link frontend in 9th round wikis (T308134) (duration: 12m 06s)
  • 20:13 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2012.codfw.wmnet
  • 20:12 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2012.codfw.wmnet
  • 20:07 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2012.codfw.wmnet
  • 20:04 urbanecm@deploy1002: sgimeno and urbanecm: Backport for GrowthExperiments: enable add link frontend in 9th round wikis (T308134) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:03 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: enable add link frontend in 9th round wikis (T308134)
  • 19:55 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 19:54 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 19:54 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2012.codfw.wmnet
  • 19:50 ejegg: payments-wiki upgraded from 8988a598 to a7567c6a
  • 19:41 inflatador: bking@wdqs2012 depooling to attempt firmware update T331297
  • 19:01 Amir1: Removing db1112 from zarcillo T336332
  • 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1112.eqiad.wmnet
  • 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1112.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 18:58 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1112.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 18:48 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 18:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1112.eqiad.wmnet
  • 18:34 brennen@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.9 refs T330215 (duration: 06m 22s)
  • 18:27 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.9 refs T330215
  • 18:11 otto@deploy1002: Finished deploy [analytics/refinery@fb22795]: Deploy for ProduceCanaryEvents fix - [analytics/refinery@fb22795] (duration: 09m 14s)
  • 18:03 brennen: train 1.41.0-wmf.9 (T330215): no current blockers, rolling to group1 as backup-backup conductor
  • 18:02 otto@deploy1002: Started deploy [analytics/refinery@fb22795]: Deploy for ProduceCanaryEvents fix - [analytics/refinery@fb22795]
  • 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:43 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
  • 17:43 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
  • 17:19 brett: Maglev LVS scheduler rollout finished in esams - T263797
  • 16:58 Guest4300: Running `foreachwiki extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --video --mime=video/mpeg --missing --error --stalled --throttle` on mwmaint1002 for T244570
  • 16:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032 (T335845)', diff saved to https://phabricator.wikimedia.org/P48356 and previous config saved to /var/cache/conftool/dbconfig/20230517-162444-ladsgroup.json
  • 16:21 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032 (T335845)', diff saved to https://phabricator.wikimedia.org/P48355 and previous config saved to /var/cache/conftool/dbconfig/20230517-161929-ladsgroup.json
  • 16:18 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 16:17 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 16:14 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:13 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P48354 and previous config saved to /var/cache/conftool/dbconfig/20230517-160937-ladsgroup.json
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P48353 and previous config saved to /var/cache/conftool/dbconfig/20230517-160423-ladsgroup.json
  • 16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:57 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:56 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P48352 and previous config saved to /var/cache/conftool/dbconfig/20230517-155431-ladsgroup.json
  • 15:52 brett: Rolling out maglev LVS scheduler in esams - T263797
  • 15:52 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 15:50 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P48351 and previous config saved to /var/cache/conftool/dbconfig/20230517-154916-ladsgroup.json
  • 15:46 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032 (T335845)', diff saved to https://phabricator.wikimedia.org/P48350 and previous config saved to /var/cache/conftool/dbconfig/20230517-153925-ladsgroup.json
  • 15:38 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032 (T335845)', diff saved to https://phabricator.wikimedia.org/P48349 and previous config saved to /var/cache/conftool/dbconfig/20230517-153410-ladsgroup.json
  • 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1032 (T335845)', diff saved to https://phabricator.wikimedia.org/P48348 and previous config saved to /var/cache/conftool/dbconfig/20230517-153042-ladsgroup.json
  • 15:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
  • 15:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
  • 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2032 (T335845)', diff saved to https://phabricator.wikimedia.org/P48347 and previous config saved to /var/cache/conftool/dbconfig/20230517-153010-ladsgroup.json
  • 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027 (T335845)', diff saved to https://phabricator.wikimedia.org/P48346 and previous config saved to /var/cache/conftool/dbconfig/20230517-153004-ladsgroup.json
  • 15:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028 (T335845)', diff saved to https://phabricator.wikimedia.org/P48345 and previous config saved to /var/cache/conftool/dbconfig/20230517-152945-ladsgroup.json
  • 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org
  • 15:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org
  • 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027', diff saved to https://phabricator.wikimedia.org/P48344 and previous config saved to /var/cache/conftool/dbconfig/20230517-151458-ladsgroup.json
  • 15:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P48343 and previous config saved to /var/cache/conftool/dbconfig/20230517-151438-ladsgroup.json
  • 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet
  • 15:07 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027', diff saved to https://phabricator.wikimedia.org/P48342 and previous config saved to /var/cache/conftool/dbconfig/20230517-145952-ladsgroup.json
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P48341 and previous config saved to /var/cache/conftool/dbconfig/20230517-145932-ladsgroup.json
  • 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P{aqs101[6-9]*} and A:aqs
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027 (T335845)', diff saved to https://phabricator.wikimedia.org/P48340 and previous config saved to /var/cache/conftool/dbconfig/20230517-144446-ladsgroup.json
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028 (T335845)', diff saved to https://phabricator.wikimedia.org/P48339 and previous config saved to /var/cache/conftool/dbconfig/20230517-144425-ladsgroup.json
  • 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2028 (T335845)', diff saved to https://phabricator.wikimedia.org/P48338 and previous config saved to /var/cache/conftool/dbconfig/20230517-144025-ladsgroup.json
  • 14:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
  • 14:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1027 (T335845)', diff saved to https://phabricator.wikimedia.org/P48337 and previous config saved to /var/cache/conftool/dbconfig/20230517-143949-ladsgroup.json
  • 14:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1027.eqiad.wmnet with reason: Maintenance
  • 14:39 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - EventBus: produce to mediawiki.page_change.v1 stream - T336817 (duration: 06m 20s)
  • 14:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1027.eqiad.wmnet with reason: Maintenance
  • 14:38 btullis@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker
  • 14:36 moritzm: installing jackson-databind security updates
  • 14:34 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@ad1cc7c]: deploying hotfix for T336800 (duration: 00m 09s)
  • 14:34 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@ad1cc7c]: deploying hotfix for T336800
  • 14:33 ottomata: EventBus: produce to mediawiki.page_change.v1 stream - T336817
  • 14:30 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 14:30 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 14:28 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 14:28 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 14:27 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 14:27 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 14:27 ottomata: rolling restart of eventgate-main to pick up new mediawiki.page_change.v1 stream config - T336817
  • 14:17 elukey: run authdns-update for new ml-serve/ores discovery endpoints - T336726
  • 14:15 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P{aqs101[6-9]*} and A:aqs
  • 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P{aqs101[2-5]*} and A:aqs
  • 14:14 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: wgEventStreams - Declare mediawiki.page_change.v1 stream - T336817 (duration: 07m 30s)
  • 14:10 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:09 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:09 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:08 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet
  • 13:59 taavi@deploy1002: Finished scap: Backport for Define $maintClass in maintenance script for compatibility (T317375) (duration: 07m 24s)
  • 13:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet
  • 13:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet
  • 13:54 taavi@deploy1002: matmarex and taavi: Backport for Define $maintClass in maintenance script for compatibility (T317375) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:52 taavi@deploy1002: Started scap: Backport for Define $maintClass in maintenance script for compatibility (T317375)
  • 13:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet
  • 13:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet
  • 13:47 taavi@deploy1002: Finished scap: Backport for dblists: Close akwiki (T336675) (duration: 08m 11s)
  • 13:42 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P{aqs101[2-5]*} and A:aqs
  • 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P{aqs102[0-1]*} and A:aqs
  • 13:41 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet
  • 13:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet
  • 13:40 taavi@deploy1002: taavi and maurelio: Backport for dblists: Close akwiki (T336675) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:38 taavi@deploy1002: Started scap: Backport for dblists: Close akwiki (T336675)
  • 13:38 taavi@deploy1002: Finished scap: Backport for plwiki: Show language selector in main page header (T336707) (duration: 07m 39s)
  • 13:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet
  • 13:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet
  • 13:32 taavi@deploy1002: stang and taavi: Backport for plwiki: Show language selector in main page header (T336707) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:30 taavi@deploy1002: Started scap: Backport for plwiki: Show language selector in main page header (T336707)
  • 13:29 taavi@deploy1002: Finished scap: Backport for Enable wmgWikibaseTmpWbsubscribersSensibleOutput on wikidata (T336760), Enable wmgWikibaseTmpEnableLabelsInApiSummaries on Wikidata (T335099) (duration: 09m 15s)
  • 13:25 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P{aqs102[0-1]*} and A:aqs
  • 13:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1097.eqiad.wmnet
  • 13:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1096.eqiad.wmnet
  • 13:25 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P{aqs1011*} and A:aqs
  • 13:24 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:23 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:23 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:22 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:22 taavi@deploy1002: gtzatchkova and taavi: Backport for Enable wmgWikibaseTmpWbsubscribersSensibleOutput on wikidata (T336760), Enable wmgWikibaseTmpEnableLabelsInApiSummaries on Wikidata (T335099) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:22 btullis@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
  • 13:20 taavi@deploy1002: Started scap: Backport for Enable wmgWikibaseTmpWbsubscribersSensibleOutput on wikidata (T336760), Enable wmgWikibaseTmpEnableLabelsInApiSummaries on Wikidata (T335099)
  • 13:20 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:19 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:18 daniel@deploy1002: Finished scap: Backport for Revert "Revert "Add getMultiHttpClient function to make HTTP requests to Mathoid."" (T335347), Use MultiHttpClient instead of VirtualRESTService. (T335347) (duration: 11m 52s)
  • 13:17 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P{aqs1011*} and A:aqs
  • 13:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1096.eqiad.wmnet
  • 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on A:aqs-canary
  • 13:07 daniel@deploy1002: daniel: Backport for Revert "Revert "Add getMultiHttpClient function to make HTTP requests to Mathoid."" (T335347), Use MultiHttpClient instead of VirtualRESTService. (T335347) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:06 daniel@deploy1002: Started scap: Backport for Revert "Revert "Add getMultiHttpClient function to make HTTP requests to Mathoid."" (T335347), Use MultiHttpClient instead of VirtualRESTService. (T335347)
  • 13:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1004.eqiad.wmnet
  • 13:00 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on A:aqs-canary
  • 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 (T335845)', diff saved to https://phabricator.wikimedia.org/P48335 and previous config saved to /var/cache/conftool/dbconfig/20230517-125952-ladsgroup.json
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033 (T335845)', diff saved to https://phabricator.wikimedia.org/P48334 and previous config saved to /var/cache/conftool/dbconfig/20230517-125824-ladsgroup.json
  • 12:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1004.eqiad.wmnet
  • 12:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1003.eqiad.wmnet
  • 12:54 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:54 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records following puppetdb bulk import - cmooney@cumin1001"
  • 12:52 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records following puppetdb bulk import - cmooney@cumin1001"
  • 12:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1003.eqiad.wmnet
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P48333 and previous config saved to /var/cache/conftool/dbconfig/20230517-124446-ladsgroup.json
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P48332 and previous config saved to /var/cache/conftool/dbconfig/20230517-124318-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P48331 and previous config saved to /var/cache/conftool/dbconfig/20230517-122940-ladsgroup.json
  • 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P48330 and previous config saved to /var/cache/conftool/dbconfig/20230517-122812-ladsgroup.json
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 (T335845)', diff saved to https://phabricator.wikimedia.org/P48329 and previous config saved to /var/cache/conftool/dbconfig/20230517-121434-ladsgroup.json
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033 (T335845)', diff saved to https://phabricator.wikimedia.org/P48328 and previous config saved to /var/cache/conftool/dbconfig/20230517-121306-ladsgroup.json
  • 12:12 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 12:11 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 12:06 topranks: Merging CR822439 and beginning bulk puppetdb -> netbox import to update host interfaces
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1034 (T335845)', diff saved to https://phabricator.wikimedia.org/P48327 and previous config saved to /var/cache/conftool/dbconfig/20230517-115943-ladsgroup.json
  • 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 (T335845)', diff saved to https://phabricator.wikimedia.org/P48326 and previous config saved to /var/cache/conftool/dbconfig/20230517-115908-ladsgroup.json
  • 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1033 (T335845)', diff saved to https://phabricator.wikimedia.org/P48325 and previous config saved to /var/cache/conftool/dbconfig/20230517-115612-ladsgroup.json
  • 11:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
  • 11:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026 (T335845)', diff saved to https://phabricator.wikimedia.org/P48324 and previous config saved to /var/cache/conftool/dbconfig/20230517-115538-ladsgroup.json
  • 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034 (T335845)', diff saved to https://phabricator.wikimedia.org/P48323 and previous config saved to /var/cache/conftool/dbconfig/20230517-115303-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P48322 and previous config saved to /var/cache/conftool/dbconfig/20230517-114402-ladsgroup.json
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P48321 and previous config saved to /var/cache/conftool/dbconfig/20230517-114032-ladsgroup.json
  • 11:38 kart_: Update MinT to 2023-05-17-052844-production: Set CT2_USE_EXPERIMENTAL_PACKED_GEMM for better performance
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P48320 and previous config saved to /var/cache/conftool/dbconfig/20230517-113757-ladsgroup.json
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033 (T335845)', diff saved to https://phabricator.wikimedia.org/P48319 and previous config saved to /var/cache/conftool/dbconfig/20230517-113531-ladsgroup.json
  • 11:33 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P48318 and previous config saved to /var/cache/conftool/dbconfig/20230517-112856-ladsgroup.json
  • 11:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 11:26 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P48317 and previous config saved to /var/cache/conftool/dbconfig/20230517-112526-ladsgroup.json
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P48316 and previous config saved to /var/cache/conftool/dbconfig/20230517-112251-ladsgroup.json
  • 11:22 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P48315 and previous config saved to /var/cache/conftool/dbconfig/20230517-112024-ladsgroup.json
  • 11:15 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 (T335845)', diff saved to https://phabricator.wikimedia.org/P48314 and previous config saved to /var/cache/conftool/dbconfig/20230517-111350-ladsgroup.json
  • 11:13 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026 (T335845)', diff saved to https://phabricator.wikimedia.org/P48313 and previous config saved to /var/cache/conftool/dbconfig/20230517-111020-ladsgroup.json
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034 (T335845)', diff saved to https://phabricator.wikimedia.org/P48312 and previous config saved to /var/cache/conftool/dbconfig/20230517-110745-ladsgroup.json
  • 11:07 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 11:06 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 11:05 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P48311 and previous config saved to /var/cache/conftool/dbconfig/20230517-110518-ladsgroup.json
  • 11:05 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 11:04 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 11:04 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2034 (T335845)', diff saved to https://phabricator.wikimedia.org/P48310 and previous config saved to /var/cache/conftool/dbconfig/20230517-110251-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance
  • 11:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance
  • 11:02 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 11:01 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 11:01 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1026 (T335845)', diff saved to https://phabricator.wikimedia.org/P48309 and previous config saved to /var/cache/conftool/dbconfig/20230517-110130-ladsgroup.json
  • 11:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance
  • 11:01 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 11:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance
  • 11:00 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 11:00 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1028 (T335845)', diff saved to https://phabricator.wikimedia.org/P48308 and previous config saved to /var/cache/conftool/dbconfig/20230517-105957-ladsgroup.json
  • 10:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
  • 10:59 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 10:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
  • 10:59 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 10:58 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 10:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 10:57 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 10:57 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033 (T335845)', diff saved to https://phabricator.wikimedia.org/P48307 and previous config saved to /var/cache/conftool/dbconfig/20230517-105012-ladsgroup.json
  • 10:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2033 (T335845)', diff saved to https://phabricator.wikimedia.org/P48306 and previous config saved to /var/cache/conftool/dbconfig/20230517-104519-ladsgroup.json
  • 10:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance
  • 10:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026 (T335845)', diff saved to https://phabricator.wikimedia.org/P48305 and previous config saved to /var/cache/conftool/dbconfig/20230517-104454-ladsgroup.json
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P48304 and previous config saved to /var/cache/conftool/dbconfig/20230517-103815-ladsgroup.json
  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48303 and previous config saved to /var/cache/conftool/dbconfig/20230517-103129-root.json
  • 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026', diff saved to https://phabricator.wikimedia.org/P48302 and previous config saved to /var/cache/conftool/dbconfig/20230517-102948-ladsgroup.json
  • 10:26 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 10:25 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P48301 and previous config saved to /var/cache/conftool/dbconfig/20230517-102310-ladsgroup.json
  • 10:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 10:18 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 10:17 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 10:17 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48300 and previous config saved to /var/cache/conftool/dbconfig/20230517-101624-root.json
  • 10:16 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:16 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026', diff saved to https://phabricator.wikimedia.org/P48299 and previous config saved to /var/cache/conftool/dbconfig/20230517-101442-ladsgroup.json
  • 10:09 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:08 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:08 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:08 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P48298 and previous config saved to /var/cache/conftool/dbconfig/20230517-100805-ladsgroup.json
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48297 and previous config saved to /var/cache/conftool/dbconfig/20230517-100120-root.json
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026 (T335845)', diff saved to https://phabricator.wikimedia.org/P48296 and previous config saved to /var/cache/conftool/dbconfig/20230517-095936-ladsgroup.json
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2026 (T335845)', diff saved to https://phabricator.wikimedia.org/P48295 and previous config saved to /var/cache/conftool/dbconfig/20230517-095443-ladsgroup.json
  • 09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2026.codfw.wmnet with reason: Maintenance
  • 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2026.codfw.wmnet with reason: Maintenance
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P48294 and previous config saved to /var/cache/conftool/dbconfig/20230517-095301-ladsgroup.json
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48293 and previous config saved to /var/cache/conftool/dbconfig/20230517-094615-root.json
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2029 (T335845)', diff saved to https://phabricator.wikimedia.org/P48292 and previous config saved to /var/cache/conftool/dbconfig/20230517-093928-ladsgroup.json
  • 09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance
  • 09:39 elukey: roll restart pybal on lvs2010, lvs2009, lvs1020, lvs1019 to pick up a VIP (see https://gerrit.wikimedia.org/r/c/operations/puppet/+/920219) - T336726
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48291 and previous config saved to /var/cache/conftool/dbconfig/20230517-093110-root.json
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48290 and previous config saved to /var/cache/conftool/dbconfig/20230517-091606-root.json
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1220 cleaning gtid_domain_id', diff saved to https://phabricator.wikimedia.org/P48289 and previous config saved to /var/cache/conftool/dbconfig/20230517-091407-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48288 and previous config saved to /var/cache/conftool/dbconfig/20230517-085855-root.json
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48287 and previous config saved to /var/cache/conftool/dbconfig/20230517-084350-root.json
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48285 and previous config saved to /var/cache/conftool/dbconfig/20230517-082846-root.json
  • 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
  • 08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48284 and previous config saved to /var/cache/conftool/dbconfig/20230517-081341-root.json
  • 08:08 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:08 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:05 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:04 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48283 and previous config saved to /var/cache/conftool/dbconfig/20230517-075836-root.json
  • 07:57 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 07:57 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 07:48 moritzm: upgrading krb1001 to Bullseye T331695
  • 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb1001.eqiad.wmnet with reason: Update to Bullseye
  • 07:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on krb1001.eqiad.wmnet with reason: Update to Bullseye
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48278 and previous config saved to /var/cache/conftool/dbconfig/20230517-074332-root.json
  • 07:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'clear' for AS: 37468
  • 07:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'clear' for AS: 37468
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 4%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48277 and previous config saved to /var/cache/conftool/dbconfig/20230517-072827-root.json
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 for decommissioning', diff saved to https://phabricator.wikimedia.org/P48276 and previous config saved to /var/cache/conftool/dbconfig/20230517-072508-root.json
  • 07:19 kartik@deploy1002: Finished scap: Backport for Revert "Enable the new Special:Contribute page entry point for desktop on selected wikis" (duration: 07m 22s)
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48275 and previous config saved to /var/cache/conftool/dbconfig/20230517-071428-root.json
  • 07:13 kartik@deploy1002: trainbranchbot and kartik: Backport for Revert "Enable the new Special:Contribute page entry point for desktop on selected wikis" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 3%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48274 and previous config saved to /var/cache/conftool/dbconfig/20230517-071322-root.json
  • 07:11 kartik@deploy1002: Started scap: Backport for Revert "Enable the new Special:Contribute page entry point for desktop on selected wikis"
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 T336725', diff saved to https://phabricator.wikimedia.org/P48273 and previous config saved to /var/cache/conftool/dbconfig/20230517-071039-root.json
  • 07:09 kartik@deploy1002: Backport cancelled.
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48272 and previous config saved to /var/cache/conftool/dbconfig/20230517-065923-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48271 and previous config saved to /var/cache/conftool/dbconfig/20230517-065817-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48270 and previous config saved to /var/cache/conftool/dbconfig/20230517-064419-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48269 and previous config saved to /var/cache/conftool/dbconfig/20230517-064313-root.json
  • 06:40 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 06:39 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 06:39 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 06:38 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 06:37 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 06:37 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48268 and previous config saved to /var/cache/conftool/dbconfig/20230517-062914-root.json
  • 06:22 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 06:21 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 06:20 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 06:20 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 06:19 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 06:18 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48267 and previous config saved to /var/cache/conftool/dbconfig/20230517-061409-root.json
  • 06:01 volans: restarted ferm on ms-be1047
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48265 and previous config saved to /var/cache/conftool/dbconfig/20230517-055904-root.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2096', diff saved to https://phabricator.wikimedia.org/P48264 and previous config saved to /var/cache/conftool/dbconfig/20230517-055310-root.json
  • 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1115.eqiad.wmnet
  • 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1115.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 05:48 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1115.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 05:46 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 05:41 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1115.eqiad.wmnet
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1112 from dbctl T336332', diff saved to https://phabricator.wikimedia.org/P48263 and previous config saved to /var/cache/conftool/dbconfig/20230517-052007-marostegui.json
  • 05:16 marostegui: Optimize s7 on dbstore1003 T336733
  • 00:21 krinkle@deploy1002: Synchronized src/: I4cfa4a2474b4e (duration: 06m 01s)
  • 00:15 krinkle@deploy1002: Synchronized wmf-config/: I4cfa4a2474b4e (duration: 06m 14s)
  • 00:07 krinkle@deploy1002: Synchronized lib/: I4cfa4a2474b4e (duration: 06m 51s)

2023-05-16

  • 20:59 jdrewniak@deploy1002: Finished scap: Backport for Add maint script to opt out active users from the new topic tool (T317375) (duration: 07m 18s)
  • 20:53 jdrewniak@deploy1002: jdrewniak and matmarex: Backport for Add maint script to opt out active users from the new topic tool (T317375) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:52 jdrewniak@deploy1002: Started scap: Backport for Add maint script to opt out active users from the new topic tool (T317375)
  • 20:49 volans@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a8-codfw.mgmt.codfw.wmnet
  • 20:49 jdrewniak@deploy1002: Finished scap: Backport for Consolidate watchstar icon updating logic under watchstar.js (T336640 T336641) (duration: 09m 19s)
  • 20:41 jdrewniak@deploy1002: jdrewniak: Backport for Consolidate watchstar icon updating logic under watchstar.js (T336640 T336641) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:39 jdrewniak@deploy1002: Started scap: Backport for Consolidate watchstar icon updating logic under watchstar.js (T336640 T336641)
  • 20:36 jdrewniak@deploy1002: Finished scap: Backport for Ensure mw-watchlink is used for the sticky header watchlink (T336640 T336641) (duration: 07m 44s)
  • 20:30 jdrewniak@deploy1002: jdrewniak: Backport for Ensure mw-watchlink is used for the sticky header watchlink (T336640 T336641) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:30 brett: Rolling out maglev LVS scheduler in drmrs (for real this time) - T263797
  • 20:29 jdrewniak@deploy1002: Started scap: Backport for Ensure mw-watchlink is used for the sticky header watchlink (T336640 T336641)
  • 19:13 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:13 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002"
  • 19:12 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002"
  • 19:10 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 19:10 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet
  • 19:04 sukhe: dummy run of authdns-update to confirm new hosts
  • 19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns2003.wikimedia.org
  • 19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:59 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:57 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
  • 18:54 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
  • 18:52 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns2003.wikimedia.org
  • 18:50 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2022.*
  • 18:50 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*
  • 18:50 volans@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a8-codfw.mgmt.codfw.wmnet
  • 18:50 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:50 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin2002"
  • 18:49 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin2002"
  • 18:47 ryankemper: [WDQS] Pooled `wdqs2012`
  • 18:46 ryankemper: [WDQS] Pooled `wdqs2006` (not sure why it was depooled)
  • 18:46 sukhe: homer "cr*-codfw*" commit "Gerrit: 920363 remove to-be decommissioned host dns2003": T335777
  • 18:46 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 18:43 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:43 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002"
  • 18:42 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002"
  • 18:41 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 18:41 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet
  • 18:36 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.48 208.80.153.74 208.80.153.107 ]: T326688
  • 18:34 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.9 refs T330215
  • 18:28 sukhe: homer "cr*-codfw*" commit "Gerrit: 920358 add new DNS host dns2006": T326688
  • 18:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2006.wikimedia.org with OS bullseye
  • 18:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2006.wikimedia.org with reason: host reimage
  • 18:02 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2006.wikimedia.org with reason: host reimage
  • 18:01 sukhe: enable puppet on A:cp-text
  • 17:58 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 17:57 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 17:56 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 17:55 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 17:52 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 17:52 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 17:47 volans@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a8-codfw.mgmt.codfw.wmnet
  • 17:47 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:47 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin1001"
  • 17:46 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin1001"
  • 17:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2006.wikimedia.org with OS bullseye
  • 17:44 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:40 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:40 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin1001"
  • 17:40 moritzm: installing avahi security updates on buster
  • 17:39 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin1001"
  • 17:37 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:37 volans@cumin1001: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet
  • 17:34 joal@deploy1002: Finished deploy [airflow-dags/analytics@7816937]: Regular analytics weekly train - Hotfix [airflow-dags@7816937] (duration: 00m 10s)
  • 17:34 joal@deploy1002: Started deploy [airflow-dags/analytics@7816937]: Regular analytics weekly train - Hotfix [airflow-dags@7816937]
  • 17:27 volans@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 17:27 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:27 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin1001"
  • 17:27 brett: Rolling out maglev LVS scheduler in drmrs - T263797
  • 17:26 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin1001"
  • 17:24 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:20 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:20 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin1001"
  • 17:19 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin1001"
  • 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns2002.wikimedia.org
  • 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:17 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:17 volans@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 17:16 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:14 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 17:09 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns2002.wikimedia.org
  • 17:00 sukhe: homer "cr*-codfw*" commit "Gerrit: 920320 remove to-be decommissioned host dns2002" T335777
  • 16:59 moritzm: installing 5.10.179 kernels on Bullseye hosts
  • 16:55 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor100[1256].eqiad.wmnet
  • 16:30 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 16:30 volans: restarting wikibugs ( https://www.mediawiki.org/wiki/Wikibugs#Help )
  • 16:06 mutante: gitlab-runner2003 - installed rsync client to debug an issue with rsync from inside containers, comparing against rsync from outside the container
  • 15:49 sukhe: run authdns-update for CR 920314
  • 15:41 joal@deploy1002: Finished deploy [airflow-dags/analytics@7fa2dcd]: Regular analytics weekly train [airflow-dags@7fa2dcd] (duration: 00m 10s)
  • 15:41 joal@deploy1002: Started deploy [airflow-dags/analytics@7fa2dcd]: Regular analytics weekly train [airflow-dags@7fa2dcd]
  • 15:36 hashar: Some CI jobs started failing after an upgrade of some Jenkins plugins. I have upgraded a couple more and it seems to work now T336775
  • 15:33 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.10 208.80.153.48 208.80.153.74 ]: T326688
  • 15:33 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.10 208.80.153.48 208.80.153.74 ]
  • 15:32 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:32 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:27 hashar: Restarting CI Jenkins
  • 15:26 Emperor: rebalance codfw swift rings T335280
  • 15:18 hashar: CI Jenkins jobs are stalled following the plugins upgrade :/
  • 15:07 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:04 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:03 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 14:55 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:49 moritzm: installing libxml2 security updates on buster
  • 14:48 sukhe: [done] "cr*-codfw*" commit "Gerrit: 919876 add new DNS host dns2005": T326688
  • 14:47 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:46 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:43 hashar: Restarting CI Jenkins
  • 14:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:42 sukhe: "cr*-codfw*" commit "Gerrit: 919876 add new DNS host dns2005": T326688
  • 14:36 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:32 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:32 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:31 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:31 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 14:30 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 14:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:27 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:26 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 14:26 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:26 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2005.wikimedia.org with OS bullseye
  • 14:18 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided) (duration: 00m 45s)
  • 14:17 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided)
  • 14:10 akosiaris@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in codfw: codfw row D switches upgrade done - T335042
  • 14:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
  • 14:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
  • 13:54 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: codfw row D switches upgrade done - T335042
  • 13:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
  • 13:49 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-eqiad
  • 13:46 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 13:46 Emperor: repool ms-fe2012 T335042
  • 13:45 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-eqiad
  • 13:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=eventschemas,dc=codfw,name=schema2004.codfw.wmnet
  • 13:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=eventschemas,dc=codfw,name=schema2004.eqiad.wmnet
  • 13:33 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2003.codfw.wmnet,service=thanos-web
  • 13:33 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2003.codfwm.wmnet,service=thanos-web
  • 13:32 taavi@deploy1002: Finished scap: Backport for Add stream config for mobile apps schema (T336508) (duration: 09m 08s)
  • 13:32 Emperor: repool thanos-fe2003 T335042
  • 13:30 sukhe: running authdns-update to repool codfw
  • 13:26 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2006.wikimedia.org
  • 13:25 taavi@deploy1002: mazevedo and taavi: Backport for Add stream config for mobile apps schema (T336508) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:25 moritzm: enabled Puppet in codfw/esams/ulsfo for switch maintenance T335042
  • 13:23 taavi@deploy1002: Started scap: Backport for Add stream config for mobile apps schema (T336508)
  • 13:01 XioNoX: asw-d-codfw> request system reboot all-members - T335042
  • 12:52 Emperor: depool ms-fe2012 T335042
  • 12:51 Emperor: depool thanos-fe2003 T335042
  • 12:50 moritzm: disabling Puppet in codfw/esams/ulsfo for switch maintenance T335042
  • 12:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 189 hosts with reason: codfw row D upgrade
  • 12:46 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 189 hosts with reason: codfw row D upgrade
  • 12:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1009.eqiad.wmnet
  • 12:39 akosiaris: reboot rdb1009 for kernel upgrades: possibly affected apps: netbox, changeprop, cpjobqueue, api-gateway, redisLockManager. Should be harmless however
  • 12:39 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1009.eqiad.wmnet
  • 12:35 godog: start cadvisor 0.44 upgrade on buster hosts - T336740
  • 12:29 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2a0b1f2] (duration: 01m 30s)
  • 12:28 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2a0b1f2]
  • 12:27 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2] (duration: 00m 04s)
  • 12:27 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2]
  • 12:24 sukhe: [done] running authdns-update to disable codfw for switch upgrade: T335042
  • 12:22 sukhe: running authdns-update to disable codfw for switch upgrade: T335042
  • 12:21 XioNoX: disable ping offload in codfw - T335042
  • 12:20 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 12:15 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 12:15 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2] (duration: 00m 10s)
  • 12:15 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2]
  • 12:09 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 12:06 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 12:04 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 12:02 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 11:59 kart_: Updated cxserver to 2023-05-16-061239-production (T336657)
  • 11:57 XioNoX: stage upgrade on asw-d-codfw - T335042
  • 11:56 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2]: Regular analytics weekly train [analytics/refinery@2a0b1f2] (duration: 10m 45s)
  • 11:56 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 11:55 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 11:55 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 11:55 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 11:53 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 11:52 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 11:51 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-codfw
  • 11:50 marostegui: install 10.4.29 on db1151 T336462
  • 11:50 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:49 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:47 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-codfw
  • 11:46 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 11:46 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:45 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2]: Regular analytics weekly train [analytics/refinery@2a0b1f2]
  • 11:44 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:43 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:30 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host testvm2002.codfw.wmnet with OS bookworm
  • 11:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
  • 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 14 hosts with reason: maintenance
  • 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 14 hosts with reason: maintenance
  • 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 11 hosts with reason: maintenance
  • 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 11 hosts with reason: maintenance
  • 11:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: maintenance
  • 11:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 13 hosts with reason: maintenance
  • 11:20 akosiaris: reboot rdb2007 for kernel upgrades: possibly affected apps: netbox, changeprop, cpjobqueue, api-gateway, redisLockManager. Should be harmless however
  • 11:18 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bookworm
  • 11:17 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host testvm2004.codfw.wmnet with OS bookworm
  • 11:16 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
  • 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2004.codfw.wmnet with OS bookworm
  • 11:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
  • 11:00 moritzm: updated bookworm image to RC3 T330495
  • 10:59 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
  • 10:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
  • 10:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1010.eqiad.wmnet
  • 10:52 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:52 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:51 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
  • 10:51 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
  • 10:51 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
  • 10:50 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1010.eqiad.wmnet
  • 10:50 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:49 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 10:48 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
  • 10:48 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None
  • 10:48 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
  • 10:48 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None
  • 10:48 akosiaris@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) depool all active/active services in codfw: codfw row D switches upgrade - T335042
  • 10:43 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host gitlab-runner1003.eqiad.wmnet
  • 10:40 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 10:39 jayme@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 10:39 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 10:38 jayme@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 10:36 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:36 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:35 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on mc-wf[2001-2002].codfw.wmnet,mc-wf[1001-1002].eqiad.wmnet with reason: kernel upgrade
  • 10:34 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on mc-wf[2001-2002].codfw.wmnet,mc-wf[1001-1002].eqiad.wmnet with reason: kernel upgrade
  • 10:34 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:34 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new VIP records for k8s-ingress-ml-serve - elukey@cumin1001"
  • 10:33 vgutierrez: testing HAProxy 2.7.8 in cp4052 and cp5032 (upload) - T317799
  • 10:33 elukey@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new VIP records for k8s-ingress-ml-serve - elukey@cumin1001"
  • 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:29 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in codfw: codfw row D switches upgrade - T335042
  • 10:28 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 10:13 Amir1: cleaning up echo notification table in all wikis (T318523)
  • 10:07 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main'.
  • 10:06 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main'.
  • 10:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:49 btullis@deploy1002: Finished deploy [airflow-dags/analytics_product@7642b62]: (no justification provided) (duration: 00m 09s)
  • 09:49 btullis@deploy1002: Started deploy [airflow-dags/analytics_product@7642b62]: (no justification provided)
  • 09:38 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
  • 09:31 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1004.eqiad.wmnet
  • 09:25 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1004.eqiad.wmnet
  • 09:23 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.reboot-runner (exit_code=1) rolling reboot on A:gitlab-runner
  • 09:23 jnuche@deploy1002: Installing scap version "4.52.2" for 595 hosts
  • 09:21 marostegui: Optimize s5 on dbstore1003 T336733
  • 08:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on es2034.codfw.wmnet with reason: Maintenance
  • 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on es2034.codfw.wmnet with reason: Maintenance
  • 08:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on es2033.codfw.wmnet with reason: Maintenance
  • 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on es2033.codfw.wmnet with reason: Maintenance
  • 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on es[2023-2025].codfw.wmnet with reason: maintenance
  • 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on es[2023-2025].codfw.wmnet with reason: maintenance
  • 08:18 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica2006.wikimedia.org
  • 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2014.codfw.wmnet with reason: Maintenance
  • 08:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc2014.codfw.wmnet with reason: Maintenance
  • 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy2004.codfw.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy2004.codfw.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy2003.codfw.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy2003.codfw.wmnet with reason: Maintenance
  • 07:52 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
  • 07:28 Emperor: restart vopsbot.service on alert1001
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48254 and previous config saved to /var/cache/conftool/dbconfig/20230516-071509-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48253 and previous config saved to /var/cache/conftool/dbconfig/20230516-071453-root.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48252 and previous config saved to /var/cache/conftool/dbconfig/20230516-070005-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48251 and previous config saved to /var/cache/conftool/dbconfig/20230516-065948-root.json
  • 06:57 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main'.
  • 06:56 marostegui@deploy1002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master" (duration: 06m 58s)
  • 06:51 marostegui@deploy1002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 06:50 eileen: civicrm: revision d97a371e, config 686d3cb4
  • 06:49 marostegui@deploy1002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master"
  • 06:49 _joe_: running docker image prune -a in build2001
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48250 and previous config saved to /var/cache/conftool/dbconfig/20230516-064500-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48249 and previous config saved to /var/cache/conftool/dbconfig/20230516-064444-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48248 and previous config saved to /var/cache/conftool/dbconfig/20230516-062955-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48247 and previous config saved to /var/cache/conftool/dbconfig/20230516-062939-root.json
  • 06:24 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc3 master (duration: 07m 08s)
  • 06:24 eileen: civicrm upgraded from ef7b3822 to d97a371e
  • 06:18 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc3 master synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 06:17 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc3 master
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48246 and previous config saved to /var/cache/conftool/dbconfig/20230516-061450-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48245 and previous config saved to /var/cache/conftool/dbconfig/20230516-061434-root.json
  • 06:05 marostegui@deploy1002: Finished scap: Backport for Revert "ProductionServices.php: Failover pc3 codfw host" (duration: 07m 21s)
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48244 and previous config saved to /var/cache/conftool/dbconfig/20230516-055946-root.json
  • 05:59 marostegui@deploy1002: marostegui: Backport for Revert "ProductionServices.php: Failover pc3 codfw host" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48243 and previous config saved to /var/cache/conftool/dbconfig/20230516-055929-root.json
  • 05:58 marostegui@deploy1002: Started scap: Backport for Revert "ProductionServices.php: Failover pc3 codfw host"
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 T336332', diff saved to https://phabricator.wikimedia.org/P48242 and previous config saved to /var/cache/conftool/dbconfig/20230516-055122-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48241 and previous config saved to /var/cache/conftool/dbconfig/20230516-054441-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48240 and previous config saved to /var/cache/conftool/dbconfig/20230516-054425-root.json
  • 05:43 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Failover pc3 codfw host (duration: 07m 15s)
  • 05:38 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Failover pc3 codfw host synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 05:36 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Failover pc3 codfw host
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48239 and previous config saved to /var/cache/conftool/dbconfig/20230516-052936-root.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48238 and previous config saved to /var/cache/conftool/dbconfig/20230516-052920-root.json
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1221 T336337', diff saved to https://phabricator.wikimedia.org/P48237 and previous config saved to /var/cache/conftool/dbconfig/20230516-052026-root.json
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 T336337', diff saved to https://phabricator.wikimedia.org/P48236 and previous config saved to /var/cache/conftool/dbconfig/20230516-052014-root.json
  • 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.6, 1.41.0-wmf.7 (duration: 02m 26s)
  • 03:51 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.9 refs T330215 (duration: 48m 47s)
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.9 refs T330215

2023-05-15

  • 23:37 eileen: civicrm upgraded from db6e8d69 to ef7b3822
  • 22:02 maryum: deployed patch for T323651
  • 21:51 maryum: Deployed patch for T335612
  • 21:42 ejegg: payments-wiki upgraded from c0da741f to 8988a598 (and globalcollect settings deleted)
  • 20:00 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:56 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:56 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
  • 19:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet
  • 19:50 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086]* for row D switch upgrade - bking@cumin1001 - T335042
  • 19:50 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086]* for row D switch upgrade - bking@cumin1001 - T335042
  • 19:50 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
  • 19:49 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086] for row D switch upgrade - bking@cumin1001 - T335042
  • 19:49 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086] for row D switch upgrade - bking@cumin1001 - T335042
  • 19:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet
  • 19:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 2:00:00 on 20 hosts with reason: T335042 maintenance
  • 19:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 2:00:00 on 20 hosts with reason: T335042 maintenance
  • 19:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
  • 19:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet
  • 19:33 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet
  • 19:32 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
  • 19:28 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5] (wcqs): deploy 0.3.124 to WCQS (duration: 02m 03s)
  • 19:26 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5] (wcqs): deploy 0.3.124 to WCQS
  • 19:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
  • 19:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
  • 19:19 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: (no justification provided) (duration: 00m 05s)
  • 19:19 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: (no justification provided)
  • 19:18 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: (no justification provided) (duration: 00m 05s)
  • 19:18 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: (no justification provided)
  • 19:18 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: (no justification provided) (duration: 05m 46s)
  • 19:15 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
  • 19:15 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
  • 19:12 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: (no justification provided)
  • 19:12 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: 0.3.124 (duration: 10m 05s)
  • 19:03 inflatador: [WDQS Deploy] Tests passing following deploy of `0.3.124` on canary `wdqs1003`; proceeding to rest of fleet
  • 19:02 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: 0.3.124
  • 18:54 mutante: LDAP - added uid 'adee' to groups wmde and nda - T336434
  • 18:54 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.48 208.80.153.10 ]: codfw row D maint 2023/05/16 [dns2002] T335042
  • 18:33 brett: Rolling out maglev LVS scheduler in eqsin - T263797
  • 18:11 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns2005.wikimedia.org with OS bullseye
  • 18:11 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
  • 18:06 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns2005.wikimedia.org with OS bullseye
  • 18:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
  • 17:47 volans@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 17:47 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:47 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002"
  • 17:46 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002"
  • 17:42 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 17:42 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:42 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002"
  • 17:41 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002"
  • 17:39 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 17:39 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 17:30 volans@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 17:30 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:30 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002"
  • 17:29 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002"
  • 17:27 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 17:27 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:27 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002"
  • 17:26 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002"
  • 17:15 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 17:15 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 15:00 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Setup Incomplete
  • 15:00 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Setup Incomplete
  • 14:24 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: testing transferpy cookbook
  • 14:24 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: testing transferpy cookbook
  • 14:21 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 14:20 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:20 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:17 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:16 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:03 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 14:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 13:56 volans: re-enabled puppet on the install hosts to deploy changes for T336485
  • 13:45 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 13:33 volans: disabling puppet on the install hosts to deploy changes for T336485
  • 13:00 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:00 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:58 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:58 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023 (T335845)', diff saved to https://phabricator.wikimedia.org/P48228 and previous config saved to /var/cache/conftool/dbconfig/20230515-111624-ladsgroup.json
  • 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023', diff saved to https://phabricator.wikimedia.org/P48227 and previous config saved to /var/cache/conftool/dbconfig/20230515-110118-ladsgroup.json
  • 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023', diff saved to https://phabricator.wikimedia.org/P48226 and previous config saved to /var/cache/conftool/dbconfig/20230515-104611-ladsgroup.json
  • 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023 (T335845)', diff saved to https://phabricator.wikimedia.org/P48225 and previous config saved to /var/cache/conftool/dbconfig/20230515-103105-ladsgroup.json
  • 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1023 (T335845)', diff saved to https://phabricator.wikimedia.org/P48224 and previous config saved to /var/cache/conftool/dbconfig/20230515-102038-ladsgroup.json
  • 10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance
  • 10:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance
  • 10:19 Amir1: Removing db1123 from zarcillo T334910
  • 10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1123.eqiad.wmnet
  • 10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1123.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020 (T335845)', diff saved to https://phabricator.wikimedia.org/P48223 and previous config saved to /var/cache/conftool/dbconfig/20230515-101329-ladsgroup.json
  • 10:13 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1123.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 10:11 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 10:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1123.eqiad.wmnet
  • 09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020', diff saved to https://phabricator.wikimedia.org/P48222 and previous config saved to /var/cache/conftool/dbconfig/20230515-095823-ladsgroup.json
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Remove db1123 from dbctl T334910', diff saved to https://phabricator.wikimedia.org/P48221 and previous config saved to /var/cache/conftool/dbconfig/20230515-095412-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1123 T334910', diff saved to https://phabricator.wikimedia.org/P48220 and previous config saved to /var/cache/conftool/dbconfig/20230515-094938-ladsgroup.json
  • 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020', diff saved to https://phabricator.wikimedia.org/P48219 and previous config saved to /var/cache/conftool/dbconfig/20230515-094317-ladsgroup.json
  • 09:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15802
  • 09:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15802
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020 (T335845)', diff saved to https://phabricator.wikimedia.org/P48218 and previous config saved to /var/cache/conftool/dbconfig/20230515-092810-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1020 (T335845)', diff saved to https://phabricator.wikimedia.org/P48217 and previous config saved to /var/cache/conftool/dbconfig/20230515-091139-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1020.eqiad.wmnet with reason: Maintenance
  • 09:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1020.eqiad.wmnet with reason: Maintenance
  • 09:08 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 09:05 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 08:45 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:45 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:26 elukey: restart pybal on lvs2010 and lvs2009 to pick up new LVS VIP for ml-staging k8s ingress - T335756
  • 08:26 volans: installed spicerack_7.1.0 on cumin1001
  • 08:22 volans: installed spicerack_7.1.0 on cumin2002
  • 08:08 volans: uploaded spicerack_7.1.0 to apt.wikimedia.org bullseye-wikimedia
  • 05:36 _joe_: building bookworm image for the first time T335560

2023-05-12

  • 22:59 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 22:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update cloudswift ip address - pt1979@cumin2002"
  • 22:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update cloudswift ip address - pt1979@cumin2002"
  • 22:32 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 22:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 21:05 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS buster
  • 21:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS buster
  • 20:08 mutante: gerrit1001 - systemctl mask gerrit T326368
  • 18:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 18:13 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 18:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 18:08 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 18:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudswift1001']
  • 17:59 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudswift1001']
  • 17:59 sukhe: running authdns-update for CR 919388
  • 17:31 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 150m 34s)
  • 17:27 sukhe: set routing-options static route 208.80.153.240/28 [high-traffic2, codfw] next-hop 10.192.16.140: T326767
  • 17:21 sukhe: restart pybal on lvs2012 to pick up bgp med change: T326767
  • 17:11 sukhe: homer "cr*-codfw*" commit "Gerrit: 917924 add new LVS host lvs2012": T326767
  • 17:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 16:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2012.codfw.wmnet with OS bullseye
  • 16:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 16:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
  • 16:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on lists1003.wikimedia.org with reason: maintenance
  • 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on lists1003.wikimedia.org with reason: maintenance
  • 16:23 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
  • 16:08 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 16:08 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
  • 16:03 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 16:00 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
  • 15:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 15:54 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2012.codfw.wmnet with OS bullseye
  • 15:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 15:09 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
  • 15:08 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 15:02 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dns2001.wikimedia.org
  • 15:02 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:01 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 15:01 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767
  • 14:56 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns2001.wikimedia.org
  • 14:39 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:38 cdanis: silencing jobrunner/videoscaler probes for the weekend -- silence ID 21903b52-047b-43d9-94be-908a4b92b5a7
  • 14:38 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 14:36 cdanis: silencing jobrunner/videoscaler probes for the weekend
  • 14:35 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dns2001.wikimedia.wmnet
  • 14:35 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:35 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2001.wikimedia.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:34 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2001.wikimedia.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:29 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 14:24 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns2001.wikimedia.wmnet
  • 14:15 sukhe: [done] homer "cr*-codfw*" commit "Gerrit: 917364 remove to-be decommissioned host dns2001": T335777
  • 14:13 sukhe: homer "cr*-codfw*" commit "Gerrit: 917364 remove to-be decommissioned host dns2001": T335777
  • 13:54 sukhe: enable puppet and run agent in A:dns-rec: done deploying CR 919067
  • 13:38 sukhe: disable puppet on A:dns-rec to merge CR 919067
  • 13:22 sukhe: sudo cumin -b1 -s1200 'A:cp and A:eqiad' 'varnish-frontend-restart': T253093
  • 13:06 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 13:06 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 12:46 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 12:45 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 12:37 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 12:26 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 11:58 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 11:56 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48215 and previous config saved to /var/cache/conftool/dbconfig/20230512-113514-root.json
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48213 and previous config saved to /var/cache/conftool/dbconfig/20230512-112010-root.json
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48212 and previous config saved to /var/cache/conftool/dbconfig/20230512-110505-root.json
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48211 and previous config saved to /var/cache/conftool/dbconfig/20230512-105000-root.json
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48210 and previous config saved to /var/cache/conftool/dbconfig/20230512-103455-root.json
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48209 and previous config saved to /var/cache/conftool/dbconfig/20230512-101950-root.json
  • 10:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2184.codfw.wmnet with reason: Maintenance
  • 10:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2184.codfw.wmnet with reason: Maintenance
  • 10:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 10:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48208 and previous config saved to /var/cache/conftool/dbconfig/20230512-100446-root.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48206 and previous config saved to /var/cache/conftool/dbconfig/20230512-094941-root.json
  • 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2131', diff saved to https://phabricator.wikimedia.org/P48205 and previous config saved to /var/cache/conftool/dbconfig/20230512-093950-root.json
  • 09:18 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.8 refs T330214
  • 09:13 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=mw1494.eqiad.wmnet
  • 09:13 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=mw146[7-9].eqiad.wmnet
  • 09:08 hashar@deploy1002: Finished scap: Backport for Reset the cached skin in RequestContext::setUser() (T336504) (duration: 16m 27s)
  • 08:54 hashar@deploy1002: hashar: Backport for Reset the cached skin in RequestContext::setUser() (T336504) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:52 hashar@deploy1002: Started scap: Backport for Reset the cached skin in RequestContext::setUser() (T336504)
  • 08:03 _joe_: restarting envoy on all jobrunners pooled in the jobrunner cluster T336554
  • 08:00 _joe_: do it also on mw1438
  • 07:59 _joe_: restarting envoyproxy on mw1439 to rebalance connections (see T336554)
  • 07:57 taavi@deploy1002: Finished scap: Backport for Disable Graph (again) (T336556) (duration: 12m 29s)
  • 07:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13335
  • 07:46 taavi@deploy1002: taavi: Backport for Disable Graph (again) (T336556) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 07:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335
  • 07:45 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 07:44 taavi@deploy1002: Started scap: Backport for Disable Graph (again) (T336556)
  • 07:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 20940
  • 07:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 20940
  • 07:28 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 07:27 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
  • 07:27 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
  • 05:33 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=mw1495.eqiad.wmnet
  • 05:32 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=mw1466.eqiad.wmnet
  • 05:32 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=mw1458.eqiad.wmnet
  • 05:31 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=mw1461.eqiad.wmnet
  • 02:33 ejegg: payments-wiki upgraded from d1c5fefc to c0da741f
  • 02:32 ejegg: SmashPig upgraded from a9fa7a2c to 5460dbe2
  • 01:08 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus6001.drmrs.wmnet
  • 01:08 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:08 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus6001.drmrs.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 01:07 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus6001.drmrs.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 01:01 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 00:57 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus6001.drmrs.wmnet
  • 00:51 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus5001.eqsin.wmnet
  • 00:51 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:51 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 00:50 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 00:48 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 00:44 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus5001.eqsin.wmnet
  • 00:32 denisse: manually removing prometheus4001.ulsfo.wmnet from the Ganeti master after a failed step in the decommission cookbook - T335585
  • 00:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on prometheus3001.esams.wmnet with reason: maintenance
  • 00:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on prometheus3001.esams.wmnet with reason: maintenance
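
The db2131 entries above show a staged repool: the host is depooled at 09:39, then brought back through 1%, 3%, 5%, 10%, 25%, 50%, 75% and 100% between 09:49 and 11:35. The following is a minimal sketch of how those steps could be pulled out of the log text and the gap between steps measured. It is not part of dbctl or any Wikimedia tooling; the names `REPOOL_RE`, `repool_steps` and `step_gaps_minutes` are hypothetical, and the sample entries are copied from the log with the trailing "diff saved to ..." part dropped.

```python
import re
from datetime import datetime

# Sample entries copied from the 2023-05-12 log (trailing diff/config paths dropped).
ENTRIES = [
    "11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: Repooling after maintenance'",
    "11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 75%: Repooling after maintenance'",
    "11:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 50%: Repooling after maintenance'",
    "10:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 25%: Repooling after maintenance'",
    "10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 10%: Repooling after maintenance'",
    "10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 5%: Repooling after maintenance'",
    "10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 3%: Repooling after maintenance'",
    "09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 1%: Repooling after maintenance'",
]

# Hypothetical pattern for this one message shape: time, user@host, host name, percentage.
REPOOL_RE = re.compile(
    r"^(\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): '(\S+) \(re\)pooling @ (\d+)%"
)


def repool_steps(entries):
    """Return (time, host, percent) tuples sorted from earliest to latest."""
    steps = []
    for line in entries:
        # Tolerate the leading bullet and indentation used on the wiki page.
        match = REPOOL_RE.match(line.strip().lstrip("•").strip())
        if match:
            when = datetime.strptime(match.group(1), "%H:%M")
            steps.append((when, match.group(2), int(match.group(3))))
    return sorted(steps)


def step_gaps_minutes(steps):
    """Return the number of whole minutes between consecutive repool steps."""
    return [
        int((later[0] - earlier[0]).total_seconds() // 60)
        for earlier, later in zip(steps, steps[1:])
    ]


if __name__ == "__main__":
    steps = repool_steps(ENTRIES)
    for when, host, percent in steps:
        print(f"{when:%H:%M} {host} {percent}%")
    print("gaps (min):", step_gaps_minutes(steps))
```

Because the log lists newest entries first, the sketch sorts by time before measuring gaps. It only handles entries within a single day; entries that cross midnight would need the date heading as well.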

2023-05-11

  • 23:39 mutante: LDAP - added uid lorenjohnson to groups wmde and nda T335858
  • 23:39 mutante: LDAP - added uid roti to groups wmde and nda T336435
  • 23:24 rzl@cumin2002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw14(3[789]|4[056]|57)\.eqiad\.wmnet
  • 23:22 rzl@cumin2002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw14(5[89]|6[016789]|9[45])\.eqiad\.wmnet
  • 23:22 rzl@cumin2002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw14(3[789]|4[056]57)\.eqiad\.wmnet
  • 23:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudswift1002']
  • 22:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudswift1002']
  • 22:41 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudswift1001']
  • 22:23 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudswift1001']
  • 21:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudswift1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:45 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudswift1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudswift1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:10 eevans@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
  • 21:07 eevans@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad
  • 21:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts db1225.eqiad.wmnet
  • 21:07 eevans@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
  • 21:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1225.eqiad.wmnet
  • 21:05 eevans@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-codfw
  • 20:58 urbanecm@deploy1002: Finished scap: Backport for Personalized praise: Do not suggest users with Homepage disabled (T336300), Personalized praise: Do not suggest users with Homepage disabled (T336300) (duration: 07m 30s)
  • 20:52 urbanecm@deploy1002: urbanecm: Backport for Personalized praise: Do not suggest users with Homepage disabled (T336300), Personalized praise: Do not suggest users with Homepage disabled (T336300) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:51 urbanecm@deploy1002: Started scap: Backport for Personalized praise: Do not suggest users with Homepage disabled (T336300), Personalized praise: Do not suggest users with Homepage disabled (T336300)
  • 20:50 urbanecm@deploy1002: Finished scap: Backport for [Growth] Remove config variables provided by extension (duration: 20m 04s)
  • 20:37 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts prometheus4001.ulsfo.wment
  • 20:37 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:36 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 20:32 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus4001.ulsfo.wment
  • 20:31 urbanecm@deploy1002: urbanecm: Backport for [Growth] Remove config variables provided by extension synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:30 urbanecm@deploy1002: Started scap: Backport for [Growth] Remove config variables provided by extension
  • 20:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudswift1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:22 thcipriani@deploy1002: Finished scap: Backport for Allow http://localhost callback URL (T299737) (duration: 09m 37s)
  • 20:22 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts prometheus3001.esams.wment
  • 20:22 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:21 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus3001.esams.wment decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 20:20 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus3001.esams.wment decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 20:18 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 20:17 denisse: manually removing prometheus3001.esams.wmnet from the Ganeti master after a failed step in the decommission cookbook.
  • 20:14 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus3001.esams.wment
  • 20:14 thcipriani@deploy1002: bd808 and thcipriani: Backport for Allow http://localhost callback URL (T299737) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:12 thcipriani@deploy1002: Started scap: Backport for Allow http://localhost callback URL (T299737)
  • 19:56 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts prometheus3001.esams.wment
  • 19:56 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:55 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 19:51 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus3001.esams.wment
  • 19:06 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 19:06 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 18:46 ejegg: civicrm upgraded from d8a1a562 to db6e8d69
  • 17:46 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-airflow1006.eqiad.wmnet with reason: Silence error notifications/alerts during setup
  • 17:46 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-airflow1006.eqiad.wmnet with reason: Silence error notifications/alerts during setup
  • 17:24 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 17:12 brennen@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.8 refs T330214 (duration: 06m 14s)
  • 17:12 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 17:11 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:10 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 17:08 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 17:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 17:06 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:06 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 17:06 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.8 refs T330214
  • 17:05 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 17:01 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:00 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:58 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:58 bking@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:57 bking@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 16:56 bking@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:56 bking@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:56 bking@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:56 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 16:56 bking@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:55 bking@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:54 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 16:50 hashar: CI / Zuul was slow to report build results back to Gerrit, most probably due to a lack of IPv6 (T336524), which should be solved now.
  • 16:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2020 (T335845)', diff saved to https://phabricator.wikimedia.org/P48203 and previous config saved to /var/cache/conftool/dbconfig/20230511-164125-ladsgroup.json
  • 16:37 brennen: train 1.41.0-wmf.8 (T330214): rolling back to group1 to test for T336504 presence/absence on enwiki
  • 16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2020', diff saved to https://phabricator.wikimedia.org/P48201 and previous config saved to /var/cache/conftool/dbconfig/20230511-162619-ladsgroup.json
  • 16:16 elukey: benthos webrequest live instances migrated to kafka-franz (new consumer client, data may have some holes) - T331801
  • 16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2020', diff saved to https://phabricator.wikimedia.org/P48200 and previous config saved to /var/cache/conftool/dbconfig/20230511-161113-ladsgroup.json
  • 16:08 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2002.wikimedia.org with OS bullseye
  • 16:01 Amir1: Removing db1110 from zarcillo T335011
  • 16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1110.eqiad.wmnet
  • 16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1110.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 15:58 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1110.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2020 (T335845)', diff saved to https://phabricator.wikimedia.org/P48199 and previous config saved to /var/cache/conftool/dbconfig/20230511-155607-ladsgroup.json
  • 15:49 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 15:48 hashar: CI back up and fully operational (after the Gerrit upgrade)
  • 15:48 mutante: gerrit maintenance period ended - gerrit switched to new hardware, IP and distro version
  • 15:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2020 (T335845)', diff saved to https://phabricator.wikimedia.org/P48198 and previous config saved to /var/cache/conftool/dbconfig/20230511-154533-ladsgroup.json
  • 15:45 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2002.wikimedia.org with reason: host reimage
  • 15:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2020.codfw.wmnet with reason: Maintenance
  • 15:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2020.codfw.wmnet with reason: Maintenance
  • 15:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1110.eqiad.wmnet
  • 15:42 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2002.wikimedia.org with reason: host reimage
  • 15:27 sukhe: [done] running homer for CR 919151: resolve connection issues to gerrit.wikimedia.org
  • 15:27 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab2002.wikimedia.org with OS bullseye
  • 15:21 sukhe: running homer for CR 919151: resolve connection issues to gerrit.wikimedia.org
  • 15:18 urandom: altering image_suggestions schema (generated data platform) — T336424
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2024 (T335845)', diff saved to https://phabricator.wikimedia.org/P48197 and previous config saved to /var/cache/conftool/dbconfig/20230511-144959-ladsgroup.json
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2024', diff saved to https://phabricator.wikimedia.org/P48195 and previous config saved to /var/cache/conftool/dbconfig/20230511-143453-ladsgroup.json
  • 14:27 moritzm: installing avahi security updates
  • 14:26 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2012
  • 14:26 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2012
  • 14:25 bking@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1003.wikimedia.org with reason: maintenance
  • 14:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1003.wikimedia.org with reason: maintenance
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2024', diff saved to https://phabricator.wikimedia.org/P48194 and previous config saved to /var/cache/conftool/dbconfig/20230511-141947-ladsgroup.json
  • 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: maintenance
  • 14:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: maintenance
  • 14:15 bking@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:15 bking@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:15 sukhe: sudo cumin -b1 -s1200 'A:cp and A:codfw' 'varnish-frontend-restart': T253093
  • 14:11 bking@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:09 bking@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:08 bking@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:08 thcipriani: starting Gerrit Switchover (Take II): The Reckoning
  • 14:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2024 (T335845)', diff saved to https://phabricator.wikimedia.org/P48192 and previous config saved to /var/cache/conftool/dbconfig/20230511-140440-ladsgroup.json
  • 13:57 elukey: upgrade benthos (4.9.1 -> 4.15.0) on centrallog nodes - T331801
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2024 (T335845)', diff saved to https://phabricator.wikimedia.org/P48191 and previous config saved to /var/cache/conftool/dbconfig/20230511-135335-ladsgroup.json
  • 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2024.codfw.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2024.codfw.wmnet with reason: Maintenance
  • 13:49 moritzm: uploaded wmf-laptop 0.5.7 to component/wmf-sre-laptop
  • 13:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
  • 13:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:29 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 13:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:26 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 13:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:22 elukey: upload benthos 4.15.0-1 to {buster,bullseye}-wikimedia - T331801
  • 13:13 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host testvm2002.codfw.wmnet with OS bullseye
  • 13:07 filippo@cumin1001: conftool action : set/weight=100; selector: name=thanos-fe2004.codfw.wmnet
  • 13:07 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2004.codfw.wmnet,service=thanos-web
  • 13:07 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2004.codfw.wmnet
  • 13:07 filippo@cumin1001: conftool action : set/pooled=true; selector: name=thanos-fe2004.eqiad.wmnet
  • 13:06 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2004.eqiad.wmnet
  • 13:06 filippo@cumin1001: conftool action : set/weight=100; selector: name=thanos-fe2004.eqiad.wmnet
  • 13:05 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1004.eqiad.wmnet,service=thanos-web
  • 13:05 filippo@cumin1001: conftool action : set/weight=100; selector: name=thanos-fe1004.eqiad.wmnet
  • 12:58 ladsgroup@deploy1002: Finished scap: Backport for Add outreachwiki to wikidataclient dblist (T171140) (duration: 11m 05s)
  • 12:54 godog: roll-restart thanos-fe swift-proxy to apply config changes - T336348
  • 12:48 ladsgroup@deploy1002: ladsgroup: Backport for Add outreachwiki to wikidataclient dblist (T171140) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 12:47 ladsgroup@deploy1002: Started scap: Backport for Add outreachwiki to wikidataclient dblist (T171140)
  • 12:41 Amir1: creating wikidata client tables for outreachwiki (T171140)
  • 12:18 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2002.wikimedia.org with OS bullseye
  • 12:01 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 11:57 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2002.wikimedia.org with reason: host reimage
  • 11:54 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2002.wikimedia.org with reason: host reimage
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48190 and previous config saved to /var/cache/conftool/dbconfig/20230511-115201-root.json
  • 11:39 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab2002.wikimedia.org with OS bullseye
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48189 and previous config saved to /var/cache/conftool/dbconfig/20230511-113657-root.json
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48187 and previous config saved to /var/cache/conftool/dbconfig/20230511-112152-root.json
  • 11:08 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48186 and previous config saved to /var/cache/conftool/dbconfig/20230511-110647-root.json
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48185 and previous config saved to /var/cache/conftool/dbconfig/20230511-105142-root.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48184 and previous config saved to /var/cache/conftool/dbconfig/20230511-103638-root.json
  • 10:24 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 10:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48183 and previous config saved to /var/cache/conftool/dbconfig/20230511-102133-root.json
  • 10:17 moritzm: installing modsecurity-crs security updates
  • 10:10 moritzm: installing protobuf security updates
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48182 and previous config saved to /var/cache/conftool/dbconfig/20230511-100628-root.json
  • 09:35 moritzm: installing distro-info-data updates on buster
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137', diff saved to https://phabricator.wikimedia.org/P48181 and previous config saved to /var/cache/conftool/dbconfig/20230511-092848-root.json
  • 08:59 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 08:56 jelto@cumin1001: END (ERROR) - Cookbook sre.gitlab.upgrade (exit_code=97) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:cassandra-dev
  • 08:40 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 08:40 elukey: `apt-get clean` on orespoolcounter nodes to free space in the root partition
  • 08:33 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
  • 08:13 moritzm: installing Linux 4.19.282 updates on Buster systems
  • 08:08 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.8 refs T330214
  • 08:06 jmm@cumin2002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:cassandra-dev
  • 08:05 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 07:43 jmm@cumin2002: END (FAIL) - Cookbook sre.cassandra.roll-reboot (exit_code=1) rolling reboot on A:cassandra-dev
  • 07:43 jmm@cumin2002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:cassandra-dev
  • 07:41 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
  • 07:39 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 2518
  • 07:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2518
  • 07:14 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 20940
  • 07:13 marostegui@deploy1002: Finished scap: Backport for Revert "ProductionServices.php: Failover pc2 eqiad master" (duration: 07m 41s)
  • 07:07 marostegui@deploy1002: marostegui: Backport for Revert "ProductionServices.php: Failover pc2 eqiad master" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 07:05 marostegui@deploy1002: Started scap: Backport for Revert "ProductionServices.php: Failover pc2 eqiad master"
  • 06:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast2002.wikimedia.org
  • 06:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast2002.wikimedia.org
  • 06:42 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Failover pc2 eqiad master (duration: 08m 23s)
  • 06:36 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Failover pc2 eqiad master synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 06:34 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Failover pc2 eqiad master
  • 06:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 20940
  • 06:29 marostegui@deploy1002: Finished scap: Backport for Revert "ProductionServices.php: Failover pc2 codfw master" (duration: 08m 12s)
  • 06:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 17676
  • 06:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 17676
  • 06:22 marostegui@deploy1002: marostegui: Backport for Revert "ProductionServices.php: Failover pc2 codfw master" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 06:21 marostegui@deploy1002: Started scap: Backport for Revert "ProductionServices.php: Failover pc2 codfw master"
  • 06:21 XioNoX: Configure/reconfigure 1:1 NAT for new fr-tech hosts (frbast2002, frmon2002) - T336450
  • 06:15 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 13335
  • 06:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335
  • 06:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 714
  • 06:07 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Failover pc2 codfw master (duration: 07m 42s)
  • 06:05 kart_: Updated MinT to 2023-05-11-051736-production
  • 06:01 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Failover pc2 codfw master synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 06:00 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 05:59 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Failover pc2 codfw master
  • 05:58 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Failover pc2 codfw master synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 05:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 714
  • 05:57 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Failover pc2 codfw master
  • 05:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 05:55 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 05:53 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 05:48 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2139.codfw.wmnet with reason: T335396
  • 05:48 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2139.codfw.wmnet with reason: T335396
  • 05:45 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 05:44 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply

2023-05-10

  • 22:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2021.codfw.wmnet with OS buster
  • 21:52 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
  • 21:49 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
  • 21:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS buster
  • 21:31 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS buster
  • 21:31 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS buster
  • 20:58 ejegg: payments-wiki upgraded from 2125cea7 to d1c5fefc
  • 20:58 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS bullseye
  • 20:55 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@02d6ac9]: (no justification provided) (duration: 00m 11s)
  • 20:55 milimetric@deploy1002: Started deploy [airflow-dags/analytics@02d6ac9]: (no justification provided)
  • 20:33 hashar@deploy1002: Finished deploy [gerrit/gerrit@e815301]: Gerrit to 3.5.6 on gerrit1003 | T336339 (duration: 00m 06s)
  • 20:33 hashar@deploy1002: Started deploy [gerrit/gerrit@e815301]: Gerrit to 3.5.6 on gerrit1003 | T336339
  • 20:32 cjming: end of UTC late backport window
  • 20:21 cjming@deploy1002: Finished scap: Backport for Remove unnecessary jQuery closure (T324913) (duration: 09m 02s)
  • 20:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T335845)', diff saved to https://phabricator.wikimedia.org/P48177 and previous config saved to /var/cache/conftool/dbconfig/20230510-202014-ladsgroup.json
  • 20:14 cjming@deploy1002: cjming and jdlrobson: Backport for Remove unnecessary jQuery closure (T324913) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:12 cjming@deploy1002: Started scap: Backport for Remove unnecessary jQuery closure (T324913)
  • 20:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P48176 and previous config saved to /var/cache/conftool/dbconfig/20230510-200508-ladsgroup.json
  • 20:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS bullseye
  • 20:00 milimetric@deploy1002: Finished deploy [analytics/refinery@4ccc172] (thin): Regular analytics weekly train THIN [analytics/refinery@4ccc172] (duration: 00m 05s)
  • 20:00 milimetric@deploy1002: Started deploy [analytics/refinery@4ccc172] (thin): Regular analytics weekly train THIN [analytics/refinery@4ccc172]
  • 20:00 milimetric@deploy1002: Finished deploy [analytics/refinery@4ccc172] (thin): Regular analytics weekly train THIN [analytics/refinery@4ccc172] (duration: 00m 26s)
  • 19:59 milimetric@deploy1002: Started deploy [analytics/refinery@4ccc172] (thin): Regular analytics weekly train THIN [analytics/refinery@4ccc172]
  • 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P48175 and previous config saved to /var/cache/conftool/dbconfig/20230510-195001-ladsgroup.json
  • 19:47 bking@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=codfw
  • 19:35 milimetric@deploy1002: Finished deploy [analytics/refinery@4ccc172]: Regular analytics weekly train [analytics/refinery@4ccc172] (duration: 40m 28s)
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T335845)', diff saved to https://phabricator.wikimedia.org/P48174 and previous config saved to /var/cache/conftool/dbconfig/20230510-193455-ladsgroup.json
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1219 (T335845)', diff saved to https://phabricator.wikimedia.org/P48173 and previous config saved to /var/cache/conftool/dbconfig/20230510-192746-ladsgroup.json
  • 19:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 19:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T335845)', diff saved to https://phabricator.wikimedia.org/P48172 and previous config saved to /var/cache/conftool/dbconfig/20230510-192722-ladsgroup.json
  • 19:25 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Rolling restart to apply Cassandra 3.11.14 upgrade - eevans@cumin1001
  • 19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P48171 and previous config saved to /var/cache/conftool/dbconfig/20230510-191216-ladsgroup.json
  • 19:08 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Rolling restart to apply Cassandra 3.11.14 upgrade - eevans@cumin1001
  • 19:00 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Rolling restart to apply Cassandra 3.11.14 upgrade - eevans@cumin1001
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P48170 and previous config saved to /var/cache/conftool/dbconfig/20230510-185710-ladsgroup.json
  • 18:54 milimetric@deploy1002: Started deploy [analytics/refinery@4ccc172]: Regular analytics weekly train [analytics/refinery@4ccc172]
  • 18:45 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 191m 53s)
  • 18:43 ejegg: payments-wiki upgraded from ec5a5e92 to 2125cea7
  • 18:43 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Rolling restart to apply Cassandra 3.11.14 upgrade - eevans@cumin1001
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T335845)', diff saved to https://phabricator.wikimedia.org/P48169 and previous config saved to /var/cache/conftool/dbconfig/20230510-184202-ladsgroup.json
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1218 (T335845)', diff saved to https://phabricator.wikimedia.org/P48168 and previous config saved to /var/cache/conftool/dbconfig/20230510-183441-ladsgroup.json
  • 18:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 18:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T335845)', diff saved to https://phabricator.wikimedia.org/P48167 and previous config saved to /var/cache/conftool/dbconfig/20230510-183418-ladsgroup.json
  • 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P48166 and previous config saved to /var/cache/conftool/dbconfig/20230510-181912-ladsgroup.json
  • 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P48165 and previous config saved to /var/cache/conftool/dbconfig/20230510-180406-ladsgroup.json
  • 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T335845)', diff saved to https://phabricator.wikimedia.org/P48164 and previous config saved to /var/cache/conftool/dbconfig/20230510-174859-ladsgroup.json
  • 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T335845)', diff saved to https://phabricator.wikimedia.org/P48163 and previous config saved to /var/cache/conftool/dbconfig/20230510-174143-ladsgroup.json
  • 17:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 17:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T335845)', diff saved to https://phabricator.wikimedia.org/P48162 and previous config saved to /var/cache/conftool/dbconfig/20230510-174119-ladsgroup.json
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P48161 and previous config saved to /var/cache/conftool/dbconfig/20230510-172613-ladsgroup.json
  • 17:23 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2012.codfw.wmnet with OS bullseye
  • 17:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
  • 17:11 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P48160 and previous config saved to /var/cache/conftool/dbconfig/20230510-171107-ladsgroup.json
  • 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T335845)', diff saved to https://phabricator.wikimedia.org/P48159 and previous config saved to /var/cache/conftool/dbconfig/20230510-165601-ladsgroup.json
  • 16:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 16:50 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
  • 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T335845)', diff saved to https://phabricator.wikimedia.org/P48158 and previous config saved to /var/cache/conftool/dbconfig/20230510-164842-ladsgroup.json
  • 16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T335845)', diff saved to https://phabricator.wikimedia.org/P48157 and previous config saved to /var/cache/conftool/dbconfig/20230510-164818-ladsgroup.json
  • 16:36 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P48156 and previous config saved to /var/cache/conftool/dbconfig/20230510-163312-ladsgroup.json
  • 16:31 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
  • 16:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 16:25 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
  • 16:20 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P48155 and previous config saved to /var/cache/conftool/dbconfig/20230510-161806-ladsgroup.json
  • 16:15 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2012.codfw.wmnet with OS bullseye
  • 16:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
  • 16:03 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
  • 16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T335845)', diff saved to https://phabricator.wikimedia.org/P48154 and previous config saved to /var/cache/conftool/dbconfig/20230510-160258-ladsgroup.json
  • 16:02 sukhe: sudo cumin -b1 -s1200 'A:cp and A:drmrs' 'varnish-frontend-restart': T253093
  • 15:59 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T335845)', diff saved to https://phabricator.wikimedia.org/P48153 and previous config saved to /var/cache/conftool/dbconfig/20230510-155429-ladsgroup.json
  • 15:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 15:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T335845)', diff saved to https://phabricator.wikimedia.org/P48152 and previous config saved to /var/cache/conftool/dbconfig/20230510-155357-ladsgroup.json
  • 15:48 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 15:47 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
  • 15:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 15:42 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P48151 and previous config saved to /var/cache/conftool/dbconfig/20230510-153851-ladsgroup.json
  • 15:37 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 15:33 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P48150 and previous config saved to /var/cache/conftool/dbconfig/20230510-152345-ladsgroup.json
  • 15:17 sukhe: running authdns-update for CR 918527
  • 15:16 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 15:16 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 15:14 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 15:14 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 15:12 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 15:12 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T335845)', diff saved to https://phabricator.wikimedia.org/P48149 and previous config saved to /var/cache/conftool/dbconfig/20230510-150838-ladsgroup.json
  • 15:00 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 15:00 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T335845)', diff saved to https://phabricator.wikimedia.org/P48148 and previous config saved to /var/cache/conftool/dbconfig/20230510-150009-ladsgroup.json
  • 15:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 14:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T335845)', diff saved to https://phabricator.wikimedia.org/P48147 and previous config saved to /var/cache/conftool/dbconfig/20230510-145946-ladsgroup.json
  • 14:58 cwhite: install vopsbot 0.3.4 on alert2001 T329791
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P48146 and previous config saved to /var/cache/conftool/dbconfig/20230510-144440-ladsgroup.json
  • 14:44 moritzm: restarting FPM/Apache on mw canaries to pick up libxml2 updates
  • 14:41 moritzm: installing libxml2 security updates on buster
  • 14:40 thcipriani: stopping gerrit on gerrit1001
  • 14:40 thcipriani: stopping gerrit on gerrit1003
  • 14:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1003.wikimedia.org with reason: migration
  • 14:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1003.wikimedia.org with reason: migration
  • 14:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: migration
  • 14:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: migration
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P48145 and previous config saved to /var/cache/conftool/dbconfig/20230510-142934-ladsgroup.json
  • 14:26 thcipriani: gerrit1003 switchover happening
  • 14:25 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 14:25 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T335845)', diff saved to https://phabricator.wikimedia.org/P48144 and previous config saved to /var/cache/conftool/dbconfig/20230510-141427-ladsgroup.json
  • 14:08 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 14:08 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T335845)', diff saved to https://phabricator.wikimedia.org/P48143 and previous config saved to /var/cache/conftool/dbconfig/20230510-140708-ladsgroup.json
  • 14:07 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
  • 14:07 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
  • 14:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 14:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T335845)', diff saved to https://phabricator.wikimedia.org/P48142 and previous config saved to /var/cache/conftool/dbconfig/20230510-140644-ladsgroup.json
  • 14:02 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main'.
  • 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P48140 and previous config saved to /var/cache/conftool/dbconfig/20230510-135138-ladsgroup.json
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P48139 and previous config saved to /var/cache/conftool/dbconfig/20230510-133632-ladsgroup.json
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T335845)', diff saved to https://phabricator.wikimedia.org/P48138 and previous config saved to /var/cache/conftool/dbconfig/20230510-132126-ladsgroup.json
  • 13:19 taavi@deploy1002: Finished scap: Backport for [arwikisource] Replace the current logo with an identical HD version (T336193) (duration: 08m 00s)
  • 13:15 _joe_: rolling back vopsbot to 0.3.3
  • 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T335845)', diff saved to https://phabricator.wikimedia.org/P48137 and previous config saved to /var/cache/conftool/dbconfig/20230510-131412-ladsgroup.json
  • 13:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 13:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T335845)', diff saved to https://phabricator.wikimedia.org/P48136 and previous config saved to /var/cache/conftool/dbconfig/20230510-131347-ladsgroup.json
  • 13:13 taavi@deploy1002: superpes and taavi: Backport for [arwikisource] Replace the current logo with an identical HD version (T336193) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:11 taavi@deploy1002: Started scap: Backport for [arwikisource] Replace the current logo with an identical HD version (T336193)
  • 13:06 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2002.codfw.wmnet
  • 13:03 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host netbox-dev2002.codfw.wmnet
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P48135 and previous config saved to /var/cache/conftool/dbconfig/20230510-125840-ladsgroup.json
  • 12:56 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp2002.wikimedia.org
  • 12:52 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp2002.wikimedia.org
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P48134 and previous config saved to /var/cache/conftool/dbconfig/20230510-124334-ladsgroup.json
  • 12:30 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 12:30 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 12:29 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 12:29 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T335845)', diff saved to https://phabricator.wikimedia.org/P48133 and previous config saved to /var/cache/conftool/dbconfig/20230510-122828-ladsgroup.json
  • 12:27 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T335845)', diff saved to https://phabricator.wikimedia.org/P48132 and previous config saved to /var/cache/conftool/dbconfig/20230510-122316-ladsgroup.json
  • 12:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T335845)', diff saved to https://phabricator.wikimedia.org/P48131 and previous config saved to /var/cache/conftool/dbconfig/20230510-122253-ladsgroup.json
  • 12:13 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 12:13 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 12:12 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 12:11 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:10 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:10 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P48129 and previous config saved to /var/cache/conftool/dbconfig/20230510-120747-ladsgroup.json
  • 11:58 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 11:58 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 11:57 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 11:57 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 11:56 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 11:56 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P48128 and previous config saved to /var/cache/conftool/dbconfig/20230510-115241-ladsgroup.json
  • 11:49 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 11:49 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 11:46 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 11:46 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 11:43 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 11:43 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2005.codfw.wmnet with OS bookworm
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T335845)', diff saved to https://phabricator.wikimedia.org/P48127 and previous config saved to /var/cache/conftool/dbconfig/20230510-113734-ladsgroup.json
  • 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T335845)', diff saved to https://phabricator.wikimedia.org/P48126 and previous config saved to /var/cache/conftool/dbconfig/20230510-113215-ladsgroup.json
  • 11:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 11:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2005.codfw.wmnet with reason: host reimage
  • 11:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 11:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T335845)', diff saved to https://phabricator.wikimedia.org/P48125 and previous config saved to /var/cache/conftool/dbconfig/20230510-112855-ladsgroup.json
  • 11:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2005.codfw.wmnet with reason: host reimage
  • 11:18 _joe_: installing vopsbot 0.3.4 on alert1001 T329791
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P48124 and previous config saved to /var/cache/conftool/dbconfig/20230510-111349-ladsgroup.json
  • 11:11 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bookworm
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P48123 and previous config saved to /var/cache/conftool/dbconfig/20230510-105843-ladsgroup.json
  • 10:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T335845)', diff saved to https://phabricator.wikimedia.org/P48122 and previous config saved to /var/cache/conftool/dbconfig/20230510-104337-ladsgroup.json
  • 10:38 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T335845)', diff saved to https://phabricator.wikimedia.org/P48121 and previous config saved to /var/cache/conftool/dbconfig/20230510-103712-ladsgroup.json
  • 10:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 10:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T335845)', diff saved to https://phabricator.wikimedia.org/P48120 and previous config saved to /var/cache/conftool/dbconfig/20230510-103649-ladsgroup.json
  • 10:26 Amir1: Removing db1113 from zarcillo T336029
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220 (T335845)', diff saved to https://phabricator.wikimedia.org/P48119 and previous config saved to /var/cache/conftool/dbconfig/20230510-102302-ladsgroup.json
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P48118 and previous config saved to /var/cache/conftool/dbconfig/20230510-102142-ladsgroup.json
  • 10:21 Amir1: start of cleanup of echo notifications in wikidatawiki (T318523)
  • 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1113.eqiad.wmnet
  • 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1113.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 10:16 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1113.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 10:13 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 10:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1113.eqiad.wmnet
  • 10:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P48117 and previous config saved to /var/cache/conftool/dbconfig/20230510-100756-ladsgroup.json
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P48116 and previous config saved to /var/cache/conftool/dbconfig/20230510-100636-ladsgroup.json
  • 10:01 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-fe2004.codfw.wmnet
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48115 and previous config saved to /var/cache/conftool/dbconfig/20230510-095309-root.json
  • 09:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P48114 and previous config saved to /var/cache/conftool/dbconfig/20230510-095250-ladsgroup.json
  • 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T335845)', diff saved to https://phabricator.wikimedia.org/P48113 and previous config saved to /var/cache/conftool/dbconfig/20230510-095130-ladsgroup.json
  • 09:50 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2004.codfw.wmnet
  • 09:50 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-fe1004.eqiad.wmnet
  • 09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T335845)', diff saved to https://phabricator.wikimedia.org/P48112 and previous config saved to /var/cache/conftool/dbconfig/20230510-094452-ladsgroup.json
  • 09:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T335845)', diff saved to https://phabricator.wikimedia.org/P48111 and previous config saved to /var/cache/conftool/dbconfig/20230510-094429-ladsgroup.json
  • 09:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1004.eqiad.wmnet
  • 09:38 daniel@deploy1002: Finished scap: Backport for Enable parser cache warming jobs for parsoid on medium wikis (T329366) (duration: 08m 10s)
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48110 and previous config saved to /var/cache/conftool/dbconfig/20230510-093804-root.json
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220 (T335845)', diff saved to https://phabricator.wikimedia.org/P48109 and previous config saved to /var/cache/conftool/dbconfig/20230510-093743-ladsgroup.json
  • 09:31 daniel@deploy1002: daniel: Backport for Enable parser cache warming jobs for parsoid on medium wikis (T329366) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48108 and previous config saved to /var/cache/conftool/dbconfig/20230510-093128-root.json
  • 09:30 daniel@deploy1002: Started scap: Backport for Enable parser cache warming jobs for parsoid on medium wikis (T329366)
  • 09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P48107 and previous config saved to /var/cache/conftool/dbconfig/20230510-092923-ladsgroup.json
  • 09:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1220 (T335845)', diff saved to https://phabricator.wikimedia.org/P48106 and previous config saved to /var/cache/conftool/dbconfig/20230510-092531-ladsgroup.json
  • 09:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1220.eqiad.wmnet with reason: Maintenance
  • 09:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1220.eqiad.wmnet with reason: Maintenance
  • 09:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T335845)', diff saved to https://phabricator.wikimedia.org/P48105 and previous config saved to /var/cache/conftool/dbconfig/20230510-092507-ladsgroup.json
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48104 and previous config saved to /var/cache/conftool/dbconfig/20230510-092259-root.json
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48103 and previous config saved to /var/cache/conftool/dbconfig/20230510-091624-root.json
  • 09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P48102 and previous config saved to /var/cache/conftool/dbconfig/20230510-091417-ladsgroup.json
  • 09:12 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
  • 09:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P48101 and previous config saved to /var/cache/conftool/dbconfig/20230510-091001-ladsgroup.json
  • 09:09 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48100 and previous config saved to /var/cache/conftool/dbconfig/20230510-090755-root.json
  • 09:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
  • 09:01 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
  • 09:01 hashar: Gerrit restarted at version 3.5.6 | T336339
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48099 and previous config saved to /var/cache/conftool/dbconfig/20230510-090119-root.json
  • 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T335845)', diff saved to https://phabricator.wikimedia.org/P48098 and previous config saved to /var/cache/conftool/dbconfig/20230510-085910-ladsgroup.json
  • 08:57 hashar@deploy1002: Finished deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit1001 | T336339 (duration: 00m 05s)
  • 08:57 hashar@deploy1002: Started deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit1001 | T336339
  • 08:56 hashar@deploy1002: Finished deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit1001 | T336339 (duration: 00m 09s)
  • 08:56 hashar@deploy1002: Started deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit1001 | T336339
  • 08:55 hashar: Stopping Gerrit for 3.5.5 > 3.5.6 upgrade T336339
  • 08:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P48097 and previous config saved to /var/cache/conftool/dbconfig/20230510-085455-ladsgroup.json
  • 08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T335845)', diff saved to https://phabricator.wikimedia.org/P48096 and previous config saved to /var/cache/conftool/dbconfig/20230510-085330-ladsgroup.json
  • 08:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 08:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48095 and previous config saved to /var/cache/conftool/dbconfig/20230510-085250-root.json
  • 08:51 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 08:49 hashar@deploy1002: Finished deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit2002 | T336339 (duration: 00m 07s)
  • 08:49 hashar@deploy1002: Started deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit2002 | T336339
  • 08:48 hashar: deploy1002: git reset `/srv/deployment/gerrit/gerrit` which had a bunch of locally modified files for some reason # T336339
  • 08:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 08:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48094 and previous config saved to /var/cache/conftool/dbconfig/20230510-084614-root.json
  • 08:40 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 08:40 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 08:39 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
  • 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T335845)', diff saved to https://phabricator.wikimedia.org/P48093 and previous config saved to /var/cache/conftool/dbconfig/20230510-083948-ladsgroup.json
  • 08:39 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48092 and previous config saved to /var/cache/conftool/dbconfig/20230510-083745-root.json
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1137 (T335845)', diff saved to https://phabricator.wikimedia.org/P48091 and previous config saved to /var/cache/conftool/dbconfig/20230510-083253-ladsgroup.json
  • 08:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: Maintenance
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48090 and previous config saved to /var/cache/conftool/dbconfig/20230510-083109-root.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48089 and previous config saved to /var/cache/conftool/dbconfig/20230510-082240-root.json
  • 08:21 hashar@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.8 refs T330214 (duration: 05m 55s)
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48088 and previous config saved to /var/cache/conftool/dbconfig/20230510-081605-root.json
  • 08:15 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.8 refs T330214
  • 08:14 godog: re-enable eqsin remote syslog towards centrallog - T336345
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48087 and previous config saved to /var/cache/conftool/dbconfig/20230510-080736-root.json
  • 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.netbox.restart-reboot (exit_code=0) rolling reboot on A:netbox
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48086 and previous config saved to /var/cache/conftool/dbconfig/20230510-080100-root.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48085 and previous config saved to /var/cache/conftool/dbconfig/20230510-080003-root.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48084 and previous config saved to /var/cache/conftool/dbconfig/20230510-075957-root.json
  • 07:50 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
  • 07:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
  • 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host testvm2005.codfw.wmnet with OS bookworm
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48083 and previous config saved to /var/cache/conftool/dbconfig/20230510-074555-root.json
  • 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
  • 07:45 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
  • 07:45 jmm@cumin2002: START - Cookbook sre.netbox.restart-reboot rolling reboot on A:netbox
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48082 and previous config saved to /var/cache/conftool/dbconfig/20230510-074458-root.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48081 and previous config saved to /var/cache/conftool/dbconfig/20230510-074452-root.json
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2117 T334650', diff saved to https://phabricator.wikimedia.org/P48080 and previous config saved to /var/cache/conftool/dbconfig/20230510-074237-root.json
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48079 and previous config saved to /var/cache/conftool/dbconfig/20230510-073833-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48078 and previous config saved to /var/cache/conftool/dbconfig/20230510-072954-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48077 and previous config saved to /var/cache/conftool/dbconfig/20230510-072948-root.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48076 and previous config saved to /var/cache/conftool/dbconfig/20230510-072329-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48075 and previous config saved to /var/cache/conftool/dbconfig/20230510-071449-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48074 and previous config saved to /var/cache/conftool/dbconfig/20230510-071443-root.json
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48073 and previous config saved to /var/cache/conftool/dbconfig/20230510-070824-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48072 and previous config saved to /var/cache/conftool/dbconfig/20230510-065944-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48071 and previous config saved to /var/cache/conftool/dbconfig/20230510-065938-root.json
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48070 and previous config saved to /var/cache/conftool/dbconfig/20230510-065319-root.json
  • 06:52 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bookworm
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48069 and previous config saved to /var/cache/conftool/dbconfig/20230510-064439-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48068 and previous config saved to /var/cache/conftool/dbconfig/20230510-064433-root.json
  • 06:44 marostegui: dbmaint eqiad failover s3 sanitarium master T336252
  • 06:41 marostegui@cumin2002: dbctl commit (dc=all): 'Depool db1112 db1212 T336252', diff saved to https://phabricator.wikimedia.org/P48067 and previous config saved to /var/cache/conftool/dbconfig/20230510-064119-marostegui.json
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48066 and previous config saved to /var/cache/conftool/dbconfig/20230510-063814-root.json
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48065 and previous config saved to /var/cache/conftool/dbconfig/20230510-062309-root.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48064 and previous config saved to /var/cache/conftool/dbconfig/20230510-060805-root.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180', diff saved to https://phabricator.wikimedia.org/P48063 and previous config saved to /var/cache/conftool/dbconfig/20230510-060656-root.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48062 and previous config saved to /var/cache/conftool/dbconfig/20230510-055929-root.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48061 and previous config saved to /var/cache/conftool/dbconfig/20230510-055300-root.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2151', diff saved to https://phabricator.wikimedia.org/P48060 and previous config saved to /var/cache/conftool/dbconfig/20230510-054833-root.json
  • 05:42 kart_: Updated MinT to 2023-05-10-045734-production (T331505)
  • 05:42 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 05:37 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 05:35 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 05:32 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 05:28 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 05:26 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 04:10 mutante: gerrit1001 - rsyncing data over to gerrit1003, as root in a screen, but slowly with bwlimit 5m
  • 01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"

2023-05-09

  • 23:43 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
  • 23:25 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
  • 23:22 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
  • 23:02 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
  • 23:00 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
  • 22:46 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:42 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
  • 22:38 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
  • 22:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T335845)', diff saved to https://phabricator.wikimedia.org/P48058 and previous config saved to /var/cache/conftool/dbconfig/20230509-223346-ladsgroup.json
  • 22:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
  • 22:28 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
  • 22:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P48057 and previous config saved to /var/cache/conftool/dbconfig/20230509-221840-ladsgroup.json
  • 22:18 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
  • 22:06 inflatador: bking@wcqs1002 depool wcqs1002 while it catches up on lag
  • 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P48056 and previous config saved to /var/cache/conftool/dbconfig/20230509-220333-ladsgroup.json
  • 21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T335845)', diff saved to https://phabricator.wikimedia.org/P48055 and previous config saved to /var/cache/conftool/dbconfig/20230509-214827-ladsgroup.json
  • 21:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 21:42 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
  • 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2131 (T335845)', diff saved to https://phabricator.wikimedia.org/P48054 and previous config saved to /var/cache/conftool/dbconfig/20230509-213834-ladsgroup.json
  • 21:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 21:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096 (T335845)', diff saved to https://phabricator.wikimedia.org/P48053 and previous config saved to /var/cache/conftool/dbconfig/20230509-213808-ladsgroup.json
  • 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096', diff saved to https://phabricator.wikimedia.org/P48052 and previous config saved to /var/cache/conftool/dbconfig/20230509-212302-ladsgroup.json
  • 21:19 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
  • 21:17 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
  • 21:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096', diff saved to https://phabricator.wikimedia.org/P48051 and previous config saved to /var/cache/conftool/dbconfig/20230509-210755-ladsgroup.json
  • 20:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096 (T335845)', diff saved to https://phabricator.wikimedia.org/P48050 and previous config saved to /var/cache/conftool/dbconfig/20230509-205249-ladsgroup.json
  • 20:52 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
  • 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2096 (T335845)', diff saved to https://phabricator.wikimedia.org/P48049 and previous config saved to /var/cache/conftool/dbconfig/20230509-204604-ladsgroup.json
  • 20:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2096.codfw.wmnet with reason: Maintenance
  • 20:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2096.codfw.wmnet with reason: Maintenance
  • 20:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 20:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 20:42 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs
  • 20:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 20:31 urbanecm@deploy1002: Finished scap: Backport for Add padding to limited-width toggle to account for close icon (T336274), Add padding to limited-width toggle to account for close icon (T336274) (duration: 08m 59s)
  • 20:24 urbanecm@deploy1002: urbanecm and jdrewniak: Backport for Add padding to limited-width toggle to account for close icon (T336274), Add padding to limited-width toggle to account for close icon (T336274) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:22 urbanecm@deploy1002: Started scap: Backport for Add padding to limited-width toggle to account for close icon (T336274), Add padding to limited-width toggle to account for close icon (T336274)
  • 20:22 urbanecm@deploy1002: Finished scap: Backport for Remove unused parsoidSettings, nativeGalleryEnabled (duration: 07m 11s)
  • 20:19 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs
  • 20:19 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs
  • 20:14 urbanecm@deploy1002: Started scap: Backport for Remove unused parsoidSettings, nativeGalleryEnabled
  • 20:10 urbanecm@deploy1002: Finished scap: Backport for [Growth] Add mediawiki.mentor_dashboard.personalized_praise stream (duration: 07m 26s)
  • 20:03 urbanecm@deploy1002: Started scap: Backport for [Growth] Add mediawiki.mentor_dashboard.personalized_praise stream
  • 20:01 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 19:54 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs
  • 19:34 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin
  • 19:08 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:08 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:07 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin
  • 18:57 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin
  • 18:45 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 18:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 18:28 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin
  • 18:06 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 18:01 aokoth@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host vrts2001.codfw.wmnet with OS bullseye
  • 17:49 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 17:49 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts2001.codfw.wmnet with reason: host reimage
  • 17:49 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 17:49 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 17:48 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 17:48 rzl@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 17:48 rzl@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 17:48 rzl@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 17:48 rzl@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 17:48 rzl@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 17:48 rzl@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 17:48 rzl@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 17:48 rzl@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 17:47 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:47 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 17:47 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:47 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:46 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:46 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 17:46 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts2001.codfw.wmnet with reason: host reimage
  • 17:46 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 17:46 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 17:45 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 17:43 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:43 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 17:42 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 17:42 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 17:31 aokoth@cumin1001: START - Cookbook sre.ganeti.reimage for host vrts2001.codfw.wmnet with OS bullseye
  • 17:31 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts2001.codfw.wmnet with reason: Re-image w/ Bullseye
  • 17:31 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts2001.codfw.wmnet with reason: Re-image w/ Bullseye
  • 17:28 aokoth@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host vrts2001.codfw.wmnet with OS bullseye
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T335845)', diff saved to https://phabricator.wikimedia.org/P48048 and previous config saved to /var/cache/conftool/dbconfig/20230509-172826-ladsgroup.json
  • 17:20 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 17:17 rzl: rolling restart apache on eqiad appservers T225778
  • 17:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2012.codfw.wmnet with OS bullseye
  • 17:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P48047 and previous config saved to /var/cache/conftool/dbconfig/20230509-171320-ladsgroup.json
  • 17:12 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:11 rzl: rolling restart apache on codfw appservers T225778
  • 17:00 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 17:00 brett@cumin2002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 17:00 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P48046 and previous config saved to /var/cache/conftool/dbconfig/20230509-165813-ladsgroup.json
  • 16:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
  • 16:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
  • 16:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
  • 16:54 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
  • 16:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:46 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudbackup2001-dev.codfw.wmnet with OS bullseye
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T335845)', diff saved to https://phabricator.wikimedia.org/P48045 and previous config saved to /var/cache/conftool/dbconfig/20230509-164307-ladsgroup.json
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T335845)', diff saved to https://phabricator.wikimedia.org/P48044 and previous config saved to /var/cache/conftool/dbconfig/20230509-163646-ladsgroup.json
  • 16:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 16:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T335845)', diff saved to https://phabricator.wikimedia.org/P48043 and previous config saved to /var/cache/conftool/dbconfig/20230509-163621-ladsgroup.json
  • 16:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 16:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudbackup2001-dev.codfw.wmnet with OS bullseye
  • 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
  • 16:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs2012']
  • 16:32 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
  • 16:30 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T335845)', diff saved to https://phabricator.wikimedia.org/P48042 and previous config saved to /var/cache/conftool/dbconfig/20230509-162904-ladsgroup.json
  • 16:27 rzl: resumed puppet on appservers - T225778
  • 16:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2012']
  • 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2012']
  • 16:23 rzl: rzl@mwdebug1001:~$ sudo apache2ctl restart
  • 16:22 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2012']
  • 16:21 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1002.eqiad.wmnet
  • 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P48041 and previous config saved to /var/cache/conftool/dbconfig/20230509-162115-ladsgroup.json
  • 16:19 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 16:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 16:15 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-coord1002.eqiad.wmnet
  • 16:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P48039 and previous config saved to /var/cache/conftool/dbconfig/20230509-161358-ladsgroup.json
  • 16:11 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 16:10 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 16:09 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:09 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:08 jnuche@deploy1002: Installing scap version "4.52.1" for 593 hosts
  • 16:07 rzl: stopping puppet on appservers - T225778
  • 16:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs2012.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P48038 and previous config saved to /var/cache/conftool/dbconfig/20230509-160608-ladsgroup.json
  • 16:04 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts2001.codfw.wmnet with reason: host reimage
  • 16:01 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts2001.codfw.wmnet with reason: host reimage
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P48037 and previous config saved to /var/cache/conftool/dbconfig/20230509-155852-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T335845)', diff saved to https://phabricator.wikimedia.org/P48036 and previous config saved to /var/cache/conftool/dbconfig/20230509-155102-ladsgroup.json
  • 15:50 aokoth@cumin1001: START - Cookbook sre.ganeti.reimage for host vrts2001.codfw.wmnet with OS bullseye
  • 15:48 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts2001.codfw.wmnet with reason: Re-image w/ Bullseye
  • 15:48 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts2001.codfw.wmnet with reason: Re-image w/ Bullseye
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T335845)', diff saved to https://phabricator.wikimedia.org/P48035 and previous config saved to /var/cache/conftool/dbconfig/20230509-154346-ladsgroup.json
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T335845)', diff saved to https://phabricator.wikimedia.org/P48034 and previous config saved to /var/cache/conftool/dbconfig/20230509-154338-ladsgroup.json
  • 15:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 15:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:43 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
  • 15:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T335845)', diff saved to https://phabricator.wikimedia.org/P48033 and previous config saved to /var/cache/conftool/dbconfig/20230509-154313-ladsgroup.json
  • 15:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
  • 15:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T335845)', diff saved to https://phabricator.wikimedia.org/P48032 and previous config saved to /var/cache/conftool/dbconfig/20230509-153715-ladsgroup.json
  • 15:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T335845)', diff saved to https://phabricator.wikimedia.org/P48031 and previous config saved to /var/cache/conftool/dbconfig/20230509-153651-ladsgroup.json
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P48030 and previous config saved to /var/cache/conftool/dbconfig/20230509-152804-ladsgroup.json
  • 15:23 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:23 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lvs2012.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:22 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P48029 and previous config saved to /var/cache/conftool/dbconfig/20230509-152145-ladsgroup.json
  • 15:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for lvs2012 - pt1979@cumin2002"
  • 15:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for lvs2012 - pt1979@cumin2002"
  • 15:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P48028 and previous config saved to /var/cache/conftool/dbconfig/20230509-151258-ladsgroup.json
  • 15:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host testvm2005.codfw.wmnet with OS bookworm
  • 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P48027 and previous config saved to /var/cache/conftool/dbconfig/20230509-150639-ladsgroup.json
  • 15:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2180']
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T335845)', diff saved to https://phabricator.wikimedia.org/P48026 and previous config saved to /var/cache/conftool/dbconfig/20230509-145752-ladsgroup.json
  • 14:54 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 45m 45s)
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T335845)', diff saved to https://phabricator.wikimedia.org/P48025 and previous config saved to /var/cache/conftool/dbconfig/20230509-145133-ladsgroup.json
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T335845)', diff saved to https://phabricator.wikimedia.org/P48024 and previous config saved to /var/cache/conftool/dbconfig/20230509-145128-ladsgroup.json
  • 14:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T335845)', diff saved to https://phabricator.wikimedia.org/P48023 and previous config saved to /var/cache/conftool/dbconfig/20230509-145057-ladsgroup.json
  • 14:50 sukhe: homer "cr*-codfw*" commit "Gerrit: 917885 remove decommissioned host lvs2008"
  • 14:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2180']
  • 14:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs2008.codfw.wmnet
  • 14:45 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:45 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs2008.codfw.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T335845)', diff saved to https://phabricator.wikimedia.org/P48022 and previous config saved to /var/cache/conftool/dbconfig/20230509-144457-ladsgroup.json
  • 14:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 14:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T335845)', diff saved to https://phabricator.wikimedia.org/P48021 and previous config saved to /var/cache/conftool/dbconfig/20230509-144433-ladsgroup.json
  • 14:44 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs2008.codfw.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:41 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 14:37 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:37 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P48020 and previous config saved to /var/cache/conftool/dbconfig/20230509-143550-ladsgroup.json
  • 14:32 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs2008.codfw.wmnet
  • 14:32 sukhe: decommission lvs2008
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P48019 and previous config saved to /var/cache/conftool/dbconfig/20230509-142927-ladsgroup.json
  • 14:29 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup1010
  • 14:29 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host backup1010
  • 14:29 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup1011
  • 14:29 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host backup1011
  • 14:27 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:25 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:24 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2180']
  • 14:23 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2180']
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P48018 and previous config saved to /var/cache/conftool/dbconfig/20230509-142044-ladsgroup.json
  • 14:15 sukhe: set routing-options static route 208.80.153.240/28 next-hop 10.192.49.7 [move static route for high-traffic2 to lvs2010]: T335777
  • 14:15 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bookworm
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P48017 and previous config saved to /var/cache/conftool/dbconfig/20230509-141421-ladsgroup.json
  • 14:08 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T335845)', diff saved to https://phabricator.wikimedia.org/P48016 and previous config saved to /var/cache/conftool/dbconfig/20230509-140535-ladsgroup.json
  • 13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T335845)', diff saved to https://phabricator.wikimedia.org/P48015 and previous config saved to /var/cache/conftool/dbconfig/20230509-135915-ladsgroup.json
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T335845)', diff saved to https://phabricator.wikimedia.org/P48014 and previous config saved to /var/cache/conftool/dbconfig/20230509-135815-ladsgroup.json
  • 13:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 13:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T335845)', diff saved to https://phabricator.wikimedia.org/P48013 and previous config saved to /var/cache/conftool/dbconfig/20230509-135750-ladsgroup.json
  • 13:49 taavi@deploy1002: Finished scap: Backport for Add $wmgUseRealMe (T324535) (duration: 07m 51s)
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T335845)', diff saved to https://phabricator.wikimedia.org/P48012 and previous config saved to /var/cache/conftool/dbconfig/20230509-134952-ladsgroup.json
  • 13:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T335845)', diff saved to https://phabricator.wikimedia.org/P48011 and previous config saved to /var/cache/conftool/dbconfig/20230509-134929-ladsgroup.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180 T336031', diff saved to https://phabricator.wikimedia.org/P48010 and previous config saved to /var/cache/conftool/dbconfig/20230509-134921-root.json
  • 13:44 moritzm: rearmed keyholder on netmon* post reboot
  • 13:43 taavi@deploy1002: taavi: Backport for Add $wmgUseRealMe (T324535) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:42 sukhe: sudo cumin -b1 -s1200 'A:cp and A:esams' 'varnish-frontend-restart': T253093
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P48009 and previous config saved to /var/cache/conftool/dbconfig/20230509-134244-ladsgroup.json
  • 13:42 taavi@deploy1002: Started scap: Backport for Add $wmgUseRealMe (T324535)
  • 13:38 taavi@deploy1002: Finished scap: Backport for Add RealMe to extension-list (T324535) (duration: 35m 47s)
  • 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P48008 and previous config saved to /var/cache/conftool/dbconfig/20230509-133416-ladsgroup.json
  • 13:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-worker1088.eqiad.wmnet with reason: Replacing RAID controller battery
  • 13:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1001.eqiad.wmnet
  • 13:28 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on an-worker1088.eqiad.wmnet with reason: Replacing RAID controller battery
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P48007 and previous config saved to /var/cache/conftool/dbconfig/20230509-132737-ladsgroup.json
  • 13:27 moritzm: updated bookworm d-i image to 2023-05-09 daily build T330495
  • 13:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-client1001.eqiad.wmnet
  • 13:23 taavi@deploy1002: taavi: Backport for Add RealMe to extension-list (T324535) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-worker1088.eqiad.wmnet
  • 13:23 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for an-worker1088.eqiad.wmnet
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P48006 and previous config saved to /var/cache/conftool/dbconfig/20230509-131910-ladsgroup.json
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T335845)', diff saved to https://phabricator.wikimedia.org/P48005 and previous config saved to /var/cache/conftool/dbconfig/20230509-131231-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T335845)', diff saved to https://phabricator.wikimedia.org/P48004 and previous config saved to /var/cache/conftool/dbconfig/20230509-130524-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T335845)', diff saved to https://phabricator.wikimedia.org/P48003 and previous config saved to /var/cache/conftool/dbconfig/20230509-130459-ladsgroup.json
  • 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync after adding ldap-rw servers - jmm@cumin2002"
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T335845)', diff saved to https://phabricator.wikimedia.org/P48002 and previous config saved to /var/cache/conftool/dbconfig/20230509-130404-ladsgroup.json
  • 13:02 taavi@deploy1002: Started scap: Backport for Add RealMe to extension-list (T324535)
  • 13:01 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync after adding ldap-rw servers - jmm@cumin2002"
  • 12:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-worker1088.eqiad.wmnet with reason: Upgrading RAID controller firmware
  • 12:58 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-worker1088.eqiad.wmnet with reason: Upgrading RAID controller firmware
  • 12:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ldap-rw2001.wikimedia.org with OS bullseye
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T335845)', diff saved to https://phabricator.wikimedia.org/P48001 and previous config saved to /var/cache/conftool/dbconfig/20230509-125644-ladsgroup.json
  • 12:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 12:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T335845)', diff saved to https://phabricator.wikimedia.org/P48000 and previous config saved to /var/cache/conftool/dbconfig/20230509-125620-ladsgroup.json
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P47999 and previous config saved to /var/cache/conftool/dbconfig/20230509-124953-ladsgroup.json
  • 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-rw2001.wikimedia.org with reason: host reimage
  • 12:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-rw2001.wikimedia.org with reason: host reimage
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P47997 and previous config saved to /var/cache/conftool/dbconfig/20230509-124114-ladsgroup.json
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P47996 and previous config saved to /var/cache/conftool/dbconfig/20230509-123447-ladsgroup.json
  • 12:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host eventlog1003.eqiad.wmnet
  • 12:29 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host ldap-rw2001.wikimedia.org with OS bullseye
  • 12:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host eventlog1003.eqiad.wmnet
  • 12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P47995 and previous config saved to /var/cache/conftool/dbconfig/20230509-122608-ladsgroup.json
  • 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T335845)', diff saved to https://phabricator.wikimedia.org/P47994 and previous config saved to /var/cache/conftool/dbconfig/20230509-121941-ladsgroup.json
  • 12:14 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aphlict1001.eqiad.wmnet
  • 12:14 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:14 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aphlict1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T335845)', diff saved to https://phabricator.wikimedia.org/P47992 and previous config saved to /var/cache/conftool/dbconfig/20230509-121119-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T335845)', diff saved to https://phabricator.wikimedia.org/P47991 and previous config saved to /var/cache/conftool/dbconfig/20230509-121102-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T335845)', diff saved to https://phabricator.wikimedia.org/P47990 and previous config saved to /var/cache/conftool/dbconfig/20230509-121053-ladsgroup.json
  • 12:06 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aphlict1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T335845)', diff saved to https://phabricator.wikimedia.org/P47989 and previous config saved to /var/cache/conftool/dbconfig/20230509-120433-ladsgroup.json
  • 12:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 12:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T335845)', diff saved to https://phabricator.wikimedia.org/P47988 and previous config saved to /var/cache/conftool/dbconfig/20230509-120410-ladsgroup.json
  • 12:02 eoghan@cumin1001: START - Cookbook sre.dns.netbox
  • 12:02 kart_: Updated cxserver to 2023-05-08-134152-production (T336115, T335987, T331835)
  • 11:58 eoghan@cumin1001: START - Cookbook sre.hosts.decommission for hosts aphlict1001.eqiad.wmnet
  • 11:58 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 11:57 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P47987 and previous config saved to /var/cache/conftool/dbconfig/20230509-115547-ladsgroup.json
  • 11:53 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 11:53 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 11:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P47986 and previous config saved to /var/cache/conftool/dbconfig/20230509-114903-ladsgroup.json
  • 11:45 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 11:45 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P47985 and previous config saved to /var/cache/conftool/dbconfig/20230509-114041-ladsgroup.json
  • 11:36 kart_: Updated MinT to 2023-05-09-110213-production (T331505, T335725)
  • 11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P47984 and previous config saved to /var/cache/conftool/dbconfig/20230509-113357-ladsgroup.json
  • 11:33 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 11:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ldap-rw1001.wikimedia.org with OS bullseye
  • 11:29 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 11:27 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T335845)', diff saved to https://phabricator.wikimedia.org/P47983 and previous config saved to /var/cache/conftool/dbconfig/20230509-112535-ladsgroup.json
  • 11:23 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 11:20 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 11:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-rw1001.wikimedia.org with reason: host reimage
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T335845)', diff saved to https://phabricator.wikimedia.org/P47982 and previous config saved to /var/cache/conftool/dbconfig/20230509-111851-ladsgroup.json
  • 11:18 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T335845)', diff saved to https://phabricator.wikimedia.org/P47981 and previous config saved to /var/cache/conftool/dbconfig/20230509-111755-ladsgroup.json
  • 11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T335845)', diff saved to https://phabricator.wikimedia.org/P47980 and previous config saved to /var/cache/conftool/dbconfig/20230509-111730-ladsgroup.json
  • 11:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-rw1001.wikimedia.org with reason: host reimage
  • 11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T335845)', diff saved to https://phabricator.wikimedia.org/P47979 and previous config saved to /var/cache/conftool/dbconfig/20230509-111235-ladsgroup.json
  • 11:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 11:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T335845)', diff saved to https://phabricator.wikimedia.org/P47978 and previous config saved to /var/cache/conftool/dbconfig/20230509-111211-ladsgroup.json
  • 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host ldap-rw1001.wikimedia.org with OS bullseye
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P47977 and previous config saved to /var/cache/conftool/dbconfig/20230509-110222-ladsgroup.json
  • 10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P47976 and previous config saved to /var/cache/conftool/dbconfig/20230509-105704-ladsgroup.json
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P47975 and previous config saved to /var/cache/conftool/dbconfig/20230509-104715-ladsgroup.json
  • 10:45 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudcontrol2001-dev.wikimedia.org
  • 10:45 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:44 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 10:42 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P47974 and previous config saved to /var/cache/conftool/dbconfig/20230509-104158-ladsgroup.json
  • 10:39 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2001-dev.wikimedia.org
  • 10:36 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 10:36 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 10:32 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts cloudcontrol2001-dev.wikimedia.org
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T335845)', diff saved to https://phabricator.wikimedia.org/P47973 and previous config saved to /var/cache/conftool/dbconfig/20230509-103209-ladsgroup.json
  • 10:29 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2001-dev.wikimedia.org
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T335845)', diff saved to https://phabricator.wikimedia.org/P47972 and previous config saved to /var/cache/conftool/dbconfig/20230509-102652-ladsgroup.json
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T335845)', diff saved to https://phabricator.wikimedia.org/P47971 and previous config saved to /var/cache/conftool/dbconfig/20230509-102644-ladsgroup.json
  • 10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 10:26 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T335845)', diff saved to https://phabricator.wikimedia.org/P47970 and previous config saved to /var/cache/conftool/dbconfig/20230509-102619-ladsgroup.json
  • 10:26 volans@cumin1001: END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling update on A:netbox-canary
  • 10:26 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
  • 10:24 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e1-eqiad.mgmt with reason: test on ssw1-e1-eqiad will take ospf on lsw1-e1-eqiad down.
  • 10:24 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e1-eqiad.mgmt with reason: test on ssw1-e1-eqiad will take ospf on lsw1-e1-eqiad down.
  • 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T335845)', diff saved to https://phabricator.wikimedia.org/P47969 and previous config saved to /var/cache/conftool/dbconfig/20230509-102001-ladsgroup.json
  • 10:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T335845)', diff saved to https://phabricator.wikimedia.org/P47968 and previous config saved to /var/cache/conftool/dbconfig/20230509-101938-ladsgroup.json
  • 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P47967 and previous config saved to /var/cache/conftool/dbconfig/20230509-101113-ladsgroup.json
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P47966 and previous config saved to /var/cache/conftool/dbconfig/20230509-100431-ladsgroup.json
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P47965 and previous config saved to /var/cache/conftool/dbconfig/20230509-095607-ladsgroup.json
  • 09:55 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudcontrol2001-dev.wikimedia.org
  • 09:55 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:55 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2001-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
  • 09:53 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2001-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P47964 and previous config saved to /var/cache/conftool/dbconfig/20230509-094925-ladsgroup.json
  • 09:49 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T335845)', diff saved to https://phabricator.wikimedia.org/P47962 and previous config saved to /var/cache/conftool/dbconfig/20230509-094100-ladsgroup.json
  • 09:37 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2001-dev.wikimedia.org
  • 09:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping3003.esams.wmnet
  • 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T335845)', diff saved to https://phabricator.wikimedia.org/P47961 and previous config saved to /var/cache/conftool/dbconfig/20230509-093419-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T335845)', diff saved to https://phabricator.wikimedia.org/P47960 and previous config saved to /var/cache/conftool/dbconfig/20230509-093320-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 09:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 09:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping3003.esams.wmnet
  • 09:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 09:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 09:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T335845)', diff saved to https://phabricator.wikimedia.org/P47959 and previous config saved to /var/cache/conftool/dbconfig/20230509-092843-ladsgroup.json
  • 09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 09:23 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 09:17 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.8 refs T330214
  • 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2003.codfw.wmnet
  • 09:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2003.codfw.wmnet
  • 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1003.eqiad.wmnet
  • 08:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1003.eqiad.wmnet
  • 08:40 marostegui: Stop mariadb on db1115 (old zarcillo master) T334455
  • 08:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 08:39 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-rw1001.wikimedia.org
  • 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-rw1001.wikimedia.org - jmm@cumin2002"
  • 08:36 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-rw1001.wikimedia.org - jmm@cumin2002"
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-rw1001.wikimedia.org on all recursors
  • 08:36 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-rw1001.wikimedia.org on all recursors
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-rw1001.wikimedia.org - jmm@cumin2002"
  • 08:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-rw1001.wikimedia.org - jmm@cumin2002"
  • 08:30 marostegui: Failover m5-master from dbproxy1021 to dbproxy1017
  • 08:28 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:28 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-rw1001.wikimedia.org
  • 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-rw2001.wikimedia.org
  • 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-rw2001.wikimedia.org - jmm@cumin2002"
  • 08:19 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-rw2001.wikimedia.org - jmm@cumin2002"
  • 08:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-rw2001.wikimedia.org on all recursors
  • 08:19 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-rw2001.wikimedia.org on all recursors
  • 08:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:18 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:16 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: Release v3.2.9-wmf2 to production - volans@cumin1001 - T314933
  • 08:13 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: Release v3.2.9-wmf2 to production - volans@cumin1001 - T314933
  • 08:13 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: netbox upgrade
  • 08:13 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: netbox upgrade
  • 08:12 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 08:12 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-rw2001.wikimedia.org - jmm@cumin2002"
  • 08:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-rw2001.wikimedia.org - jmm@cumin2002"
  • 08:08 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: Release v3.2.9-wmf2 to production - volans@cumin1001 - T314933
  • 08:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:05 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-rw2001.wikimedia.org
  • 08:04 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: Release v3.2.9-wmf2 to production - volans@cumin1001 - T314933
  • 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
  • 06:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
  • 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netmon1003.wikimedia.org
  • 06:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
  • 06:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
  • 06:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
  • 05:28 marostegui: Starting db-inventory eqiad failover from db1115 to db1215 - T335014
  • 05:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet,db[1115,1215].eqiad.wmnet with reason: Primary switchover db_inventory T335014
  • 05:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2185.codfw.wmnet,db[1115,1215].eqiad.wmnet with reason: Primary switchover db_inventory T335014
  • 03:50 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.8 refs T330214 (duration: 47m 55s)
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.8 refs T330214
  • 01:53 cstone: civicrm upgraded from 301e24e4 to d8a1a562
  • 00:43 eileen: civicrm upgraded from d5229d22 to 301e24e4
  • 00:06 zabe@deploy1002: Finished scap: Backport for Start writing to af_actor/afh_actor everywhere (T334295) (duration: 07m 22s)
  • 00:00 zabe@deploy1002: zabe: Backport for Start writing to af_actor/afh_actor everywhere (T334295) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet

2023-05-08

  • 23:58 zabe@deploy1002: Started scap: Backport for Start writing to af_actor/afh_actor everywhere (T334295)
  • 23:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T335845)', diff saved to https://phabricator.wikimedia.org/P47958 and previous config saved to /var/cache/conftool/dbconfig/20230508-233832-ladsgroup.json
  • 23:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P47957 and previous config saved to /var/cache/conftool/dbconfig/20230508-232325-ladsgroup.json
  • 23:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P47956 and previous config saved to /var/cache/conftool/dbconfig/20230508-230819-ladsgroup.json
  • 22:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T335845)', diff saved to https://phabricator.wikimedia.org/P47955 and previous config saved to /var/cache/conftool/dbconfig/20230508-225313-ladsgroup.json
  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T335845)', diff saved to https://phabricator.wikimedia.org/P47954 and previous config saved to /var/cache/conftool/dbconfig/20230508-224657-ladsgroup.json
  • 22:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 22:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T335845)', diff saved to https://phabricator.wikimedia.org/P47953 and previous config saved to /var/cache/conftool/dbconfig/20230508-224622-ladsgroup.json
  • 22:34 eileen: config revision changed from 7ac11236 to 48f7485f - disabled populate contribution tracking
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P47952 and previous config saved to /var/cache/conftool/dbconfig/20230508-223115-ladsgroup.json
  • 22:23 cstone: civicrm upgraded from 05523a9d to d5229d22
  • 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P47951 and previous config saved to /var/cache/conftool/dbconfig/20230508-221609-ladsgroup.json
  • 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T335845)', diff saved to https://phabricator.wikimedia.org/P47950 and previous config saved to /var/cache/conftool/dbconfig/20230508-220103-ladsgroup.json
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T335845)', diff saved to https://phabricator.wikimedia.org/P47949 and previous config saved to /var/cache/conftool/dbconfig/20230508-215323-ladsgroup.json
  • 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T335845)', diff saved to https://phabricator.wikimedia.org/P47948 and previous config saved to /var/cache/conftool/dbconfig/20230508-215300-ladsgroup.json
  • 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P47947 and previous config saved to /var/cache/conftool/dbconfig/20230508-213754-ladsgroup.json
  • 21:24 mstyles@deploy1002: Finished scap: Backport for Disable translation memory on collabwiki (T313241) (duration: 06m 45s)
  • 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P47946 and previous config saved to /var/cache/conftool/dbconfig/20230508-212248-ladsgroup.json
  • 21:21 mforns@deploy1002: Finished deploy [airflow-dags/analytics@a6a3ceb]: (no justification provided) (duration: 00m 09s)
  • 21:21 mforns@deploy1002: Started deploy [airflow-dags/analytics@a6a3ceb]: (no justification provided)
  • 21:18 mstyles@deploy1002: mstyles and sbassett: Backport for Disable translation memory on collabwiki (T313241) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:17 mstyles@deploy1002: Started scap: Backport for Disable translation memory on collabwiki (T313241)
  • 21:09 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging1001.eqiad.wmnet
  • 21:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T335845)', diff saved to https://phabricator.wikimedia.org/P47945 and previous config saved to /var/cache/conftool/dbconfig/20230508-210742-ladsgroup.json
  • 21:02 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging1001.eqiad.wmnet
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T335845)', diff saved to https://phabricator.wikimedia.org/P47944 and previous config saved to /var/cache/conftool/dbconfig/20230508-210119-ladsgroup.json
  • 21:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 21:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 21:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T335845)', diff saved to https://phabricator.wikimedia.org/P47943 and previous config saved to /var/cache/conftool/dbconfig/20230508-210056-ladsgroup.json
  • 20:59 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging1002.eqiad.wmnet
  • 20:53 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging1002.eqiad.wmnet
  • 20:49 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging1004.eqiad.wmnet
  • 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P47942 and previous config saved to /var/cache/conftool/dbconfig/20230508-204549-ladsgroup.json
  • 20:43 mutante: miscweb2003 - rebooting
  • 20:41 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging1004.eqiad.wmnet
  • 20:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on miscweb2003.codfw.wmnet with reason: reboot
  • 20:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on miscweb2003.codfw.wmnet with reason: reboot
  • 20:39 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging1003.eqiad.wmnet
  • 20:38 taavi@deploy1002: Finished scap: Backport for Deploy fixed width indicator to wikis (T335307) (duration: 08m 22s)
  • 20:36 aokoth@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host vrts1001.eqiad.wmnet with OS bullseye
  • 20:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on miscweb2003.codfw.wmnet with reason: reboot
  • 20:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on miscweb2003.codfw.wmnet with reason: reboot
  • 20:32 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging1003.eqiad.wmnet
  • 20:31 taavi@deploy1002: jdlrobson and taavi: Backport for Deploy fixed width indicator to wikis (T335307) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P47941 and previous config saved to /var/cache/conftool/dbconfig/20230508-203043-ladsgroup.json
  • 20:30 taavi@deploy1002: Started scap: Backport for Deploy fixed width indicator to wikis (T335307)
  • 20:29 taavi@deploy1002: Finished scap: Backport for Ensure page load popupNotification is closed when the toggle button is clicked (T335153) (duration: 07m 58s)
  • 20:25 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2007.codfw.wmnet
  • 20:24 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts1001.eqiad.wmnet with reason: host reimage
  • 20:23 taavi@deploy1002: jdlrobson and taavi: Backport for Ensure page load popupNotification is closed when the toggle button is clicked (T335153) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:21 taavi@deploy1002: Started scap: Backport for Ensure page load popupNotification is closed when the toggle button is clicked (T335153)
  • 20:21 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts1001.eqiad.wmnet with reason: host reimage
  • 20:20 taavi@deploy1002: Finished scap: Backport for Update a/b test code for visual enhancements a/b test (T333715), Enable DiscussionTools visual enhancements a/b test (T302358) (duration: 11m 54s)
  • 20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T335845)', diff saved to https://phabricator.wikimedia.org/P47939 and previous config saved to /var/cache/conftool/dbconfig/20230508-201537-ladsgroup.json
  • 20:11 aokoth@cumin1001: START - Cookbook sre.ganeti.reimage for host vrts1001.eqiad.wmnet with OS bullseye
  • 20:09 taavi@deploy1002: kemayo and taavi: Backport for Update a/b test code for visual enhancements a/b test (T333715), Enable DiscussionTools visual enhancements a/b test (T302358) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T335845)', diff saved to https://phabricator.wikimedia.org/P47938 and previous config saved to /var/cache/conftool/dbconfig/20230508-200825-ladsgroup.json
  • 20:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 20:08 taavi@deploy1002: Started scap: Backport for Update a/b test code for visual enhancements a/b test (T333715), Enable DiscussionTools visual enhancements a/b test (T302358)
  • 20:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T335845)', diff saved to https://phabricator.wikimedia.org/P47937 and previous config saved to /var/cache/conftool/dbconfig/20230508-200802-ladsgroup.json
  • 20:05 aokoth@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host vrts1001.eqiad.wmnet
  • 20:05 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM vrts1001.eqiad.wmnet - aokoth@cumin1001"
  • 20:04 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM vrts1001.eqiad.wmnet - aokoth@cumin1001"
  • 20:04 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) vrts1001.eqiad.wmnet on all recursors
  • 20:04 aokoth@cumin1001: START - Cookbook sre.dns.wipe-cache vrts1001.eqiad.wmnet on all recursors
  • 20:04 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:04 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM vrts1001.eqiad.wmnet - aokoth@cumin1001"
  • 20:03 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM vrts1001.eqiad.wmnet - aokoth@cumin1001"
  • 19:59 aokoth@cumin1001: START - Cookbook sre.dns.netbox
  • 19:59 aokoth@cumin1001: START - Cookbook sre.ganeti.makevm for new host vrts1001.eqiad.wmnet
  • 19:54 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging1005.eqiad.wmnet
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P47936 and previous config saved to /var/cache/conftool/dbconfig/20230508-195256-ladsgroup.json
  • 19:52 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2007.codfw.wmnet
  • 19:51 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2004.codfw.wmnet
  • 19:46 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging1005.eqiad.wmnet
  • 19:45 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2004.codfw.wmnet
  • 19:45 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging2004.codfw.wmnet
  • 19:41 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2006.codfw.wmnet
  • 19:39 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging2004.codfw.wmnet
  • 19:38 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging2005.codfw.wmnet
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P47935 and previous config saved to /var/cache/conftool/dbconfig/20230508-193750-ladsgroup.json
  • 19:34 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2006.codfw.wmnet
  • 19:32 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging2005.codfw.wmnet
  • 19:31 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging2003.codfw.wmnet
  • 19:24 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging2003.codfw.wmnet
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T335845)', diff saved to https://phabricator.wikimedia.org/P47934 and previous config saved to /var/cache/conftool/dbconfig/20230508-192243-ladsgroup.json
  • 19:20 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs2006.codfw.wmnet with reason: rebooting to help with lag
  • 19:20 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs2006.codfw.wmnet with reason: rebooting to help with lag
  • 19:20 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs2006.codfw.wmnet
  • 19:20 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs2006.codfw.wmnet
  • 19:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1004.eqiad.wmnet with reason: rebooting to help with lag
  • 19:18 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1004.eqiad.wmnet with reason: rebooting to help with lag
  • 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T335845)', diff saved to https://phabricator.wikimedia.org/P47933 and previous config saved to /var/cache/conftool/dbconfig/20230508-191630-ladsgroup.json
  • 19:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 19:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T335845)', diff saved to https://phabricator.wikimedia.org/P47932 and previous config saved to /var/cache/conftool/dbconfig/20230508-191607-ladsgroup.json
  • 19:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 19 hosts with reason: rebooting to help with lag
  • 19:12 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 19 hosts with reason: rebooting to help with lag
  • 19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P47931 and previous config saved to /var/cache/conftool/dbconfig/20230508-190100-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P47930 and previous config saved to /var/cache/conftool/dbconfig/20230508-184554-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T335845)', diff saved to https://phabricator.wikimedia.org/P47929 and previous config saved to /var/cache/conftool/dbconfig/20230508-183048-ladsgroup.json
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T335845)', diff saved to https://phabricator.wikimedia.org/P47928 and previous config saved to /var/cache/conftool/dbconfig/20230508-182350-ladsgroup.json
  • 18:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 18:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T335845)', diff saved to https://phabricator.wikimedia.org/P47927 and previous config saved to /var/cache/conftool/dbconfig/20230508-182327-ladsgroup.json
  • 18:09 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:08 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P47926 and previous config saved to /var/cache/conftool/dbconfig/20230508-180820-ladsgroup.json
  • 18:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 18:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 18:04 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 113m 03s)
  • 18:03 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging2002.codfw.wmnet
  • 18:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 18:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T335845)', diff saved to https://phabricator.wikimedia.org/P47925 and previous config saved to /var/cache/conftool/dbconfig/20230508-180239-ladsgroup.json
  • 17:57 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging2002.codfw.wmnet
  • 17:54 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging2001.codfw.wmnet
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P47923 and previous config saved to /var/cache/conftool/dbconfig/20230508-175314-ladsgroup.json
  • 17:51 sukhe: set routing-options static route 208.80.153.224/28 [high-traffic1, codfw] next-hop 10.192.0.29: T326767
  • 17:48 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging2001.codfw.wmnet
  • 17:48 sukhe: restart pybal on lvs2011 to pick up bgp med change: T326767
  • 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P47922 and previous config saved to /var/cache/conftool/dbconfig/20230508-174732-ladsgroup.json
  • 17:39 sukhe: homer "cr*-codfw*" commit "Gerrit: 914871 add new LVS host lvs2011": T326767
  • 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T335845)', diff saved to https://phabricator.wikimedia.org/P47920 and previous config saved to /var/cache/conftool/dbconfig/20230508-173808-ladsgroup.json
  • 17:38 volans: installed spicerack 7.0.0 on cumin1001
  • 17:36 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-airflow1001.eqiad.wmnet
  • 17:36 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:36 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 17:35 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2011
  • 17:35 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2011
  • 17:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2011.codfw.wmnet
  • 17:33 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs2011.codfw.wmnet
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P47919 and previous config saved to /var/cache/conftool/dbconfig/20230508-173226-ladsgroup.json
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T335845)', diff saved to https://phabricator.wikimedia.org/P47918 and previous config saved to /var/cache/conftool/dbconfig/20230508-173152-ladsgroup.json
  • 17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 17:31 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 17:31 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1132.eqiad.wmnet with OS buster
  • 17:29 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2002.codfw.wmnet with reason: test spicerack v7.0.0
  • 17:29 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin2002.codfw.wmnet with reason: test spicerack v7.0.0
  • 17:28 volans: installed spicerack 7.0.0 on cumin2002
  • 17:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudswift1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 17:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudswift1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T335845)', diff saved to https://phabricator.wikimedia.org/P47917 and previous config saved to /var/cache/conftool/dbconfig/20230508-171720-ladsgroup.json
  • 17:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2011.codfw.wmnet with OS bullseye
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1214 (T335845)', diff saved to https://phabricator.wikimedia.org/P47916 and previous config saved to /var/cache/conftool/dbconfig/20230508-170902-ladsgroup.json
  • 17:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 17:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T335845)', diff saved to https://phabricator.wikimedia.org/P47915 and previous config saved to /var/cache/conftool/dbconfig/20230508-170828-ladsgroup.json
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T335845)', diff saved to https://phabricator.wikimedia.org/P47914 and previous config saved to /var/cache/conftool/dbconfig/20230508-170542-ladsgroup.json
  • 16:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
  • 16:55 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
  • 16:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P47913 and previous config saved to /var/cache/conftool/dbconfig/20230508-165322-ladsgroup.json
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P47912 and previous config saved to /var/cache/conftool/dbconfig/20230508-165036-ladsgroup.json
  • 16:46 volans: uploaded spicerack_7.0.0 to apt.wikimedia.org bullseye-wikimedia
  • 16:39 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:39 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:39 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:38 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P47910 and previous config saved to /var/cache/conftool/dbconfig/20230508-163816-ladsgroup.json
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P47909 and previous config saved to /var/cache/conftool/dbconfig/20230508-163530-ladsgroup.json
  • 16:33 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:33 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:32 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:32 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T335845)', diff saved to https://phabricator.wikimedia.org/P47908 and previous config saved to /var/cache/conftool/dbconfig/20230508-162309-ladsgroup.json
  • 16:20 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:20 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T335845)', diff saved to https://phabricator.wikimedia.org/P47907 and previous config saved to /var/cache/conftool/dbconfig/20230508-162024-ladsgroup.json
  • 16:14 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T335845)', diff saved to https://phabricator.wikimedia.org/P47906 and previous config saved to /var/cache/conftool/dbconfig/20230508-161313-ladsgroup.json
  • 16:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1211 (T335845)', diff saved to https://phabricator.wikimedia.org/P47905 and previous config saved to /var/cache/conftool/dbconfig/20230508-161258-ladsgroup.json
  • 16:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 16:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 16:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T335845)', diff saved to https://phabricator.wikimedia.org/P47904 and previous config saved to /var/cache/conftool/dbconfig/20230508-161235-ladsgroup.json
  • 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T335845)', diff saved to https://phabricator.wikimedia.org/P47903 and previous config saved to /var/cache/conftool/dbconfig/20230508-161234-ladsgroup.json
  • 16:11 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767
  • 16:02 bking@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=codfw
  • 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P47902 and previous config saved to /var/cache/conftool/dbconfig/20230508-155729-ladsgroup.json
  • 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P47901 and previous config saved to /var/cache/conftool/dbconfig/20230508-155728-ladsgroup.json
  • 15:47 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:46 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P47900 and previous config saved to /var/cache/conftool/dbconfig/20230508-154222-ladsgroup.json
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P47899 and previous config saved to /var/cache/conftool/dbconfig/20230508-154222-ladsgroup.json
  • 15:41 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:39 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:32 sukhe: ns1: remove dns2001, add dns2004 next-hop [ 208.80.153.48 208.80.153.111 208.80.153.10 ]: T335777
  • 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T335845)', diff saved to https://phabricator.wikimedia.org/P47898 and previous config saved to /var/cache/conftool/dbconfig/20230508-152716-ladsgroup.json
  • 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T335845)', diff saved to https://phabricator.wikimedia.org/P47897 and previous config saved to /var/cache/conftool/dbconfig/20230508-152716-ladsgroup.json
  • 15:25 moritzm: installing grep updates from Bullseye 11.7 point release
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1209 (T335845)', diff saved to https://phabricator.wikimedia.org/P47896 and previous config saved to /var/cache/conftool/dbconfig/20230508-151952-ladsgroup.json
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T335845)', diff saved to https://phabricator.wikimedia.org/P47895 and previous config saved to /var/cache/conftool/dbconfig/20230508-151929-ladsgroup.json
  • 15:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T335845)', diff saved to https://phabricator.wikimedia.org/P47894 and previous config saved to /var/cache/conftool/dbconfig/20230508-151556-ladsgroup.json
  • 15:12 sukhe: [done] homer "cr*-codfw*" commit "Gerrit: 917341 add new DNS host dns2004": T326688
  • 15:09 sukhe: homer "cr*-codfw*" commit "Gerrit: 917341 add new DNS host dns2004"
  • 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P47893 and previous config saved to /var/cache/conftool/dbconfig/20230508-150423-ladsgroup.json
  • 15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P47892 and previous config saved to /var/cache/conftool/dbconfig/20230508-150050-ladsgroup.json
  • 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns2004.wikimedia.org
  • 14:57 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for dns2004.wikimedia.org
  • 14:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2004.wikimedia.org with OS bullseye
  • 14:51 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.7 refs T330214
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P47891 and previous config saved to /var/cache/conftool/dbconfig/20230508-144916-ladsgroup.json
  • 14:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P47890 and previous config saved to /var/cache/conftool/dbconfig/20230508-144544-ladsgroup.json
  • 14:40 brennen: train 1.41.0-wmf.7 (T330213): proceeding to all wikis
  • 14:37 sukhe: sudo cumin -b1 -s1200 'A:cp and A:ulsfo' 'varnish-frontend-restart': T253093
  • 14:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2004.wikimedia.org with reason: host reimage
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T335845)', diff saved to https://phabricator.wikimedia.org/P47889 and previous config saved to /var/cache/conftool/dbconfig/20230508-143410-ladsgroup.json
  • 14:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2004.wikimedia.org with reason: host reimage
  • 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T335845)', diff saved to https://phabricator.wikimedia.org/P47888 and previous config saved to /var/cache/conftool/dbconfig/20230508-143038-ladsgroup.json
  • 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T335845)', diff saved to https://phabricator.wikimedia.org/P47887 and previous config saved to /var/cache/conftool/dbconfig/20230508-142543-ladsgroup.json
  • 14:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 14:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T335845)', diff saved to https://phabricator.wikimedia.org/P47886 and previous config saved to /var/cache/conftool/dbconfig/20230508-142520-ladsgroup.json
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T335845)', diff saved to https://phabricator.wikimedia.org/P47885 and previous config saved to /var/cache/conftool/dbconfig/20230508-142427-ladsgroup.json
  • 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T335845)', diff saved to https://phabricator.wikimedia.org/P47884 and previous config saved to /var/cache/conftool/dbconfig/20230508-142302-ladsgroup.json
  • 14:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 14:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T335845)', diff saved to https://phabricator.wikimedia.org/P47883 and previous config saved to /var/cache/conftool/dbconfig/20230508-142237-ladsgroup.json
  • 14:16 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2004.wikimedia.org with OS bullseye
  • 14:14 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P47882 and previous config saved to /var/cache/conftool/dbconfig/20230508-141014-ladsgroup.json
  • 14:09 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts an-airflow1001.eqiad.wmnet
  • 14:08 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: wgEventStreams - Add mediawiki.page_outlink_topic_prediction_change stream - T328899 (duration: 06m 54s)
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P47881 and previous config saved to /var/cache/conftool/dbconfig/20230508-140731-ladsgroup.json
  • 13:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P47880 and previous config saved to /var/cache/conftool/dbconfig/20230508-135508-ladsgroup.json
  • 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P47879 and previous config saved to /var/cache/conftool/dbconfig/20230508-135224-ladsgroup.json
  • 13:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
  • 13:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2002.codfw.wmnet
  • 13:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2002.codfw.wmnet
  • 13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T335845)', diff saved to https://phabricator.wikimedia.org/P47878 and previous config saved to /var/cache/conftool/dbconfig/20230508-134002-ladsgroup.json
  • 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T335845)', diff saved to https://phabricator.wikimedia.org/P47877 and previous config saved to /var/cache/conftool/dbconfig/20230508-133718-ladsgroup.json
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T335845)', diff saved to https://phabricator.wikimedia.org/P47876 and previous config saved to /var/cache/conftool/dbconfig/20230508-133034-ladsgroup.json
  • 13:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 13:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T335845)', diff saved to https://phabricator.wikimedia.org/P47875 and previous config saved to /var/cache/conftool/dbconfig/20230508-133011-ladsgroup.json
  • 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T335845)', diff saved to https://phabricator.wikimedia.org/P47874 and previous config saved to /var/cache/conftool/dbconfig/20230508-132957-ladsgroup.json
  • 13:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 13:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T335845)', diff saved to https://phabricator.wikimedia.org/P47873 and previous config saved to /var/cache/conftool/dbconfig/20230508-132932-ladsgroup.json
  • 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
  • 13:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P47872 and previous config saved to /var/cache/conftool/dbconfig/20230508-131504-ladsgroup.json
  • 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
  • 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P47871 and previous config saved to /var/cache/conftool/dbconfig/20230508-131426-ladsgroup.json
  • 13:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
  • 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P47870 and previous config saved to /var/cache/conftool/dbconfig/20230508-125958-ladsgroup.json
  • 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P47869 and previous config saved to /var/cache/conftool/dbconfig/20230508-125920-ladsgroup.json
  • 12:56 moritzm: installing ruby-rack security updates
  • 12:55 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt
  • 12:55 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt
  • 12:51 moritzm: installing python-django security updates on stretch
  • 12:45 moritzm: installing openvswitch security updates
  • 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T335845)', diff saved to https://phabricator.wikimedia.org/P47868 and previous config saved to /var/cache/conftool/dbconfig/20230508-124452-ladsgroup.json
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T335845)', diff saved to https://phabricator.wikimedia.org/P47867 and previous config saved to /var/cache/conftool/dbconfig/20230508-124414-ladsgroup.json
  • 12:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
  • 12:40 topranks: rebooting cloudsw1-b1-codfw for OS upgrade T333316
  • 12:39 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
  • 12:38 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T335845)', diff saved to https://phabricator.wikimedia.org/P47866 and previous config saved to /var/cache/conftool/dbconfig/20230508-123654-ladsgroup.json
  • 12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T335845)', diff saved to https://phabricator.wikimedia.org/P47865 and previous config saved to /var/cache/conftool/dbconfig/20230508-123624-ladsgroup.json
  • 12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T335845)', diff saved to https://phabricator.wikimedia.org/P47864 and previous config saved to /var/cache/conftool/dbconfig/20230508-123614-ladsgroup.json
  • 12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 12:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T335845)', diff saved to https://phabricator.wikimedia.org/P47863 and previous config saved to /var/cache/conftool/dbconfig/20230508-123554-ladsgroup.json
  • 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
  • 12:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P47862 and previous config saved to /var/cache/conftool/dbconfig/20230508-122108-ladsgroup.json
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P47861 and previous config saved to /var/cache/conftool/dbconfig/20230508-122048-ladsgroup.json
  • 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2005.codfw.wmnet with OS bullseye
  • 12:06 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2448.codfw.wmnet
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P47860 and previous config saved to /var/cache/conftool/dbconfig/20230508-120602-ladsgroup.json
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P47859 and previous config saved to /var/cache/conftool/dbconfig/20230508-120542-ladsgroup.json
  • 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2005.codfw.wmnet with reason: host reimage
  • 11:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2005.codfw.wmnet with reason: host reimage
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T335845)', diff saved to https://phabricator.wikimedia.org/P47858 and previous config saved to /var/cache/conftool/dbconfig/20230508-115056-ladsgroup.json
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T335845)', diff saved to https://phabricator.wikimedia.org/P47857 and previous config saved to /var/cache/conftool/dbconfig/20230508-115036-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T335845)', diff saved to https://phabricator.wikimedia.org/P47856 and previous config saved to /var/cache/conftool/dbconfig/20230508-114417-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T335845)', diff saved to https://phabricator.wikimedia.org/P47855 and previous config saved to /var/cache/conftool/dbconfig/20230508-114354-ladsgroup.json
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T335845)', diff saved to https://phabricator.wikimedia.org/P47854 and previous config saved to /var/cache/conftool/dbconfig/20230508-114336-ladsgroup.json
  • 11:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T335845)', diff saved to https://phabricator.wikimedia.org/P47853 and previous config saved to /var/cache/conftool/dbconfig/20230508-114312-ladsgroup.json
  • 11:41 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bullseye
  • 11:35 daniel@deploy1002: Finished scap: Backport for Enable parser cache warming jobs for parsoid on small wikis (T329366) (duration: 15m 26s)
  • 11:32 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host testvm2005.codfw.wmnet with OS bookworm
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P47851 and previous config saved to /var/cache/conftool/dbconfig/20230508-112848-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P47850 and previous config saved to /var/cache/conftool/dbconfig/20230508-112805-ladsgroup.json
  • 11:21 daniel@deploy1002: daniel: Backport for Enable parser cache warming jobs for parsoid on small wikis (T329366) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 11:20 daniel@deploy1002: Started scap: Backport for Enable parser cache warming jobs for parsoid on small wikis (T329366)
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P47849 and previous config saved to /var/cache/conftool/dbconfig/20230508-111342-ladsgroup.json
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P47848 and previous config saved to /var/cache/conftool/dbconfig/20230508-111259-ladsgroup.json
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1113 from dbctl T336029', diff saved to https://phabricator.wikimedia.org/P47847 and previous config saved to /var/cache/conftool/dbconfig/20230508-111113-marostegui.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47846 and previous config saved to /var/cache/conftool/dbconfig/20230508-110812-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47845 and previous config saved to /var/cache/conftool/dbconfig/20230508-110803-root.json
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47844 and previous config saved to /var/cache/conftool/dbconfig/20230508-110756-root.json
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47843 and previous config saved to /var/cache/conftool/dbconfig/20230508-110755-root.json
  • 11:04 duesen: config deployment failed because gitlab is down. Prod is out of sync with gerrit, and deploy1002 is in sync with gerrit. Will come back to this in an hour.
  • 10:59 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T335845)', diff saved to https://phabricator.wikimedia.org/P47842 and previous config saved to /var/cache/conftool/dbconfig/20230508-105835-ladsgroup.json
  • 10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T335845)', diff saved to https://phabricator.wikimedia.org/P47841 and previous config saved to /var/cache/conftool/dbconfig/20230508-105753-ladsgroup.json
  • 10:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T320967)
  • 10:56 eoghan@cumin1001: END (PASS) - Cookbook sre.gitlab.failover (exit_code=0) Failover of gitlab from gitlab2002.wikimedia.org to gitlab1004.wikimedia.org
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
  • 10:54 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T320967)
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47840 and previous config saved to /var/cache/conftool/dbconfig/20230508-105307-root.json
  • 10:53 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bookworm
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47839 and previous config saved to /var/cache/conftool/dbconfig/20230508-105258-root.json
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47838 and previous config saved to /var/cache/conftool/dbconfig/20230508-105252-root.json
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47837 and previous config saved to /var/cache/conftool/dbconfig/20230508-105250-root.json
  • 10:52 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host testvm2005.codfw.wmnet with OS bookworm
  • 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T335845)', diff saved to https://phabricator.wikimedia.org/P47836 and previous config saved to /var/cache/conftool/dbconfig/20230508-105215-ladsgroup.json
  • 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T335845)', diff saved to https://phabricator.wikimedia.org/P47835 and previous config saved to /var/cache/conftool/dbconfig/20230508-105141-ladsgroup.json
  • 10:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
  • 10:51 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 10:51 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 10:50 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gitlab-replica.wikimedia.org on all recursors
  • 10:50 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache gitlab-replica.wikimedia.org on all recursors
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T335845)', diff saved to https://phabricator.wikimedia.org/P47834 and previous config saved to /var/cache/conftool/dbconfig/20230508-105032-ladsgroup.json
  • 10:50 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gitlab.wikimedia.org on all recursors
  • 10:50 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache gitlab.wikimedia.org on all recursors
  • 10:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T335845)', diff saved to https://phabricator.wikimedia.org/P47833 and previous config saved to /var/cache/conftool/dbconfig/20230508-105007-ladsgroup.json
  • 10:47 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 10:47 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 10:47 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T320967)
  • 10:45 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T320967)
  • 10:44 daniel@deploy1002: scap failed: CalledProcessError Command 'sudo -u mwbuilder /usr/local/bin/update-mediawiki-tools-release' returned non-zero exit status 1. (duration: 00m 05s)
  • 10:44 daniel@deploy1002: Started scap: Backport for Enable parser cache warming jobs for parsoid on small wikis (T329366)
  • 10:41 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47832 and previous config saved to /var/cache/conftool/dbconfig/20230508-103802-root.json
  • 10:37 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47831 and previous config saved to /var/cache/conftool/dbconfig/20230508-103754-root.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47830 and previous config saved to /var/cache/conftool/dbconfig/20230508-103747-root.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47829 and previous config saved to /var/cache/conftool/dbconfig/20230508-103745-root.json
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P47828 and previous config saved to /var/cache/conftool/dbconfig/20230508-103634-ladsgroup.json
  • 10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P47827 and previous config saved to /var/cache/conftool/dbconfig/20230508-103501-ladsgroup.json
  • 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
  • 10:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
  • 10:28 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bookworm
  • 10:27 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host netflow2003.codfw.wmnet with OS bookworm
  • 10:24 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 25%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47826 and previous config saved to /var/cache/conftool/dbconfig/20230508-102258-root.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 25%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47825 and previous config saved to /var/cache/conftool/dbconfig/20230508-102249-root.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47824 and previous config saved to /var/cache/conftool/dbconfig/20230508-102242-root.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 25%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47823 and previous config saved to /var/cache/conftool/dbconfig/20230508-102240-root.json
  • 10:22 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P47822 and previous config saved to /var/cache/conftool/dbconfig/20230508-102128-ladsgroup.json
  • 10:21 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P47821 and previous config saved to /var/cache/conftool/dbconfig/20230508-101955-ladsgroup.json
  • 10:18 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 10%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47820 and previous config saved to /var/cache/conftool/dbconfig/20230508-100753-root.json
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 10%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47819 and previous config saved to /var/cache/conftool/dbconfig/20230508-100744-root.json
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47818 and previous config saved to /var/cache/conftool/dbconfig/20230508-100737-root.json
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 10%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47817 and previous config saved to /var/cache/conftool/dbconfig/20230508-100736-root.json
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T335845)', diff saved to https://phabricator.wikimedia.org/P47816 and previous config saved to /var/cache/conftool/dbconfig/20230508-100622-ladsgroup.json
  • 10:04 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T335845)', diff saved to https://phabricator.wikimedia.org/P47815 and previous config saved to /var/cache/conftool/dbconfig/20230508-100449-ladsgroup.json
  • 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T335845)', diff saved to https://phabricator.wikimedia.org/P47814 and previous config saved to /var/cache/conftool/dbconfig/20230508-100003-ladsgroup.json
  • 09:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 09:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T335845)', diff saved to https://phabricator.wikimedia.org/P47813 and previous config saved to /var/cache/conftool/dbconfig/20230508-095928-ladsgroup.json
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2161 (T335845)', diff saved to https://phabricator.wikimedia.org/P47812 and previous config saved to /var/cache/conftool/dbconfig/20230508-095724-ladsgroup.json
  • 09:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 09:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T335845)', diff saved to https://phabricator.wikimedia.org/P47811 and previous config saved to /var/cache/conftool/dbconfig/20230508-095659-ladsgroup.json
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47810 and previous config saved to /var/cache/conftool/dbconfig/20230508-095248-root.json
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47809 and previous config saved to /var/cache/conftool/dbconfig/20230508-095240-root.json
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47808 and previous config saved to /var/cache/conftool/dbconfig/20230508-095233-root.json
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47807 and previous config saved to /var/cache/conftool/dbconfig/20230508-095231-root.json
  • 09:48 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
  • 09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P47806 and previous config saved to /var/cache/conftool/dbconfig/20230508-094422-ladsgroup.json
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P47805 and previous config saved to /var/cache/conftool/dbconfig/20230508-094153-ladsgroup.json
  • 09:40 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:40 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:40 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:39 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:39 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:39 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:38 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:38 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 3%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47804 and previous config saved to /var/cache/conftool/dbconfig/20230508-093743-root.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 3%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47803 and previous config saved to /var/cache/conftool/dbconfig/20230508-093735-root.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 3%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47802 and previous config saved to /var/cache/conftool/dbconfig/20230508-093728-root.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 3%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47801 and previous config saved to /var/cache/conftool/dbconfig/20230508-093726-root.json
  • 09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P47800 and previous config saved to /var/cache/conftool/dbconfig/20230508-092916-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P47799 and previous config saved to /var/cache/conftool/dbconfig/20230508-092647-ladsgroup.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47798 and previous config saved to /var/cache/conftool/dbconfig/20230508-092232-root.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47797 and previous config saved to /var/cache/conftool/dbconfig/20230508-092223-root.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47796 and previous config saved to /var/cache/conftool/dbconfig/20230508-092221-root.json
  • 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
  • 09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T335845)', diff saved to https://phabricator.wikimedia.org/P47794 and previous config saved to /var/cache/conftool/dbconfig/20230508-091408-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T335845)', diff saved to https://phabricator.wikimedia.org/P47793 and previous config saved to /var/cache/conftool/dbconfig/20230508-091140-ladsgroup.json
  • 09:10 ladsgroup@deploy1002: Finished scap: Backport for Set externallinks migration to read new on mediawiki.org and fawikiquote (T335343) (duration: 14m 04s)
  • 09:05 eoghan@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab2002.wikimedia.org to gitlab1004.wikimedia.org
  • 09:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T335845)', diff saved to https://phabricator.wikimedia.org/P47792 and previous config saved to /var/cache/conftool/dbconfig/20230508-090521-ladsgroup.json
  • 09:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 09:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T335845)', diff saved to https://phabricator.wikimedia.org/P47791 and previous config saved to /var/cache/conftool/dbconfig/20230508-090456-ladsgroup.json
  • 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 08:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 08:57 ladsgroup@deploy1002: ladsgroup: Backport for Set externallinks migration to read new on mediawiki.org and fawikiquote (T335343) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:56 ladsgroup@deploy1002: Started scap: Backport for Set externallinks migration to read new on mediawiki.org and fawikiquote (T335343)
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 es1025 es2025 es2022 for reboots', diff saved to https://phabricator.wikimedia.org/P47790 and previous config saved to /var/cache/conftool/dbconfig/20230508-085435-root.json
  • 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P47789 and previous config saved to /var/cache/conftool/dbconfig/20230508-084950-ladsgroup.json
  • 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 08:43 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host netflow2003.codfw.wmnet with OS bookworm
  • 08:40 marostegui@deploy1002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" (duration: 13m 18s)
  • 08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P47788 and previous config saved to /var/cache/conftool/dbconfig/20230508-083444-ladsgroup.json
  • 08:29 marostegui@deploy1002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 08:28 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
  • 08:27 marostegui@deploy1002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master"
  • 08:27 vgutierrez: HAProxy updated to 2.6.13 on cp1077 and cp1085 - T334448
  • 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T335845)', diff saved to https://phabricator.wikimedia.org/P47787 and previous config saved to /var/cache/conftool/dbconfig/20230508-081937-ladsgroup.json
  • 08:18 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host netflow2003.codfw.wmnet with OS bookworm
  • 08:17 marostegui: Failover m3-master from dbproxy1020 to dbproxy1016
  • 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T335845)', diff saved to https://phabricator.wikimedia.org/P47786 and previous config saved to /var/cache/conftool/dbconfig/20230508-081415-ladsgroup.json
  • 08:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 08:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 08:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T335845)', diff saved to https://phabricator.wikimedia.org/P47785 and previous config saved to /var/cache/conftool/dbconfig/20230508-081353-ladsgroup.json
  • 08:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:13 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master (duration: 15m 27s)
  • 07:59 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
  • 07:59 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc1 master synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:57 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master
  • 07:54 _joe_: restarting varnish-frontend on cp5029, last host in eqsin/upload to be restarted
  • 07:53 vgutierrez: fetch HAProxy 2.6.13 on thirdparty/haproxy2.6 (apt.wm.o) - T334448
  • 07:51 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host netflow2003.codfw.wmnet with OS bookworm
  • 07:29 _joe_: restarting varnish-frontend on upload eqsin
  • 07:25 _joe_: running restart-cdn on cp5030
  • 07:22 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
  • 07:22 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 07:19 moritzm: updated bookworm installer to RC2 T330495
  • 07:11 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host netflow2003.codfw.wmnet with OS bookworm
  • 07:05 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 07:02 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 06:50 moritzm: bounce ferm on aux-k8s-ctrl1001
  • 06:49 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
  • 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host netflow2003.codfw.wmnet with OS bookworm
  • 06:48 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 06:48 kart_: Deployed MinT to production (T331505)
  • 06:47 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
  • 06:47 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 06:44 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 06:43 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 06:40 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 05:55 marostegui@deploy1002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 master" (duration: 27m 46s)
  • 05:46 phedenskog@deploy1002: Finished deploy [performance/navtiming@9b22d3b]: Measure largest contentful paint element type (duration: 00m 05s)
  • 05:46 phedenskog@deploy1002: Started deploy [performance/navtiming@9b22d3b]: Measure largest contentful paint element type
  • 05:42 marostegui@deploy1002: marostegui: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 master" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 05:28 marostegui@deploy1002: Started scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 master"
  • 05:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on pc1014.eqiad.wmnet with reason: Maintenance
  • 05:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on pc1014.eqiad.wmnet with reason: Maintenance
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113 (s5,s6) T336029', diff saved to https://phabricator.wikimedia.org/P47783 and previous config saved to /var/cache/conftool/dbconfig/20230508-051036-root.json
  • 05:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbproxy1013.eqiad.wmnet with reason: Maintenance
  • 05:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on dbproxy1013.eqiad.wmnet with reason: Maintenance
  • 04:54 marostegui: Deploy schema change on x1 eqiad wikishared with replication dbmaint T335834

2023-05-07

  • 00:54 sukhe: restart haproxy on cp1087: T334448

2023-05-06

  • 08:51 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 08:03 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 07:50 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 07:44 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
  • 07:07 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 06:50 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade

2023-05-05

  • 23:24 tzatziki: removing emails from 230 users per self-requests
  • 18:57 brennen@deploy1002: Finished scap: Backport for Revert "api: Use RevisionStore::newRevisionsFromBatch to fetch revision records" (T336008 T336022) (duration: 14m 21s)
  • 18:44 brennen@deploy1002: umherirrender and brennen: Backport for Revert "api: Use RevisionStore::newRevisionsFromBatch to fetch revision records" (T336008 T336022) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 18:42 brennen@deploy1002: Started scap: Backport for Revert "api: Use RevisionStore::newRevisionsFromBatch to fetch revision records" (T336008 T336022)
  • 18:25 brennen: train 1.41.0-wmf.7 (T330213): trying revert for T336008, T336022
  • 17:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2011.codfw.wmnet with OS bullseye
  • 17:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
  • 17:24 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
  • 17:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 17:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:45 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:44 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
  • 16:35 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 16:35 btullis@cumin1001: Added views for new wiki: newiki T334041
  • 16:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:28 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1023.eqiad.wmnet
  • 16:20 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:20 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1023.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 16:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1024.eqiad.wmnet
  • 16:17 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:17 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1023.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 16:16 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 16:15 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 16:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:10 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 16:10 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1024.eqiad.wmnet
  • 16:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:08 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1023.eqiad.wmnet
  • 16:06 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 16:06 btullis@cumin1001: Added views for new wiki: zhwiki T334041
  • 16:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:03 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:00 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:00 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 15:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt1020.eqiad.wmnet
  • 15:51 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:50 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 15:48 btullis@cumin1001: END (PASS) - Cookbook sre.presto.reboot-workers (exit_code=0) for Presto analytics cluster: Reboot Presto nodes
  • 15:42 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1020.eqiad.wmnet
  • 15:41 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 15:41 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt1020.eqiad.wmnet
  • 15:41 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:41 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1020.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 15:40 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1019.eqiad.wmnet
  • 15:40 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:40 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1020.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 15:39 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 15:37 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 15:27 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1019.eqiad.wmnet
  • 15:27 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1020.eqiad.wmnet
  • 15:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 15:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T335845)', diff saved to https://phabricator.wikimedia.org/P47778 and previous config saved to /var/cache/conftool/dbconfig/20230505-152222-ladsgroup.json
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P47777 and previous config saved to /var/cache/conftool/dbconfig/20230505-150716-ladsgroup.json
  • 15:06 mforns@deploy1002: Finished deploy [airflow-dags/analytics@11fa4e1]: (no justification provided) (duration: 00m 13s)
  • 15:06 mforns@deploy1002: Started deploy [airflow-dags/analytics@11fa4e1]: (no justification provided)
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P47776 and previous config saved to /var/cache/conftool/dbconfig/20230505-145209-ladsgroup.json
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T335845)', diff saved to https://phabricator.wikimedia.org/P47774 and previous config saved to /var/cache/conftool/dbconfig/20230505-143703-ladsgroup.json
  • 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T335845)', diff saved to https://phabricator.wikimedia.org/P47773 and previous config saved to /var/cache/conftool/dbconfig/20230505-142940-ladsgroup.json
  • 14:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 14:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T335845)', diff saved to https://phabricator.wikimedia.org/P47772 and previous config saved to /var/cache/conftool/dbconfig/20230505-142917-ladsgroup.json
  • 14:26 btullis@cumin1001: START - Cookbook sre.presto.reboot-workers for Presto analytics cluster: Reboot Presto nodes
  • 14:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P47771 and previous config saved to /var/cache/conftool/dbconfig/20230505-141410-ladsgroup.json
  • 14:05 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 14:04 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 14:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host lvs2011.codfw.wmnet
  • 13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P47770 and previous config saved to /var/cache/conftool/dbconfig/20230505-135904-ladsgroup.json
  • 13:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 13:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 13:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 13:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-worker1002.eqiad.wmnet with reason: New kernel, T335835
  • 13:48 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-worker1002.eqiad.wmnet with reason: New kernel, T335835
  • 13:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-worker1001.eqiad.wmnet with reason: New kernel, T335835
  • 13:48 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-worker1001.eqiad.wmnet with reason: New kernel, T335835
  • 13:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1003.eqiad.wmnet with reason: New kernel, T335835
  • 13:47 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1003.eqiad.wmnet with reason: New kernel, T335835
  • 13:47 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1002.eqiad.wmnet with reason: New kernel, T335835
  • 13:47 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1002.eqiad.wmnet with reason: New kernel, T335835
  • 13:47 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1001.eqiad.wmnet with reason: New kernel, T335835
  • 13:47 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1001.eqiad.wmnet with reason: New kernel, T335835
  • 13:47 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-ctrl1002.eqiad.wmnet with reason: New kernel, T335835
  • 13:47 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-ctrl1002.eqiad.wmnet with reason: New kernel, T335835
  • 13:47 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-ctrl1001.eqiad.wmnet with reason: New kernel, T335835
  • 13:47 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-ctrl1001.eqiad.wmnet with reason: New kernel, T335835
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T335845)', diff saved to https://phabricator.wikimedia.org/P47769 and previous config saved to /var/cache/conftool/dbconfig/20230505-134358-ladsgroup.json
  • 13:39 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lists1003.wikimedia.org with reason: New kernel, T335835
  • 13:39 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lists1003.wikimedia.org with reason: New kernel, T335835
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T335845)', diff saved to https://phabricator.wikimedia.org/P47768 and previous config saved to /var/cache/conftool/dbconfig/20230505-133631-ladsgroup.json
  • 13:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 13:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T335845)', diff saved to https://phabricator.wikimedia.org/P47767 and previous config saved to /var/cache/conftool/dbconfig/20230505-133556-ladsgroup.json
  • 13:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1005.eqiad.wmnet
  • 13:30 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: New kernel, T335835
  • 13:30 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: New kernel, T335835
  • 13:26 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New kernel, T335835
  • 13:25 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New kernel, T335835
  • 13:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1005.eqiad.wmnet
  • 13:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1004.eqiad.wmnet
  • 13:23 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: New kernel, T335835
  • 13:23 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: New kernel, T335835
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P47766 and previous config saved to /var/cache/conftool/dbconfig/20230505-132050-ladsgroup.json
  • 13:14 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1004.eqiad.wmnet
  • 13:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1003.eqiad.wmnet
  • 13:13 andrewbogott: rebooting cloudbackup2001.codfw.wmnet, unresponsive
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P47765 and previous config saved to /var/cache/conftool/dbconfig/20230505-130544-ladsgroup.json
  • 13:05 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1003.eqiad.wmnet
  • 13:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1002.eqiad.wmnet
  • 12:57 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1002.eqiad.wmnet
  • 12:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
  • 12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T335845)', diff saved to https://phabricator.wikimedia.org/P47764 and previous config saved to /var/cache/conftool/dbconfig/20230505-125038-ladsgroup.json
  • 12:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T335845)', diff saved to https://phabricator.wikimedia.org/P47763 and previous config saved to /var/cache/conftool/dbconfig/20230505-124412-ladsgroup.json
  • 12:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 12:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47762 and previous config saved to /var/cache/conftool/dbconfig/20230505-124349-ladsgroup.json
  • 12:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1002.eqiad.wmnet
  • 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P47761 and previous config saved to /var/cache/conftool/dbconfig/20230505-122843-ladsgroup.json
  • 12:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-mariadb1002.eqiad.wmnet
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P47760 and previous config saved to /var/cache/conftool/dbconfig/20230505-121336-ladsgroup.json
  • 12:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1001.eqiad.wmnet
  • 11:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-mariadb1001.eqiad.wmnet
  • 11:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1002.eqiad.wmnet
  • 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47759 and previous config saved to /var/cache/conftool/dbconfig/20230505-115830-ladsgroup.json
  • 11:52 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1002.eqiad.wmnet
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47758 and previous config saved to /var/cache/conftool/dbconfig/20230505-115126-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P47757 and previous config saved to /var/cache/conftool/dbconfig/20230505-112649-ladsgroup.json
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P47756 and previous config saved to /var/cache/conftool/dbconfig/20230505-112605-ladsgroup.json
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P47755 and previous config saved to /var/cache/conftool/dbconfig/20230505-111145-ladsgroup.json
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P47754 and previous config saved to /var/cache/conftool/dbconfig/20230505-111100-ladsgroup.json
  • 10:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P47753 and previous config saved to /var/cache/conftool/dbconfig/20230505-105640-ladsgroup.json
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P47752 and previous config saved to /var/cache/conftool/dbconfig/20230505-105555-ladsgroup.json
  • 10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P47751 and previous config saved to /var/cache/conftool/dbconfig/20230505-104135-ladsgroup.json
  • 10:41 moritzm: installing wireshark security updates
  • 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P47750 and previous config saved to /var/cache/conftool/dbconfig/20230505-104050-ladsgroup.json
  • 09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1170.eqiad.wmnet with reason: Host sad (T336033)
  • 09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1170.eqiad.wmnet with reason: Host sad (T336033)
  • 09:14 Amir1: power cycled db1170
  • 09:10 marostegui: Failover m2-master from dbproxy1013 to dbproxy1015
  • 09:08 hnowlan@deploy1002: Finished deploy [restbase/deploy@8aba801]: deploying to host missing from configs (duration: 01m 22s)
  • 09:06 hnowlan@deploy1002: Started deploy [restbase/deploy@8aba801]: deploying to host missing from configs
  • 08:58 XioNoX: deploy CR914772 on all hosts running Bird
  • 08:15 godog: delete wal and chunks_head from prometheus5002 and prometheus4002 to let prometheus start back up and not crashloop - T309979
  • 08:07 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host netflow2003.codfw.wmnet with OS bookworm
  • 08:05 hashar@deploy1002: Finished deploy [integration/docroot@78e6f40]: build: Updating eslint-config-wikimedia to 0.25.0 (duration: 00m 13s)
  • 08:04 hashar@deploy1002: Started deploy [integration/docroot@78e6f40]: build: Updating eslint-config-wikimedia to 0.25.0
  • 07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12 days, 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12 days, 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 06:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
  • 06:51 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
  • 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netflow2003.codfw.wmnet
  • 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow2003.codfw.wmnet - jmm@cumin2002"
  • 06:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
  • 06:49 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow2003.codfw.wmnet - jmm@cumin2002"
  • 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2004.codfw.wmnet
  • 06:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2004.codfw.wmnet
  • 06:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
  • 06:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 136907
  • 06:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow2003.codfw.wmnet on all recursors
  • 06:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netflow2003.codfw.wmnet on all recursors
  • 06:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow2003.codfw.wmnet - jmm@cumin2002"
  • 06:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
  • 06:38 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow2003.codfw.wmnet - jmm@cumin2002"
  • 06:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 136907
  • 06:35 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 06:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host netflow2003.codfw.wmnet
  • 06:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2003.wikimedia.org
  • 06:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2003.wikimedia.org
  • 06:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
  • 06:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
  • 06:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1003.wikimedia.org
  • 06:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1003.wikimedia.org
  • 05:22 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 05:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T335845)', diff saved to https://phabricator.wikimedia.org/P47748 and previous config saved to /var/cache/conftool/dbconfig/20230505-050007-ladsgroup.json
  • 04:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P47747 and previous config saved to /var/cache/conftool/dbconfig/20230505-044500-ladsgroup.json
  • 04:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P47746 and previous config saved to /var/cache/conftool/dbconfig/20230505-042954-ladsgroup.json
  • 04:21 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 04:18 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 04:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 04:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T335845)', diff saved to https://phabricator.wikimedia.org/P47745 and previous config saved to /var/cache/conftool/dbconfig/20230505-041448-ladsgroup.json
  • 04:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T335845)', diff saved to https://phabricator.wikimedia.org/P47744 and previous config saved to /var/cache/conftool/dbconfig/20230505-040837-ladsgroup.json
  • 04:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 04:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 04:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T335845)', diff saved to https://phabricator.wikimedia.org/P47743 and previous config saved to /var/cache/conftool/dbconfig/20230505-040812-ladsgroup.json
  • 04:04 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 03:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P47742 and previous config saved to /var/cache/conftool/dbconfig/20230505-035306-ladsgroup.json
  • 03:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P47741 and previous config saved to /var/cache/conftool/dbconfig/20230505-033800-ladsgroup.json
  • 03:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T335845)', diff saved to https://phabricator.wikimedia.org/P47740 and previous config saved to /var/cache/conftool/dbconfig/20230505-032253-ladsgroup.json
  • 03:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T335845)', diff saved to https://phabricator.wikimedia.org/P47739 and previous config saved to /var/cache/conftool/dbconfig/20230505-031637-ladsgroup.json
  • 03:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P47738 and previous config saved to /var/cache/conftool/dbconfig/20230505-030130-ladsgroup.json
  • 02:54 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 02:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P47737 and previous config saved to /var/cache/conftool/dbconfig/20230505-024624-ladsgroup.json
  • 02:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 02:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 02:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T335845)', diff saved to https://phabricator.wikimedia.org/P47736 and previous config saved to /var/cache/conftool/dbconfig/20230505-023118-ladsgroup.json
  • 02:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 02:30 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 02:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 02:29 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 02:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T335845)', diff saved to https://phabricator.wikimedia.org/P47735 and previous config saved to /var/cache/conftool/dbconfig/20230505-022510-ladsgroup.json
  • 02:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T335845)', diff saved to https://phabricator.wikimedia.org/P47734 and previous config saved to /var/cache/conftool/dbconfig/20230505-022446-ladsgroup.json
  • 02:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 02:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 02:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T335845)', diff saved to https://phabricator.wikimedia.org/P47733 and previous config saved to /var/cache/conftool/dbconfig/20230505-022421-ladsgroup.json
  • 02:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P47732 and previous config saved to /var/cache/conftool/dbconfig/20230505-020915-ladsgroup.json
  • 01:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P47731 and previous config saved to /var/cache/conftool/dbconfig/20230505-015409-ladsgroup.json
  • 01:49 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
  • 01:45 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM prometheus6002.drmrs.wmnet
  • 01:41 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
  • 01:40 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
  • 01:39 denisse@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM prometheus6002.drmrs.wmnet
  • 01:39 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM prometheus5002.eqsin.wmnet
  • 01:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T335845)', diff saved to https://phabricator.wikimedia.org/P47730 and previous config saved to /var/cache/conftool/dbconfig/20230505-013903-ladsgroup.json
  • 01:32 denisse@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM prometheus5002.eqsin.wmnet
  • 01:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T335845)', diff saved to https://phabricator.wikimedia.org/P47729 and previous config saved to /var/cache/conftool/dbconfig/20230505-013232-ladsgroup.json
  • 01:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 01:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 01:32 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM prometheus4002.ulsfo.wmnet
  • 01:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T335845)', diff saved to https://phabricator.wikimedia.org/P47728 and previous config saved to /var/cache/conftool/dbconfig/20230505-013206-ladsgroup.json
  • 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T335845)', diff saved to https://phabricator.wikimedia.org/P47727 and previous config saved to /var/cache/conftool/dbconfig/20230505-013108-ladsgroup.json
  • 01:31 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
  • 01:30 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
  • 01:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T335845)', diff saved to https://phabricator.wikimedia.org/P47726 and previous config saved to /var/cache/conftool/dbconfig/20230505-012950-ladsgroup.json
  • 01:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 01:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 01:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T335845)', diff saved to https://phabricator.wikimedia.org/P47725 and previous config saved to /var/cache/conftool/dbconfig/20230505-012927-ladsgroup.json
  • 01:26 denisse@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM prometheus4002.ulsfo.wmnet
  • 01:25 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM prometheus3002.esams.wmnet
  • 01:21 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
  • 01:20 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
  • 01:18 denisse@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM prometheus3002.esams.wmnet
  • 01:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P47724 and previous config saved to /var/cache/conftool/dbconfig/20230505-011700-ladsgroup.json
  • 01:16 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
  • 01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P47723 and previous config saved to /var/cache/conftool/dbconfig/20230505-011421-ladsgroup.json
  • 01:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P47722 and previous config saved to /var/cache/conftool/dbconfig/20230505-010154-ladsgroup.json
  • 00:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P47721 and previous config saved to /var/cache/conftool/dbconfig/20230505-005914-ladsgroup.json
  • 00:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T335845)', diff saved to https://phabricator.wikimedia.org/P47720 and previous config saved to /var/cache/conftool/dbconfig/20230505-004648-ladsgroup.json
  • 00:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T335845)', diff saved to https://phabricator.wikimedia.org/P47719 and previous config saved to /var/cache/conftool/dbconfig/20230505-004408-ladsgroup.json
  • 00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T335845)', diff saved to https://phabricator.wikimedia.org/P47718 and previous config saved to /var/cache/conftool/dbconfig/20230505-003914-ladsgroup.json
  • 00:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 00:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 00:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 00:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T335845)', diff saved to https://phabricator.wikimedia.org/P47717 and previous config saved to /var/cache/conftool/dbconfig/20230505-003845-ladsgroup.json
  • 00:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T335845)', diff saved to https://phabricator.wikimedia.org/P47716 and previous config saved to /var/cache/conftool/dbconfig/20230505-003749-ladsgroup.json
  • 00:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 00:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 00:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 00:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 00:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 00:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T335845)', diff saved to https://phabricator.wikimedia.org/P47715 and previous config saved to /var/cache/conftool/dbconfig/20230505-003359-ladsgroup.json
  • 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P47714 and previous config saved to /var/cache/conftool/dbconfig/20230505-002339-ladsgroup.json
  • 00:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P47713 and previous config saved to /var/cache/conftool/dbconfig/20230505-001853-ladsgroup.json
  • 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P47712 and previous config saved to /var/cache/conftool/dbconfig/20230505-000832-ladsgroup.json
  • 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P47711 and previous config saved to /var/cache/conftool/dbconfig/20230505-000346-ladsgroup.json

2023-05-04

  • 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T335845)', diff saved to https://phabricator.wikimedia.org/P47710 and previous config saved to /var/cache/conftool/dbconfig/20230504-235326-ladsgroup.json
  • 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T335845)', diff saved to https://phabricator.wikimedia.org/P47709 and previous config saved to /var/cache/conftool/dbconfig/20230504-234840-ladsgroup.json
  • 23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T335845)', diff saved to https://phabricator.wikimedia.org/P47708 and previous config saved to /var/cache/conftool/dbconfig/20230504-234544-ladsgroup.json
  • 23:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 23:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T335845)', diff saved to https://phabricator.wikimedia.org/P47707 and previous config saved to /var/cache/conftool/dbconfig/20230504-234520-ladsgroup.json
  • 23:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T335845)', diff saved to https://phabricator.wikimedia.org/P47706 and previous config saved to /var/cache/conftool/dbconfig/20230504-234330-ladsgroup.json
  • 23:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 23:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 23:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T335845)', diff saved to https://phabricator.wikimedia.org/P47705 and previous config saved to /var/cache/conftool/dbconfig/20230504-234306-ladsgroup.json
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P47704 and previous config saved to /var/cache/conftool/dbconfig/20230504-233013-ladsgroup.json
  • 23:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P47703 and previous config saved to /var/cache/conftool/dbconfig/20230504-232800-ladsgroup.json
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P47702 and previous config saved to /var/cache/conftool/dbconfig/20230504-231507-ladsgroup.json
  • 23:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P47701 and previous config saved to /var/cache/conftool/dbconfig/20230504-231254-ladsgroup.json
  • 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T335845)', diff saved to https://phabricator.wikimedia.org/P47700 and previous config saved to /var/cache/conftool/dbconfig/20230504-230001-ladsgroup.json
  • 22:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T335845)', diff saved to https://phabricator.wikimedia.org/P47699 and previous config saved to /var/cache/conftool/dbconfig/20230504-225747-ladsgroup.json
  • 22:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T335845)', diff saved to https://phabricator.wikimedia.org/P47698 and previous config saved to /var/cache/conftool/dbconfig/20230504-225336-ladsgroup.json
  • 22:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 22:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T335845)', diff saved to https://phabricator.wikimedia.org/P47697 and previous config saved to /var/cache/conftool/dbconfig/20230504-225013-ladsgroup.json
  • 22:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 22:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T335838)', diff saved to https://phabricator.wikimedia.org/P47696 and previous config saved to /var/cache/conftool/dbconfig/20230504-224646-ladsgroup.json
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P47695 and previous config saved to /var/cache/conftool/dbconfig/20230504-223139-ladsgroup.json
  • 22:22 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host lvs2011.codfw.wmnet
  • 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P47694 and previous config saved to /var/cache/conftool/dbconfig/20230504-221633-ladsgroup.json
  • 22:12 brennen@deploy1002: Finished scap: Backport for api: Use Status::isGood in ApiQueryRevisionsBase::getRevisionRecords (T336008) (duration: 09m 07s)
  • 22:04 brennen@deploy1002: brennen: Backport for api: Use Status::isGood in ApiQueryRevisionsBase::getRevisionRecords (T336008) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:03 brennen@deploy1002: Started scap: Backport for api: Use Status::isGood in ApiQueryRevisionsBase::getRevisionRecords (T336008)
  • 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T335838)', diff saved to https://phabricator.wikimedia.org/P47693 and previous config saved to /var/cache/conftool/dbconfig/20230504-220127-ladsgroup.json
  • 21:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1221 (T335838)', diff saved to https://phabricator.wikimedia.org/P47692 and previous config saved to /var/cache/conftool/dbconfig/20230504-215511-ladsgroup.json
  • 21:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 21:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T335838)', diff saved to https://phabricator.wikimedia.org/P47691 and previous config saved to /var/cache/conftool/dbconfig/20230504-215447-ladsgroup.json
  • 21:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P47690 and previous config saved to /var/cache/conftool/dbconfig/20230504-213941-ladsgroup.json
  • 21:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P47689 and previous config saved to /var/cache/conftool/dbconfig/20230504-212434-ladsgroup.json
  • 21:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T335838)', diff saved to https://phabricator.wikimedia.org/P47688 and previous config saved to /var/cache/conftool/dbconfig/20230504-210928-ladsgroup.json
  • 21:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47687 and previous config saved to /var/cache/conftool/dbconfig/20230504-210513-ladsgroup.json
  • 21:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T335838)', diff saved to https://phabricator.wikimedia.org/P47686 and previous config saved to /var/cache/conftool/dbconfig/20230504-210057-ladsgroup.json
  • 21:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 21:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 21:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T335838)', diff saved to https://phabricator.wikimedia.org/P47685 and previous config saved to /var/cache/conftool/dbconfig/20230504-210033-ladsgroup.json
  • 20:57 brennen@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.7 refs T330213 (duration: 06m 02s)
  • 20:51 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.7 refs T330213
  • 20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P47684 and previous config saved to /var/cache/conftool/dbconfig/20230504-205007-ladsgroup.json
  • 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P47683 and previous config saved to /var/cache/conftool/dbconfig/20230504-204527-ladsgroup.json
  • 20:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P47682 and previous config saved to /var/cache/conftool/dbconfig/20230504-203501-ladsgroup.json
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P47681 and previous config saved to /var/cache/conftool/dbconfig/20230504-203021-ladsgroup.json
  • 20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47680 and previous config saved to /var/cache/conftool/dbconfig/20230504-201955-ladsgroup.json
  • 20:19 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:19 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:17 brennen@deploy1002: Finished scap: Backport for Fix file page integration (T335997) (duration: 10m 50s)
  • 20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T335838)', diff saved to https://phabricator.wikimedia.org/P47679 and previous config saved to /var/cache/conftool/dbconfig/20230504-201514-ladsgroup.json
  • 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47678 and previous config saved to /var/cache/conftool/dbconfig/20230504-201332-ladsgroup.json
  • 20:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 20:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T335845)', diff saved to https://phabricator.wikimedia.org/P47677 and previous config saved to /var/cache/conftool/dbconfig/20230504-201306-ladsgroup.json
  • 20:08 brennen@deploy1002: brennen and jdlrobson: Backport for Fix file page integration (T335997) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:06 brennen@deploy1002: Started scap: Backport for Fix file page integration (T335997)
  • 20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T335838)', diff saved to https://phabricator.wikimedia.org/P47676 and previous config saved to /var/cache/conftool/dbconfig/20230504-200644-ladsgroup.json
  • 20:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 20:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 20:06 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:06 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:03 mutante: miscweb1003 - rebooting
  • 20:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on miscweb1003.eqiad.wmnet with reason: reboot
  • 20:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on miscweb1003.eqiad.wmnet with reason: reboot
  • 20:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 20:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 20:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T335838)', diff saved to https://phabricator.wikimedia.org/P47675 and previous config saved to /var/cache/conftool/dbconfig/20230504-200141-ladsgroup.json
  • 20:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T335845)', diff saved to https://phabricator.wikimedia.org/P47674 and previous config saved to /var/cache/conftool/dbconfig/20230504-200131-ladsgroup.json
  • 20:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on people2002.codfw.wmnet with reason: maintenance upgrade
  • 20:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on people2002.codfw.wmnet with reason: maintenance upgrade
  • 20:00 mutante: people2002 (people.wikimedia.org) reboot, <1 min downtime
  • 19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P47673 and previous config saved to /var/cache/conftool/dbconfig/20230504-195800-ladsgroup.json
  • 19:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1054.eqiad.wmnet
  • 19:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P47672 and previous config saved to /var/cache/conftool/dbconfig/20230504-194635-ladsgroup.json
  • 19:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P47671 and previous config saved to /var/cache/conftool/dbconfig/20230504-194624-ladsgroup.json
  • 19:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2055.codfw.wmnet
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P47670 and previous config saved to /var/cache/conftool/dbconfig/20230504-194254-ladsgroup.json
  • 19:42 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1054.eqiad.wmnet
  • 19:38 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2055.codfw.wmnet
  • 19:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P47669 and previous config saved to /var/cache/conftool/dbconfig/20230504-193129-ladsgroup.json
  • 19:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P47668 and previous config saved to /var/cache/conftool/dbconfig/20230504-193118-ladsgroup.json
  • 19:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1053.eqiad.wmnet
  • 19:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2054.codfw.wmnet
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T335845)', diff saved to https://phabricator.wikimedia.org/P47667 and previous config saved to /var/cache/conftool/dbconfig/20230504-192747-ladsgroup.json
  • 19:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1053.eqiad.wmnet
  • 19:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2054.codfw.wmnet
  • 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T335838)', diff saved to https://phabricator.wikimedia.org/P47666 and previous config saved to /var/cache/conftool/dbconfig/20230504-191623-ladsgroup.json
  • 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T335845)', diff saved to https://phabricator.wikimedia.org/P47665 and previous config saved to /var/cache/conftool/dbconfig/20230504-191612-ladsgroup.json
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T335845)', diff saved to https://phabricator.wikimedia.org/P47664 and previous config saved to /var/cache/conftool/dbconfig/20230504-191528-ladsgroup.json
  • 19:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2053.codfw.wmnet
  • 19:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1052.eqiad.wmnet
  • 19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T335838)', diff saved to https://phabricator.wikimedia.org/P47663 and previous config saved to /var/cache/conftool/dbconfig/20230504-191001-ladsgroup.json
  • 19:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 19:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 19:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T335838)', diff saved to https://phabricator.wikimedia.org/P47662 and previous config saved to /var/cache/conftool/dbconfig/20230504-190937-ladsgroup.json
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T335845)', diff saved to https://phabricator.wikimedia.org/P47661 and previous config saved to /var/cache/conftool/dbconfig/20230504-190757-ladsgroup.json
  • 19:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2053.codfw.wmnet
  • 19:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1052.eqiad.wmnet
  • 19:02 fab@deploy1002: Finished deploy [airflow-dags/research@88ebdf7]: (no justification provided) (duration: 00m 03s)
  • 19:02 fab@deploy1002: Started deploy [airflow-dags/research@88ebdf7]: (no justification provided)
  • 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P47660 and previous config saved to /var/cache/conftool/dbconfig/20230504-190022-ladsgroup.json
  • 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P47659 and previous config saved to /var/cache/conftool/dbconfig/20230504-185431-ladsgroup.json
  • 18:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2052.codfw.wmnet
  • 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P47658 and previous config saved to /var/cache/conftool/dbconfig/20230504-185250-ladsgroup.json
  • 18:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1051.eqiad.wmnet
  • 18:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2052.codfw.wmnet
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P47657 and previous config saved to /var/cache/conftool/dbconfig/20230504-184516-ladsgroup.json
  • 18:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1051.eqiad.wmnet
  • 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P47656 and previous config saved to /var/cache/conftool/dbconfig/20230504-183925-ladsgroup.json
  • 18:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2051.codfw.wmnet
  • 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P47655 and previous config saved to /var/cache/conftool/dbconfig/20230504-183744-ladsgroup.json
  • 18:37 fab@deploy1002: Finished deploy [airflow-dags/research@88ebdf7]: (no justification provided) (duration: 00m 09s)
  • 18:37 fab@deploy1002: Started deploy [airflow-dags/research@88ebdf7]: (no justification provided)
  • 18:31 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2051.codfw.wmnet
  • 18:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1050.eqiad.wmnet
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T335845)', diff saved to https://phabricator.wikimedia.org/P47654 and previous config saved to /var/cache/conftool/dbconfig/20230504-183010-ladsgroup.json
  • 18:28 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 18:24 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1050.eqiad.wmnet
  • 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T335838)', diff saved to https://phabricator.wikimedia.org/P47653 and previous config saved to /var/cache/conftool/dbconfig/20230504-182418-ladsgroup.json
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T335845)', diff saved to https://phabricator.wikimedia.org/P47652 and previous config saved to /var/cache/conftool/dbconfig/20230504-182301-ladsgroup.json
  • 18:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T335845)', diff saved to https://phabricator.wikimedia.org/P47651 and previous config saved to /var/cache/conftool/dbconfig/20230504-182238-ladsgroup.json
  • 18:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2050.codfw.wmnet
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T335845)', diff saved to https://phabricator.wikimedia.org/P47650 and previous config saved to /var/cache/conftool/dbconfig/20230504-182139-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T335845)', diff saved to https://phabricator.wikimedia.org/P47649 and previous config saved to /var/cache/conftool/dbconfig/20230504-182114-ladsgroup.json
  • 18:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T335838)', diff saved to https://phabricator.wikimedia.org/P47648 and previous config saved to /var/cache/conftool/dbconfig/20230504-181851-ladsgroup.json
  • 18:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 18:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 18:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T335838)', diff saved to https://phabricator.wikimedia.org/P47647 and previous config saved to /var/cache/conftool/dbconfig/20230504-181828-ladsgroup.json
  • 18:17 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T335845)', diff saved to https://phabricator.wikimedia.org/P47646 and previous config saved to /var/cache/conftool/dbconfig/20230504-181636-ladsgroup.json
  • 18:16 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.7 refs T330213
  • 18:15 cmooney@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2011']
  • 18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T335845)', diff saved to https://phabricator.wikimedia.org/P47645 and previous config saved to /var/cache/conftool/dbconfig/20230504-181516-ladsgroup.json
  • 18:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 18:14 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2050.codfw.wmnet
  • 18:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T335845)', diff saved to https://phabricator.wikimedia.org/P47644 and previous config saved to /var/cache/conftool/dbconfig/20230504-181451-ladsgroup.json
  • 18:14 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2011']
  • 18:13 cmooney@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2011']
  • 18:12 fab@deploy1002: Finished deploy [airflow-dags/research@88ebdf7]: (no justification provided) (duration: 00m 28s)
  • 18:12 fab@deploy1002: Started deploy [airflow-dags/research@88ebdf7]: (no justification provided)
  • 18:12 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2011']
  • 18:11 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 18:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1049.eqiad.wmnet
  • 18:08 brennen: train 1.41.0-wmf.7 (T330213): logs fairly quiet and no current blockers, rolling to group2
  • 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P47643 and previous config saved to /var/cache/conftool/dbconfig/20230504-180608-ladsgroup.json
  • 18:05 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 18:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2049.codfw.wmnet
  • 18:04 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 18:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1049.eqiad.wmnet
  • 18:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P47642 and previous config saved to /var/cache/conftool/dbconfig/20230504-180322-ladsgroup.json
  • 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P47641 and previous config saved to /var/cache/conftool/dbconfig/20230504-175945-ladsgroup.json
  • 17:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2049.codfw.wmnet
  • 17:54 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 17:53 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 17:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1048.eqiad.wmnet
  • 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P47640 and previous config saved to /var/cache/conftool/dbconfig/20230504-175102-ladsgroup.json
  • 17:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2048.codfw.wmnet
  • 17:48 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 17:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P47639 and previous config saved to /var/cache/conftool/dbconfig/20230504-174815-ladsgroup.json
  • 17:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P47638 and previous config saved to /var/cache/conftool/dbconfig/20230504-174438-ladsgroup.json
  • 17:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1048.eqiad.wmnet
  • 17:42 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2048.codfw.wmnet
  • 17:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T335838)', diff saved to https://phabricator.wikimedia.org/P47637 and previous config saved to /var/cache/conftool/dbconfig/20230504-174040-ladsgroup.json
  • 17:37 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T335845)', diff saved to https://phabricator.wikimedia.org/P47635 and previous config saved to /var/cache/conftool/dbconfig/20230504-173555-ladsgroup.json
  • 17:35 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aphlict1002.eqiad.wmnet
  • 17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T335838)', diff saved to https://phabricator.wikimedia.org/P47634 and previous config saved to /var/cache/conftool/dbconfig/20230504-173309-ladsgroup.json
  • 17:32 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 17:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2047.codfw.wmnet
  • 17:32 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 17:31 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host aphlict1002.eqiad.wmnet
  • 17:31 mutante: people1003 - rebooting
  • 17:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1047.eqiad.wmnet
  • 17:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on people1003.eqiad.wmnet with reason: maintenance upgrade
  • 17:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on people1003.eqiad.wmnet with reason: maintenance upgrade
  • 17:30 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aphlict2001.codfw.wmnet
  • 17:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T335845)', diff saved to https://phabricator.wikimedia.org/P47633 and previous config saved to /var/cache/conftool/dbconfig/20230504-172932-ladsgroup.json
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T335845)', diff saved to https://phabricator.wikimedia.org/P47632 and previous config saved to /var/cache/conftool/dbconfig/20230504-172835-ladsgroup.json
  • 17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T335845)', diff saved to https://phabricator.wikimedia.org/P47631 and previous config saved to /var/cache/conftool/dbconfig/20230504-172806-ladsgroup.json
  • 17:26 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host aphlict2001.codfw.wmnet
  • 17:25 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 17:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T335838)', diff saved to https://phabricator.wikimedia.org/P47630 and previous config saved to /var/cache/conftool/dbconfig/20230504-172546-ladsgroup.json
  • 17:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 17:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P47629 and previous config saved to /var/cache/conftool/dbconfig/20230504-172534-ladsgroup.json
  • 17:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 17:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2047.codfw.wmnet
  • 17:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47628 and previous config saved to /var/cache/conftool/dbconfig/20230504-172523-ladsgroup.json
  • 17:24 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1047.eqiad.wmnet
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T335845)', diff saved to https://phabricator.wikimedia.org/P47627 and previous config saved to /var/cache/conftool/dbconfig/20230504-172228-ladsgroup.json
  • 17:22 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs10[11-21].eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 17:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 17:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T335845)', diff saved to https://phabricator.wikimedia.org/P47626 and previous config saved to /var/cache/conftool/dbconfig/20230504-172204-ladsgroup.json
  • 17:16 mutante: aphlict2001 - not active, rebooting
  • 17:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2046.codfw.wmnet
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P47625 and previous config saved to /var/cache/conftool/dbconfig/20230504-171300-ladsgroup.json
  • 17:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1046.eqiad.wmnet
  • 17:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P47624 and previous config saved to /var/cache/conftool/dbconfig/20230504-171028-ladsgroup.json
  • 17:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P47623 and previous config saved to /var/cache/conftool/dbconfig/20230504-171017-ladsgroup.json
  • 17:09 brennen: phab1004 deployed and restarted, phab up, MR widget still seems to work
  • 17:08 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2046.codfw.wmnet
  • 17:08 brennen@deploy1002: Finished deploy [phabricator/deployment@0529926]: deploy latest state to phab1004 (duration: 00m 34s)
  • 17:07 brennen@deploy1002: Started deploy [phabricator/deployment@0529926]: deploy latest state to phab1004
  • 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P47622 and previous config saved to /var/cache/conftool/dbconfig/20230504-170658-ladsgroup.json
  • 17:05 brennen@deploy1002: Finished deploy [phabricator/deployment@0529926]: deploy latest state to phab2002 (duration: 00m 37s)
  • 17:05 brennen@deploy1002: Started deploy [phabricator/deployment@0529926]: deploy latest state to phab2002
  • 17:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1046.eqiad.wmnet
  • 17:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: maintenance upgrade
  • 17:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: maintenance upgrade
  • 17:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: maintenance upgrade
  • 17:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: maintenance upgrade
  • 17:01 mutante: Phabricator upgrade - maintenance incoming
  • 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host lvs2011.codfw.wmnet
  • 16:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2045.codfw.wmnet
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P47621 and previous config saved to /var/cache/conftool/dbconfig/20230504-165753-ladsgroup.json
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T335838)', diff saved to https://phabricator.wikimedia.org/P47620 and previous config saved to /var/cache/conftool/dbconfig/20230504-165521-ladsgroup.json
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P47619 and previous config saved to /var/cache/conftool/dbconfig/20230504-165511-ladsgroup.json
  • 16:52 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2045.codfw.wmnet
  • 16:52 sbassett@deploy1002: Finished scap: Backport for Re-enable the Graph extension on test2wiki (T334940) (duration: 07m 04s)
  • 16:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P47618 and previous config saved to /var/cache/conftool/dbconfig/20230504-165152-ladsgroup.json
  • 16:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1045.eqiad.wmnet
  • 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T335838)', diff saved to https://phabricator.wikimedia.org/P47617 and previous config saved to /var/cache/conftool/dbconfig/20230504-164850-ladsgroup.json
  • 16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T335838)', diff saved to https://phabricator.wikimedia.org/P47616 and previous config saved to /var/cache/conftool/dbconfig/20230504-164826-ladsgroup.json
  • 16:46 sbassett@deploy1002: sbassett: Backport for Re-enable the Graph extension on test2wiki (T334940) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 16:46 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1045.eqiad.wmnet
  • 16:45 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: service=thumbor,name=thumbor100[1256].eqiad.wmnet
  • 16:45 sbassett@deploy1002: Started scap: Backport for Re-enable the Graph extension on test2wiki (T334940)
  • 16:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor100[1256].eqiad.wmnet
  • 16:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2044.codfw.wmnet
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T335845)', diff saved to https://phabricator.wikimedia.org/P47615 and previous config saved to /var/cache/conftool/dbconfig/20230504-164247-ladsgroup.json
  • 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47614 and previous config saved to /var/cache/conftool/dbconfig/20230504-164004-ladsgroup.json
  • 16:39 jynus: extending logical volume of backup1003, backup2003 for backup storage
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T335845)', diff saved to https://phabricator.wikimedia.org/P47613 and previous config saved to /var/cache/conftool/dbconfig/20230504-163646-ladsgroup.json
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T335845)', diff saved to https://phabricator.wikimedia.org/P47612 and previous config saved to /var/cache/conftool/dbconfig/20230504-163626-ladsgroup.json
  • 16:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 16:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 16:36 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2044.codfw.wmnet
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T335845)', diff saved to https://phabricator.wikimedia.org/P47611 and previous config saved to /var/cache/conftool/dbconfig/20230504-163601-ladsgroup.json
  • 16:34 mutante: etherpad1003 (https://etherpad.wikimedia.org) rebooting, 1 min downtime
  • 16:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on etherpad1003.eqiad.wmnet with reason: reboot
  • 16:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on etherpad1003.eqiad.wmnet with reason: reboot
  • 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P47610 and previous config saved to /var/cache/conftool/dbconfig/20230504-163319-ladsgroup.json
  • 16:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1044.eqiad.wmnet
  • 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47609 and previous config saved to /var/cache/conftool/dbconfig/20230504-163149-ladsgroup.json
  • 16:30 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 152m 23s)
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T335845)', diff saved to https://phabricator.wikimedia.org/P47608 and previous config saved to /var/cache/conftool/dbconfig/20230504-162926-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T335845)', diff saved to https://phabricator.wikimedia.org/P47607 and previous config saved to /var/cache/conftool/dbconfig/20230504-162902-ladsgroup.json
  • 16:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gerrit1003.wikimedia.org with reason: reboot
  • 16:27 mutante: gerrit1003 (gerrit-new.wikimedia.org) - rebooting
  • 16:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on gerrit1003.wikimedia.org with reason: reboot
  • 16:26 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1044.eqiad.wmnet
  • 16:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
  • 16:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2002.codfw.wmnet
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P47606 and previous config saved to /var/cache/conftool/dbconfig/20230504-162055-ladsgroup.json
  • 16:19 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P47605 and previous config saved to /var/cache/conftool/dbconfig/20230504-161813-ladsgroup.json
  • 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P47604 and previous config saved to /var/cache/conftool/dbconfig/20230504-161643-ladsgroup.json
  • 16:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2002.codfw.wmnet
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P47603 and previous config saved to /var/cache/conftool/dbconfig/20230504-161356-ladsgroup.json
  • 16:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1043.eqiad.wmnet
  • 16:12 mutante: doc1003 - rebooting
  • 16:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
  • 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on doc1002.eqiad.wmnet with reason: reboot
  • 16:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on doc1002.eqiad.wmnet with reason: reboot
  • 16:10 mutante: doc1002 (https://doc.wikimedia.org) - reboot, <1 min downtime
  • 16:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2042.codfw.wmnet
  • 16:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
  • 16:06 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1043.eqiad.wmnet
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P47602 and previous config saved to /var/cache/conftool/dbconfig/20230504-160547-ladsgroup.json
  • 16:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
  • 16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T335838)', diff saved to https://phabricator.wikimedia.org/P47601 and previous config saved to /var/cache/conftool/dbconfig/20230504-160307-ladsgroup.json
  • 16:02 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
  • 16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P47600 and previous config saved to /var/cache/conftool/dbconfig/20230504-160136-ladsgroup.json
  • 16:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
  • 15:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2002.wikimedia.org
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P47599 and previous config saved to /var/cache/conftool/dbconfig/20230504-155850-ladsgroup.json
  • 15:58 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs10[11-21].eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 15:57 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs20[02-12].codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T335838)', diff saved to https://phabricator.wikimedia.org/P47598 and previous config saved to /var/cache/conftool/dbconfig/20230504-155544-ladsgroup.json
  • 15:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 15:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T335838)', diff saved to https://phabricator.wikimedia.org/P47597 and previous config saved to /var/cache/conftool/dbconfig/20230504-155518-ladsgroup.json
  • 15:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2002.wikimedia.org
  • 15:54 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1002.wikimedia.org
  • 15:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1042.eqiad.wmnet
  • 15:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T335845)', diff saved to https://phabricator.wikimedia.org/P47596 and previous config saved to /var/cache/conftool/dbconfig/20230504-155041-ladsgroup.json
  • 15:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1002.wikimedia.org
  • 15:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1042.eqiad.wmnet
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47595 and previous config saved to /var/cache/conftool/dbconfig/20230504-154630-ladsgroup.json
  • 15:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T335845)', diff saved to https://phabricator.wikimedia.org/P47594 and previous config saved to /var/cache/conftool/dbconfig/20230504-154344-ladsgroup.json
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T335845)', diff saved to https://phabricator.wikimedia.org/P47593 and previous config saved to /var/cache/conftool/dbconfig/20230504-154211-ladsgroup.json
  • 15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 15:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T335845)', diff saved to https://phabricator.wikimedia.org/P47592 and previous config saved to /var/cache/conftool/dbconfig/20230504-154146-ladsgroup.json
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47591 and previous config saved to /var/cache/conftool/dbconfig/20230504-154021-ladsgroup.json
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P47590 and previous config saved to /var/cache/conftool/dbconfig/20230504-154012-ladsgroup.json
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47589 and previous config saved to /var/cache/conftool/dbconfig/20230504-153850-ladsgroup.json
  • 15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1183 (T335845)', diff saved to https://phabricator.wikimedia.org/P47588 and previous config saved to /var/cache/conftool/dbconfig/20230504-153834-ladsgroup.json
  • 15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47587 and previous config saved to /var/cache/conftool/dbconfig/20230504-153825-ladsgroup.json
  • 15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T335845)', diff saved to https://phabricator.wikimedia.org/P47586 and previous config saved to /var/cache/conftool/dbconfig/20230504-153810-ladsgroup.json
  • 15:38 mutante: doc2002 - rebooting
  • 15:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2040.codfw.wmnet
  • 15:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1041.eqiad.wmnet
  • 15:33 mutante: moscovium (https://rt.wikimedia.org) - rebooting
  • 15:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on moscovium.eqiad.wmnet with reason: reboot
  • 15:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on moscovium.eqiad.wmnet with reason: reboot
  • 15:27 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1041.eqiad.wmnet
  • 15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P47585 and previous config saved to /var/cache/conftool/dbconfig/20230504-152640-ladsgroup.json
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P47584 and previous config saved to /var/cache/conftool/dbconfig/20230504-152506-ladsgroup.json
  • 15:24 marostegui: Failover m1-master from dbproxy1012 to dbproxy1014
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P47583 and previous config saved to /var/cache/conftool/dbconfig/20230504-152319-ladsgroup.json
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P47582 and previous config saved to /var/cache/conftool/dbconfig/20230504-152304-ladsgroup.json
  • 15:21 mutante: adding new project language 'gpe' - https://en.wikipedia.org/wiki/Ghanaian_Pidgin_English
  • 15:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1040.eqiad.wmnet
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P47581 and previous config saved to /var/cache/conftool/dbconfig/20230504-151133-ladsgroup.json
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T335838)', diff saved to https://phabricator.wikimedia.org/P47580 and previous config saved to /var/cache/conftool/dbconfig/20230504-151000-ladsgroup.json
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P47579 and previous config saved to /var/cache/conftool/dbconfig/20230504-150813-ladsgroup.json
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P47578 and previous config saved to /var/cache/conftool/dbconfig/20230504-150758-ladsgroup.json
  • 15:07 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1040.eqiad.wmnet
  • 15:03 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host lvs2011.codfw.wmnet
  • 15:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host lvs2011.codfw.wmnet
  • 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T335838)', diff saved to https://phabricator.wikimedia.org/P47576 and previous config saved to /var/cache/conftool/dbconfig/20230504-150336-ladsgroup.json
  • 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 15:03 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host lvs2011.codfw.wmnet
  • 15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T335838)', diff saved to https://phabricator.wikimedia.org/P47575 and previous config saved to /var/cache/conftool/dbconfig/20230504-150307-ladsgroup.json
  • 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T335845)', diff saved to https://phabricator.wikimedia.org/P47574 and previous config saved to /var/cache/conftool/dbconfig/20230504-145627-ladsgroup.json
  • 14:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1039.eqiad.wmnet
  • 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47573 and previous config saved to /var/cache/conftool/dbconfig/20230504-145307-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T335845)', diff saved to https://phabricator.wikimedia.org/P47572 and previous config saved to /var/cache/conftool/dbconfig/20230504-145251-ladsgroup.json
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47571 and previous config saved to /var/cache/conftool/dbconfig/20230504-145153-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T335845)', diff saved to https://phabricator.wikimedia.org/P47570 and previous config saved to /var/cache/conftool/dbconfig/20230504-144852-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T335845)', diff saved to https://phabricator.wikimedia.org/P47569 and previous config saved to /var/cache/conftool/dbconfig/20230504-144827-ladsgroup.json
  • 14:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P47568 and previous config saved to /var/cache/conftool/dbconfig/20230504-144801-ladsgroup.json
  • 14:47 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T335845)', diff saved to https://phabricator.wikimedia.org/P47567 and previous config saved to /var/cache/conftool/dbconfig/20230504-144625-ladsgroup.json
  • 14:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T335845)', diff saved to https://phabricator.wikimedia.org/P47566 and previous config saved to /var/cache/conftool/dbconfig/20230504-144110-ladsgroup.json
  • 14:40 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 14:40 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2011.codfw.wmnet with OS bullseye
  • 14:38 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2040.codfw.wmnet
  • 14:36 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs20[02-12].codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P47565 and previous config saved to /var/cache/conftool/dbconfig/20230504-143647-ladsgroup.json
  • 14:35 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 14:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1038.eqiad.wmnet
  • 14:34 eevans@cumin1001: END (ERROR) - Cookbook sre.cassandra.roll-restart (exit_code=97) for nodes matching aqs20[02-12].codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P47564 and previous config saved to /var/cache/conftool/dbconfig/20230504-143320-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P47563 and previous config saved to /var/cache/conftool/dbconfig/20230504-143255-ladsgroup.json
  • 14:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1038.eqiad.wmnet
  • 14:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2039.codfw.wmnet
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P47562 and previous config saved to /var/cache/conftool/dbconfig/20230504-142604-ladsgroup.json
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P47561 and previous config saved to /var/cache/conftool/dbconfig/20230504-142140-ladsgroup.json
  • 14:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2039.codfw.wmnet
  • 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P47560 and previous config saved to /var/cache/conftool/dbconfig/20230504-141814-ladsgroup.json
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T335838)', diff saved to https://phabricator.wikimedia.org/P47559 and previous config saved to /var/cache/conftool/dbconfig/20230504-141749-ladsgroup.json
  • 14:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd1004.eqiad.wmnet
  • 14:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1037.eqiad.wmnet
  • 14:13 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd1004.eqiad.wmnet
  • 14:12 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd1005.eqiad.wmnet
  • 14:12 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs20[02-12].codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 14:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2038.codfw.wmnet
  • 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P47558 and previous config saved to /var/cache/conftool/dbconfig/20230504-141057-ladsgroup.json
  • 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T335838)', diff saved to https://phabricator.wikimedia.org/P47557 and previous config saved to /var/cache/conftool/dbconfig/20230504-141024-ladsgroup.json
  • 14:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 14:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47556 and previous config saved to /var/cache/conftool/dbconfig/20230504-140958-ladsgroup.json
  • 14:09 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1037.eqiad.wmnet
  • 14:08 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd1005.eqiad.wmnet
  • 14:07 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd1006.eqiad.wmnet
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47555 and previous config saved to /var/cache/conftool/dbconfig/20230504-140634-ladsgroup.json
  • 14:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2038.codfw.wmnet
  • 14:03 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd1006.eqiad.wmnet
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T335845)', diff saved to https://phabricator.wikimedia.org/P47554 and previous config saved to /var/cache/conftool/dbconfig/20230504-140308-ladsgroup.json
  • 14:01 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd2004.codfw.wmnet
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47553 and previous config saved to /var/cache/conftool/dbconfig/20230504-140012-ladsgroup.json
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47552 and previous config saved to /var/cache/conftool/dbconfig/20230504-135845-ladsgroup.json
  • 13:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 13:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T335838)', diff saved to https://phabricator.wikimedia.org/P47551 and previous config saved to /var/cache/conftool/dbconfig/20230504-135821-ladsgroup.json
  • 13:58 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767
  • 13:57 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd2004.codfw.wmnet
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T335845)', diff saved to https://phabricator.wikimedia.org/P47550 and previous config saved to /var/cache/conftool/dbconfig/20230504-135637-ladsgroup.json
  • 13:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 13:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T335845)', diff saved to https://phabricator.wikimedia.org/P47549 and previous config saved to /var/cache/conftool/dbconfig/20230504-135612-ladsgroup.json
  • 13:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd2005.codfw.wmnet
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T335845)', diff saved to https://phabricator.wikimedia.org/P47548 and previous config saved to /var/cache/conftool/dbconfig/20230504-135551-ladsgroup.json
  • 13:54 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:54 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Make wbsubscribers API output sensible on Test Wikidata (T300458) (duration: 09m 52s)
  • 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P47547 and previous config saved to /var/cache/conftool/dbconfig/20230504-135452-ladsgroup.json
  • 13:53 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd2005.codfw.wmnet
  • 13:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd2006.codfw.wmnet
  • 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T335845)', diff saved to https://phabricator.wikimedia.org/P47546 and previous config saved to /var/cache/conftool/dbconfig/20230504-135135-ladsgroup.json
  • 13:48 herron: switching to bullseye kafka monitoring hosts T335424
  • 13:48 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd2006.codfw.wmnet
  • 13:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd1004.eqiad.wmnet
  • 13:47 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Make wbsubscribers API output sensible on Test Wikidata (T300458) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:45 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Make wbsubscribers API output sensible on Test Wikidata (T300458)
  • 13:43 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd1004.eqiad.wmnet
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P47545 and previous config saved to /var/cache/conftool/dbconfig/20230504-134315-ladsgroup.json
  • 13:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd1005.eqiad.wmnet
  • 13:41 jdrewniak@deploy1002: Finished scap: Backport for Enable Vector 2022 as the default skin on frwikinews (T335686) (duration: 07m 47s)
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P47544 and previous config saved to /var/cache/conftool/dbconfig/20230504-134106-ladsgroup.json
  • 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P47543 and previous config saved to /var/cache/conftool/dbconfig/20230504-133945-ladsgroup.json
  • 13:39 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd1005.eqiad.wmnet
  • 13:38 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:37 elukey: revert "Grant IdempotentWrite Kafka Cluster ACL to User:ANONYMOUS in kafka logging clusters - T334733"
  • 13:37 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd1006.eqiad.wmnet
  • 13:37 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P47542 and previous config saved to /var/cache/conftool/dbconfig/20230504-133628-ladsgroup.json
  • 13:35 jdrewniak@deploy1002: jdrewniak: Backport for Enable Vector 2022 as the default skin on frwikinews (T335686) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:34 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd1006.eqiad.wmnet
  • 13:33 jdrewniak@deploy1002: Started scap: Backport for Enable Vector 2022 as the default skin on frwikinews (T335686)
  • 13:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd2001.codfw.wmnet
  • 13:31 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd2001.codfw.wmnet
  • 13:30 jdrewniak@deploy1002: Finished scap: Backport for Enable Vector 2022 as the default skin on eswiki (T335686) (duration: 08m 01s)
  • 13:30 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd2002.codfw.wmnet
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P47541 and previous config saved to /var/cache/conftool/dbconfig/20230504-132809-ladsgroup.json
  • 13:27 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd2002.codfw.wmnet
  • 13:26 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd2003.codfw.wmnet
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P47540 and previous config saved to /var/cache/conftool/dbconfig/20230504-132600-ladsgroup.json
  • 13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47539 and previous config saved to /var/cache/conftool/dbconfig/20230504-132439-ladsgroup.json
  • 13:23 jdrewniak@deploy1002: jdrewniak: Backport for Enable Vector 2022 as the default skin on eswiki (T335686) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:23 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:22 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:22 jdrewniak@deploy1002: Started scap: Backport for Enable Vector 2022 as the default skin on eswiki (T335686)
  • 13:22 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd2003.codfw.wmnet
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P47538 and previous config saved to /var/cache/conftool/dbconfig/20230504-132122-ladsgroup.json
  • 13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47537 and previous config saved to /var/cache/conftool/dbconfig/20230504-131621-ladsgroup.json
  • 13:15 jdrewniak@deploy1002: Finished scap: Backport for [10%] Enable Vector 2022 as the default skin for eswiki (T335686) (duration: 08m 15s)
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T335838)', diff saved to https://phabricator.wikimedia.org/P47536 and previous config saved to /var/cache/conftool/dbconfig/20230504-131302-ladsgroup.json
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T335845)', diff saved to https://phabricator.wikimedia.org/P47535 and previous config saved to /var/cache/conftool/dbconfig/20230504-131054-ladsgroup.json
  • 13:09 jdrewniak@deploy1002: jdrewniak: Backport for [10%] Enable Vector 2022 as the default skin for eswiki (T335686) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:06 jdrewniak@deploy1002: Started scap: Backport for [10%] Enable Vector 2022 as the default skin for eswiki (T335686)
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T335845)', diff saved to https://phabricator.wikimedia.org/P47534 and previous config saved to /var/cache/conftool/dbconfig/20230504-130616-ladsgroup.json
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T335845)', diff saved to https://phabricator.wikimedia.org/P47533 and previous config saved to /var/cache/conftool/dbconfig/20230504-130432-ladsgroup.json
  • 13:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 13:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P47532 and previous config saved to /var/cache/conftool/dbconfig/20230504-130115-ladsgroup.json
  • 12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 12:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1004.wikimedia.org
  • 12:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 12:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T335845)', diff saved to https://phabricator.wikimedia.org/P47531 and previous config saved to /var/cache/conftool/dbconfig/20230504-125309-ladsgroup.json
  • 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T335845)', diff saved to https://phabricator.wikimedia.org/P47530 and previous config saved to /var/cache/conftool/dbconfig/20230504-125250-ladsgroup.json
  • 12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install1004.wikimedia.org
  • 12:48 moritzm: installing ruby-rack security updates
  • 12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P47529 and previous config saved to /var/cache/conftool/dbconfig/20230504-124609-ladsgroup.json
  • 12:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - ayounsi@cumin1001
  • 12:38 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - ayounsi@cumin1001
  • 12:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47528 and previous config saved to /var/cache/conftool/dbconfig/20230504-123103-ladsgroup.json
  • 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47527 and previous config saved to /var/cache/conftool/dbconfig/20230504-122237-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47526 and previous config saved to /var/cache/conftool/dbconfig/20230504-122114-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T335838)', diff saved to https://phabricator.wikimedia.org/P47525 and previous config saved to /var/cache/conftool/dbconfig/20230504-122048-ladsgroup.json
  • 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T335838)', diff saved to https://phabricator.wikimedia.org/P47524 and previous config saved to /var/cache/conftool/dbconfig/20230504-121247-ladsgroup.json
  • 12:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 12:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T335838)', diff saved to https://phabricator.wikimedia.org/P47523 and previous config saved to /var/cache/conftool/dbconfig/20230504-121224-ladsgroup.json
  • 12:10 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Fix output path of list=wbsubscribers API (T300458) (duration: 07m 43s)
  • 12:08 moritzm: installing libdatetime-timezone-perl updates
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P47522 and previous config saved to /var/cache/conftool/dbconfig/20230504-120542-ladsgroup.json
  • 12:04 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Fix output path of list=wbsubscribers API (T300458) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 12:03 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Fix output path of list=wbsubscribers API (T300458)
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P47521 and previous config saved to /var/cache/conftool/dbconfig/20230504-115717-ladsgroup.json
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P47520 and previous config saved to /var/cache/conftool/dbconfig/20230504-115035-ladsgroup.json
  • 11:44 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Fix output path of list=wbsubscribers API (T300458) (duration: 08m 24s)
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P47519 and previous config saved to /var/cache/conftool/dbconfig/20230504-114211-ladsgroup.json
  • 11:38 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and migr: Backport for Fix output path of list=wbsubscribers API (T300458) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 11:38 kart_: Updated cxserver to 2023-05-03-044244-production (T333835, T335019, T331505)
  • 11:36 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Fix output path of list=wbsubscribers API (T300458)
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T335838)', diff saved to https://phabricator.wikimedia.org/P47518 and previous config saved to /var/cache/conftool/dbconfig/20230504-113529-ladsgroup.json
  • 11:34 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 11:33 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 11:31 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 11:31 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 11:30 moritzm: installing curl security updates (on buster)
  • 11:30 moritzm: installing curl security updates
  • 11:27 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 11:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T335838)', diff saved to https://phabricator.wikimedia.org/P47516 and previous config saved to /var/cache/conftool/dbconfig/20230504-112705-ladsgroup.json
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T335838)', diff saved to https://phabricator.wikimedia.org/P47515 and previous config saved to /var/cache/conftool/dbconfig/20230504-112650-ladsgroup.json
  • 11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T335838)', diff saved to https://phabricator.wikimedia.org/P47514 and previous config saved to /var/cache/conftool/dbconfig/20230504-112625-ladsgroup.json
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T335838)', diff saved to https://phabricator.wikimedia.org/P47513 and previous config saved to /var/cache/conftool/dbconfig/20230504-112041-ladsgroup.json
  • 11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T335838)', diff saved to https://phabricator.wikimedia.org/P47512 and previous config saved to /var/cache/conftool/dbconfig/20230504-112017-ladsgroup.json
  • 11:15 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 11:14 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aphlict2001.codfw.wmnet with OS bullseye
  • 11:13 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P47511 and previous config saved to /var/cache/conftool/dbconfig/20230504-111119-ladsgroup.json
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P47510 and previous config saved to /var/cache/conftool/dbconfig/20230504-110511-ladsgroup.json
  • 11:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 5713
  • 11:04 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aphlict2001.codfw.wmnet with reason: host reimage
  • 11:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 5713
  • 11:01 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aphlict2001.codfw.wmnet with reason: host reimage
  • 10:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P47509 and previous config saved to /var/cache/conftool/dbconfig/20230504-105613-ladsgroup.json
  • 10:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112
  • 10:54 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 112
  • 10:53 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2004.wikimedia.org
  • 10:51 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P47508 and previous config saved to /var/cache/conftool/dbconfig/20230504-105005-ladsgroup.json
  • 10:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1001.eqiad.wmnet
  • 10:48 eoghan@cumin1001: START - Cookbook sre.ganeti.reimage for host aphlict2001.codfw.wmnet with OS bullseye
  • 10:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install2004.wikimedia.org
  • 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3002.wikimedia.org
  • 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-eqiad
  • 10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T335838)', diff saved to https://phabricator.wikimedia.org/P47507 and previous config saved to /var/cache/conftool/dbconfig/20230504-104107-ladsgroup.json
  • 10:40 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1001.eqiad.wmnet
  • 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3002.wikimedia.org
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4002.wikimedia.org
  • 10:40 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1002.eqiad.wmnet
  • 10:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install4002.wikimedia.org
  • 10:35 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-eqiad
  • 10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T335838)', diff saved to https://phabricator.wikimedia.org/P47506 and previous config saved to /var/cache/conftool/dbconfig/20230504-103459-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T335838)', diff saved to https://phabricator.wikimedia.org/P47505 and previous config saved to /var/cache/conftool/dbconfig/20230504-103434-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T335838)', diff saved to https://phabricator.wikimedia.org/P47504 and previous config saved to /var/cache/conftool/dbconfig/20230504-103409-ladsgroup.json
  • 10:30 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:30 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:28 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1002.eqiad.wmnet
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T335838)', diff saved to https://phabricator.wikimedia.org/P47503 and previous config saved to /var/cache/conftool/dbconfig/20230504-102835-ladsgroup.json
  • 10:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 10:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T335838)', diff saved to https://phabricator.wikimedia.org/P47502 and previous config saved to /var/cache/conftool/dbconfig/20230504-102812-ladsgroup.json
  • 10:26 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2001.codfw.wmnet
  • 10:23 Amir1: Removing db1114 from zarcillo T335837
  • 10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1114.eqiad.wmnet
  • 10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1114.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 10:19 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1114.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P47501 and previous config saved to /var/cache/conftool/dbconfig/20230504-101903-ladsgroup.json
  • 10:17 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 10:17 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2001.codfw.wmnet
  • 10:16 elukey@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:16 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 10:16 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 10:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2002.codfw.wmnet
  • 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P47500 and previous config saved to /var/cache/conftool/dbconfig/20230504-101306-ladsgroup.json
  • 10:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1114.eqiad.wmnet
  • 10:10 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 10:05 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2002.codfw.wmnet
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P47499 and previous config saved to /var/cache/conftool/dbconfig/20230504-100357-ladsgroup.json
  • 10:02 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
  • 09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P47498 and previous config saved to /var/cache/conftool/dbconfig/20230504-095800-ladsgroup.json
  • 09:52 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor100[1256].eqiad.wmnet
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Remove db1114 from dbctl T335837', diff saved to https://phabricator.wikimedia.org/P47497 and previous config saved to /var/cache/conftool/dbconfig/20230504-094945-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T335838)', diff saved to https://phabricator.wikimedia.org/P47496 and previous config saved to /var/cache/conftool/dbconfig/20230504-094850-ladsgroup.json
  • 09:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor200[3456].codfw.wmnet
  • 09:47 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 09:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T335838)', diff saved to https://phabricator.wikimedia.org/P47495 and previous config saved to /var/cache/conftool/dbconfig/20230504-094253-ladsgroup.json
  • 09:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T335838)', diff saved to https://phabricator.wikimedia.org/P47494 and previous config saved to /var/cache/conftool/dbconfig/20230504-094221-ladsgroup.json
  • 09:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 09:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T335838)', diff saved to https://phabricator.wikimedia.org/P47493 and previous config saved to /var/cache/conftool/dbconfig/20230504-094156-ladsgroup.json
  • 09:38 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 09:38 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 09:38 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for cloudbackup1001-dev.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
  • 09:37 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for cloudbackup1001-dev.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T335838)', diff saved to https://phabricator.wikimedia.org/P47492 and previous config saved to /var/cache/conftool/dbconfig/20230504-093733-ladsgroup.json
  • 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T335838)', diff saved to https://phabricator.wikimedia.org/P47491 and previous config saved to /var/cache/conftool/dbconfig/20230504-093710-ladsgroup.json
  • 09:36 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:36 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1114 T335837', diff saved to https://phabricator.wikimedia.org/P47490 and previous config saved to /var/cache/conftool/dbconfig/20230504-093419-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P47488 and previous config saved to /var/cache/conftool/dbconfig/20230504-092649-ladsgroup.json
  • 09:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P47487 and previous config saved to /var/cache/conftool/dbconfig/20230504-092203-ladsgroup.json
  • 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-codfw
  • 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2002.codfw.wmnet
  • 09:12 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-codfw
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P47486 and previous config saved to /var/cache/conftool/dbconfig/20230504-091143-ladsgroup.json
  • 09:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2002.codfw.wmnet
  • 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P47485 and previous config saved to /var/cache/conftool/dbconfig/20230504-090657-ladsgroup.json
  • 09:06 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 09:04 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T335838)', diff saved to https://phabricator.wikimedia.org/P47484 and previous config saved to /var/cache/conftool/dbconfig/20230504-085637-ladsgroup.json
  • 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T335838)', diff saved to https://phabricator.wikimedia.org/P47483 and previous config saved to /var/cache/conftool/dbconfig/20230504-085151-ladsgroup.json
  • 08:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T335838)', diff saved to https://phabricator.wikimedia.org/P47482 and previous config saved to /var/cache/conftool/dbconfig/20230504-085008-ladsgroup.json
  • 08:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 08:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T335838)', diff saved to https://phabricator.wikimedia.org/P47481 and previous config saved to /var/cache/conftool/dbconfig/20230504-084741-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 08:40 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 08:40 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 08:37 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes1006.eqiad.wmnet
  • 08:37 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes1006.eqiad.wmnet
  • 08:14 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 08:07 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:07 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:04 urbanecm@deploy1002: Finished scap: Backport for [Growth] Deploy Personalized praise to AR, BN, CS (T334630) (duration: 07m 24s)
  • 07:58 urbanecm@deploy1002: urbanecm: Backport for [Growth] Deploy Personalized praise to AR, BN, CS (T334630) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 07:57 urbanecm@deploy1002: Started scap: Backport for [Growth] Deploy Personalized praise to AR, BN, CS (T334630)
  • 07:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1132.eqiad.wmnet with reason: Onsite maintenance T334722
  • 07:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1132.eqiad.wmnet with reason: Onsite maintenance T334722
  • 07:56 urbanecm@deploy1002: Finished scap: Backport for ApiVisualEditor: Support preloading from i18n messages (T330337), Mentor dashboard: Move away from alpha/beta/stable (T334630) (duration: 07m 08s)
  • 07:50 urbanecm@deploy1002: urbanecm: Backport for ApiVisualEditor: Support preloading from i18n messages (T330337), Mentor dashboard: Move away from alpha/beta/stable (T334630) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:49 urbanecm@deploy1002: Started scap: Backport for ApiVisualEditor: Support preloading from i18n messages (T330337), Mentor dashboard: Move away from alpha/beta/stable (T334630)
  • 07:37 urbanecm@deploy1002: Finished scap: Backport for Mentor dashboard: Move away from alpha/beta/stable (T334630), EditPage: Support preloading from i18n messages (T330337), ApiVisualEditor: Support preloading from i18n messages (T330337), EditPage: Support preloading from i18n messages (T330337) (duration: 07m 58s)
  • 07:37 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 134823
  • 07:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 134823
  • 07:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 293
  • 07:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 293
  • 07:31 urbanecm@deploy1002: urbanecm: Backport for Mentor dashboard: Move away from alpha/beta/stable (T334630), EditPage: Support preloading from i18n messages (T330337), ApiVisualEditor: Support preloading from i18n messages (T330337), EditPage: Support preloading from i18n messages (T330337) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2
  • 07:30 urbanecm@deploy1002: Started scap: Backport for Mentor dashboard: Move away from alpha/beta/stable (T334630), EditPage: Support preloading from i18n messages (T330337), ApiVisualEditor: Support preloading from i18n messages (T330337), EditPage: Support preloading from i18n messages (T330337)
  • 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5002.wikimedia.org
  • 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install5002.wikimedia.org
  • 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast2003.wikimedia.org with OS bookworm
  • 07:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6002.wikimedia.org
  • 07:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6002.wikimedia.org
  • 06:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on pc2011.codfw.wmnet with reason: Onsite maintenance T334722
  • 06:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp1002.wikimedia.org
  • 06:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on pc2011.codfw.wmnet with reason: Onsite maintenance T334722
  • 06:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast2003.wikimedia.org with reason: host reimage
  • 06:57 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Promote pc2014 to pc1 master (T334722) (duration: 07m 23s)
  • 06:54 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast2003.wikimedia.org with reason: host reimage
  • 06:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp1002.wikimedia.org
  • 06:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
  • 06:52 marostegui: Promote pc2014 as pc1 master codfw dbmaint - T334722
  • 06:51 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Promote pc2014 to pc1 master (T334722) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 06:49 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Promote pc2014 to pc1 master (T334722)
  • 06:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
  • 06:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2002.wikimedia.org
  • 06:46 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 06:44 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 06:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2002.wikimedia.org
  • 06:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast2003.wikimedia.org with OS bookworm
  • 06:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4004.wikimedia.org
  • 06:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
  • 06:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 06:25 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 06:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4004.wikimedia.org
  • 06:18 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 06:10 slyngshede@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts test-reimage2001.codfw.wmnet
  • 06:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test-reimage2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1001"
  • 06:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 06:07 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test-reimage2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1001"
  • 06:05 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 06:05 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 06:01 slyngshede@cumin1001: START - Cookbook sre.hosts.decommission for hosts test-reimage2001.codfw.wmnet
  • 05:59 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host bast5003.wikimedia.org
  • 05:54 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5003.wikimedia.org
  • 05:51 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 04:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reboot - ryankemper@cumin1001 - T335835
  • 04:51 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 04:50 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 04:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 6 hosts with reason: Rolling reboot for T335835
  • 04:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 6 hosts with reason: Rolling reboot for T335835
  • 04:45 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reboot - ryankemper@cumin1001 - T335835
  • 04:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on relforge[1003-1004].eqiad.wmnet with reason: Rolling reboot T335835
  • 04:38 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on relforge[1003-1004].eqiad.wmnet with reason: Rolling reboot T335835
  • 04:38 ryankemper: [Elastic] Reboot operation failed w/ (likely transient) read timeouts, will try again in 10 mins
  • 04:37 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 04:36 ryankemper: [Elastic] Beginning rolling reboot of eqiad elastic, 3 nodes at a time, `ryankemper@cumin1001` tmux session `reboot_eqiad`
  • 04:36 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 04:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 50 hosts with reason: Rolling reboot of eqiad for T335835
  • 04:29 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 50 hosts with reason: Rolling reboot of eqiad for T335835
  • 02:42 eileen: config revision changed from 5ac52d82 to 7ac11236 reduce batch size, avoid failmail
  • 02:35 eileen: config revision changed from 121a864a to 5ac52d82
  • 02:33 eileen: civicrm upgraded from b97aaa08 to 05523a9d
  • 01:29 eileen: config revision changed from 26147e89 to 121a864a - disabling populate as it keeps rolling back so prob another overlong row

2023-05-03

  • 23:55 eileen: config revision changed from 2995f558 to 26147e89
  • 23:15 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001 - T335835
  • 23:10 tzatziki: removing 1 file for legal compliance
  • 23:01 eileen: config revision changed from 69f60bb9 to 2995f558
  • 22:42 zabe@deploy1002: Finished scap: Backport for Start writing to af_actor/afh_actor in group1 wikis (T334295) (duration: 07m 13s)
  • 22:37 zabe@deploy1002: zabe: Backport for Start writing to af_actor/afh_actor in group1 wikis (T334295) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:35 zabe@deploy1002: Started scap: Backport for Start writing to af_actor/afh_actor in group1 wikis (T334295)
  • 22:34 tzatziki: removing 12 files for legal compliance
  • 22:19 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs1010.eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 22:11 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs1010.eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 22:08 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs2001.codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 22:04 eileen: civicrm upgraded from c6149ad2 to b97aaa08
  • 22:00 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs2001.codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 21:55 brett: Disable puppet on lvs4008 for new pybal deployment (just in case immediate config rollback is required) - T263797
  • 21:43 milimetric@deploy1002: Finished deploy [analytics/refinery@c53c095] (thin): Deploy THIN [analytics/refinery@c53c095] (duration: 00m 06s)
  • 21:43 milimetric@deploy1002: Started deploy [analytics/refinery@c53c095] (thin): Deploy THIN [analytics/refinery@c53c095]
  • 21:31 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[17-33].eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 21:31 brett: Uploaded pybal_1.15.11 to apt1001 via reprepro
  • 21:31 milimetric@deploy1002: Finished deploy [analytics/refinery@c53c095]: Refinery deploy [analytics/refinery@c53c095] (duration: 08m 22s)
  • 21:22 milimetric@deploy1002: Started deploy [analytics/refinery@c53c095]: Refinery deploy [analytics/refinery@c53c095]
  • 21:11 brett: Upgrading pybal to 1.15.11 on lvs4010
  • 20:54 cjming: end of UTC late backport window
  • 20:53 cjming@deploy1002: Finished scap: Backport for Router handling code should be centralized into mmv.bootstrap (T236591) (duration: 10m 08s)
  • 20:44 cjming@deploy1002: cjming and jdlrobson: Backport for Router handling code should be centralized into mmv.bootstrap (T236591) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:43 cjming@deploy1002: Started scap: Backport for Router handling code should be centralized into mmv.bootstrap (T236591)
  • 20:42 cjming@deploy1002: Finished scap: Backport for Explicitly enable MFCustomSiteModules (T270603) (duration: 10m 23s)
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T335838)', diff saved to https://phabricator.wikimedia.org/P47480 and previous config saved to /var/cache/conftool/dbconfig/20230503-203424-ladsgroup.json
  • 20:33 cjming@deploy1002: jdlrobson and cjming: Backport for Explicitly enable MFCustomSiteModules (T270603) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:32 cjming@deploy1002: Started scap: Backport for Explicitly enable MFCustomSiteModules (T270603)
  • 20:30 cjming@deploy1002: Finished scap: Backport for Enable graphs on test wikipedia and mediawiki.org (T334940) (duration: 08m 19s)
  • 20:23 cjming@deploy1002: cjming and jdlrobson: Backport for Enable graphs on test wikipedia and mediawiki.org (T334940) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:21 cjming@deploy1002: Started scap: Backport for Enable graphs on test wikipedia and mediawiki.org (T334940)
  • 20:19 cjming@deploy1002: Finished scap: Backport for Create autopatroller and patroller groups on bn.wikiquote (T335829) (duration: 08m 36s)
  • 20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P47479 and previous config saved to /var/cache/conftool/dbconfig/20230503-201918-ladsgroup.json
  • 20:14 fab@deploy1002: Finished deploy [airflow-dags/research@f8dad05]: (no justification provided) (duration: 00m 19s)
  • 20:13 fab@deploy1002: Started deploy [airflow-dags/research@f8dad05]: (no justification provided)
  • 20:13 cjming@deploy1002: cjming and mdsshakil: Backport for Create autopatroller and patroller groups on bn.wikiquote (T335829) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:11 cjming@deploy1002: Started scap: Backport for Create autopatroller and patroller groups on bn.wikiquote (T335829)
  • 20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P47478 and previous config saved to /var/cache/conftool/dbconfig/20230503-200411-ladsgroup.json
  • 19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T335838)', diff saved to https://phabricator.wikimedia.org/P47477 and previous config saved to /var/cache/conftool/dbconfig/20230503-194905-ladsgroup.json
  • 19:43 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001 - T335835
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T335838)', diff saved to https://phabricator.wikimedia.org/P47476 and previous config saved to /var/cache/conftool/dbconfig/20230503-194238-ladsgroup.json
  • 19:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 19:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T335838)', diff saved to https://phabricator.wikimedia.org/P47475 and previous config saved to /var/cache/conftool/dbconfig/20230503-194213-ladsgroup.json
  • 19:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T335838)', diff saved to https://phabricator.wikimedia.org/P47474 and previous config saved to /var/cache/conftool/dbconfig/20230503-194045-ladsgroup.json
  • 19:37 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P47473 and previous config saved to /var/cache/conftool/dbconfig/20230503-192707-ladsgroup.json
  • 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P47472 and previous config saved to /var/cache/conftool/dbconfig/20230503-192538-ladsgroup.json
  • 19:20 inflatador: bking@cumin1001 reboot Elastic cluster for T335835
  • 19:19 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001
  • 19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P47471 and previous config saved to /var/cache/conftool/dbconfig/20230503-191200-ladsgroup.json
  • 19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P47470 and previous config saved to /var/cache/conftool/dbconfig/20230503-191032-ladsgroup.json
  • 19:10 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:09 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T335838)', diff saved to https://phabricator.wikimedia.org/P47469 and previous config saved to /var/cache/conftool/dbconfig/20230503-185654-ladsgroup.json
  • 18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T335838)', diff saved to https://phabricator.wikimedia.org/P47468 and previous config saved to /var/cache/conftool/dbconfig/20230503-185526-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T335838)', diff saved to https://phabricator.wikimedia.org/P47467 and previous config saved to /var/cache/conftool/dbconfig/20230503-185026-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 18:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 18:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 18:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T335838)', diff saved to https://phabricator.wikimedia.org/P47466 and previous config saved to /var/cache/conftool/dbconfig/20230503-184957-ladsgroup.json
  • 18:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T335838)', diff saved to https://phabricator.wikimedia.org/P47465 and previous config saved to /var/cache/conftool/dbconfig/20230503-184610-ladsgroup.json
  • 18:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 18:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T335838)', diff saved to https://phabricator.wikimedia.org/P47464 and previous config saved to /var/cache/conftool/dbconfig/20230503-184536-ladsgroup.json
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P47463 and previous config saved to /var/cache/conftool/dbconfig/20230503-183451-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P47462 and previous config saved to /var/cache/conftool/dbconfig/20230503-183030-ladsgroup.json
  • 18:26 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[17-33].eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 18:26 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[13-27].codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 18:22 brennen@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.7 refs T330213 (duration: 06m 18s)
  • 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P47461 and previous config saved to /var/cache/conftool/dbconfig/20230503-181944-ladsgroup.json
  • 18:16 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.7 refs T330213
  • 18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P47460 and previous config saved to /var/cache/conftool/dbconfig/20230503-181524-ladsgroup.json
  • 18:08 brennen: train 1.41.0-wmf.7 (T330213): logs quiet and no current blockers, rolling to group1
  • 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T335838)', diff saved to https://phabricator.wikimedia.org/P47459 and previous config saved to /var/cache/conftool/dbconfig/20230503-180438-ladsgroup.json
  • 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T335838)', diff saved to https://phabricator.wikimedia.org/P47458 and previous config saved to /var/cache/conftool/dbconfig/20230503-180018-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T335838)', diff saved to https://phabricator.wikimedia.org/P47457 and previous config saved to /var/cache/conftool/dbconfig/20230503-175806-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 17:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T335838)', diff saved to https://phabricator.wikimedia.org/P47456 and previous config saved to /var/cache/conftool/dbconfig/20230503-175404-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T335838)', diff saved to https://phabricator.wikimedia.org/P47455 and previous config saved to /var/cache/conftool/dbconfig/20230503-175340-ladsgroup.json
  • 17:53 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:52 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T335838)', diff saved to https://phabricator.wikimedia.org/P47454 and previous config saved to /var/cache/conftool/dbconfig/20230503-175126-ladsgroup.json
  • 17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T335838)', diff saved to https://phabricator.wikimedia.org/P47453 and previous config saved to /var/cache/conftool/dbconfig/20230503-174330-ladsgroup.json
  • 17:41 inflatador: bking@cumin1001 reboot wdqs20[13-22].codfw.wmnet T335835
  • 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P47452 and previous config saved to /var/cache/conftool/dbconfig/20230503-173834-ladsgroup.json
  • 17:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P47451 and previous config saved to /var/cache/conftool/dbconfig/20230503-173620-ladsgroup.json
  • 17:32 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kafkamon2003.codfw.wmnet with OS bullseye
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P47450 and previous config saved to /var/cache/conftool/dbconfig/20230503-172824-ladsgroup.json
  • 17:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P47449 and previous config saved to /var/cache/conftool/dbconfig/20230503-172328-ladsgroup.json
  • 17:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2011.codfw.wmnet with OS bullseye
  • 17:22 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P47448 and previous config saved to /var/cache/conftool/dbconfig/20230503-172114-ladsgroup.json
  • 17:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:18 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage
  • 17:15 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P47447 and previous config saved to /var/cache/conftool/dbconfig/20230503-171317-ladsgroup.json
  • 17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T335838)', diff saved to https://phabricator.wikimedia.org/P47446 and previous config saved to /var/cache/conftool/dbconfig/20230503-170821-ladsgroup.json
  • 17:07 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:07 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T335838)', diff saved to https://phabricator.wikimedia.org/P47445 and previous config saved to /var/cache/conftool/dbconfig/20230503-170607-ladsgroup.json
  • 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
  • 17:05 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 169m 01s)
  • 17:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T335838)', diff saved to https://phabricator.wikimedia.org/P47444 and previous config saved to /var/cache/conftool/dbconfig/20230503-165954-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T335838)', diff saved to https://phabricator.wikimedia.org/P47443 and previous config saved to /var/cache/conftool/dbconfig/20230503-165920-ladsgroup.json
  • 16:58 herron@cumin1001: START - Cookbook sre.ganeti.reimage for host kafkamon2003.codfw.wmnet with OS bullseye
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T335838)', diff saved to https://phabricator.wikimedia.org/P47442 and previous config saved to /var/cache/conftool/dbconfig/20230503-165818-ladsgroup.json
  • 16:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T335838)', diff saved to https://phabricator.wikimedia.org/P47441 and previous config saved to /var/cache/conftool/dbconfig/20230503-165811-ladsgroup.json
  • 16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T335838)', diff saved to https://phabricator.wikimedia.org/P47440 and previous config saved to /var/cache/conftool/dbconfig/20230503-165754-ladsgroup.json
  • 16:47 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:47 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:46 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:46 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T335838)', diff saved to https://phabricator.wikimedia.org/P47438 and previous config saved to /var/cache/conftool/dbconfig/20230503-164622-ladsgroup.json
  • 16:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T335838)', diff saved to https://phabricator.wikimedia.org/P47437 and previous config saved to /var/cache/conftool/dbconfig/20230503-164557-ladsgroup.json
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P47436 and previous config saved to /var/cache/conftool/dbconfig/20230503-164414-ladsgroup.json
  • 16:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P47435 and previous config saved to /var/cache/conftool/dbconfig/20230503-164248-ladsgroup.json
  • 16:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs2011']
  • 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P47434 and previous config saved to /var/cache/conftool/dbconfig/20230503-163051-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P47433 and previous config saved to /var/cache/conftool/dbconfig/20230503-162908-ladsgroup.json
  • 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P47432 and previous config saved to /var/cache/conftool/dbconfig/20230503-162741-ladsgroup.json
  • 16:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2011']
  • 16:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs2011']
  • 16:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2011']
  • 16:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2011']
  • 16:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2011']
  • 16:19 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs2011']
  • 16:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2011']
  • 16:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P47431 and previous config saved to /var/cache/conftool/dbconfig/20230503-161545-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T335838)', diff saved to https://phabricator.wikimedia.org/P47430 and previous config saved to /var/cache/conftool/dbconfig/20230503-161402-ladsgroup.json
  • 16:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs2011.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T335838)', diff saved to https://phabricator.wikimedia.org/P47429 and previous config saved to /var/cache/conftool/dbconfig/20230503-161235-ladsgroup.json
  • 16:08 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts puppetmaster2001.codfw.wmnet
  • 16:08 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts puppetmaster2001.codfw.wmnet
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T335838)', diff saved to https://phabricator.wikimedia.org/P47428 and previous config saved to /var/cache/conftool/dbconfig/20230503-160601-ladsgroup.json
  • 16:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 16:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T335838)', diff saved to https://phabricator.wikimedia.org/P47427 and previous config saved to /var/cache/conftool/dbconfig/20230503-160146-ladsgroup.json
  • 16:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T335838)', diff saved to https://phabricator.wikimedia.org/P47426 and previous config saved to /var/cache/conftool/dbconfig/20230503-160039-ladsgroup.json
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T335838)', diff saved to https://phabricator.wikimedia.org/P47425 and previous config saved to /var/cache/conftool/dbconfig/20230503-155946-ladsgroup.json
  • 15:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47424 and previous config saved to /var/cache/conftool/dbconfig/20230503-155506-ladsgroup.json
  • 15:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47423 and previous config saved to /var/cache/conftool/dbconfig/20230503-155221-ladsgroup.json
  • 15:48 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[13-27].codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P47422 and previous config saved to /var/cache/conftool/dbconfig/20230503-154639-ladsgroup.json
  • 15:42 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 15:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
  • 15:41 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 15:40 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lvs2011.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:40 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 15:40 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P47421 and previous config saved to /var/cache/conftool/dbconfig/20230503-154000-ladsgroup.json
  • 15:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
  • 15:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 15:38 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for lvs2011 - pt1979@cumin2002"
  • 15:37 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts puppetmaster1002.eqiad.wmnet
  • 15:37 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts puppetmaster1002.eqiad.wmnet
  • 15:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P47420 and previous config saved to /var/cache/conftool/dbconfig/20230503-153715-ladsgroup.json
  • 15:36 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for lvs2011 - pt1979@cumin2002"
  • 15:34 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:34 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1016.eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 15:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P47419 and previous config saved to /var/cache/conftool/dbconfig/20230503-153133-ladsgroup.json
  • 15:29 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main'.
  • 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P47418 and previous config saved to /var/cache/conftool/dbconfig/20230503-152453-ladsgroup.json
  • 15:24 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1016.eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P47417 and previous config saved to /var/cache/conftool/dbconfig/20230503-152208-ladsgroup.json
  • 15:17 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1016.eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T335838)', diff saved to https://phabricator.wikimedia.org/P47416 and previous config saved to /var/cache/conftool/dbconfig/20230503-151627-ladsgroup.json
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T335838)', diff saved to https://phabricator.wikimedia.org/P47415 and previous config saved to /var/cache/conftool/dbconfig/20230503-151013-ladsgroup.json
  • 15:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 15:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T335838)', diff saved to https://phabricator.wikimedia.org/P47414 and previous config saved to /var/cache/conftool/dbconfig/20230503-150947-ladsgroup.json
  • 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47413 and previous config saved to /var/cache/conftool/dbconfig/20230503-150947-ladsgroup.json
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47412 and previous config saved to /var/cache/conftool/dbconfig/20230503-150702-ladsgroup.json
  • 15:07 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1016.eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 15:03 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2012.codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T335838)', diff saved to https://phabricator.wikimedia.org/P47411 and previous config saved to /var/cache/conftool/dbconfig/20230503-150103-ladsgroup.json
  • 15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47410 and previous config saved to /var/cache/conftool/dbconfig/20230503-150042-ladsgroup.json
  • 15:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 15:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T335838)', diff saved to https://phabricator.wikimedia.org/P47409 and previous config saved to /var/cache/conftool/dbconfig/20230503-150017-ladsgroup.json
  • 14:59 sukhe: fix backup route for high-traffic2 in codfw: set routing-options static route 208.80.153.240/28 next-hop 10.192.17.7
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P47408 and previous config saved to /var/cache/conftool/dbconfig/20230503-145440-ladsgroup.json
  • 14:54 sukhe: [finished] homer "cr*-codfw*" commit "Gerrit: 914344 remove decommissioned host lvs2007": T335777
  • 14:53 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2012.codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 14:52 sukhe: homer "cr*-codfw*" commit "Gerrit: 914344 remove decommissioned host lvs2007": T335777
  • 14:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs2007.codfw.wmnet
  • 14:46 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P47407 and previous config saved to /var/cache/conftool/dbconfig/20230503-144511-ladsgroup.json
  • 14:45 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 14:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm1001.wikimedia.org
  • 14:43 ottomata: Grant IdempotentWrite Kafka Cluster ACL to User:ANONYMOUS in kafka logging clusters - T334733
  • 14:40 jclark@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:40 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt frav1003 - jclark@cumin1001"
  • 14:40 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm1001.wikimedia.org
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P47406 and previous config saved to /var/cache/conftool/dbconfig/20230503-143933-ladsgroup.json
  • 14:38 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt frav1003 - jclark@cumin1001"
  • 14:36 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:36 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs2007.codfw.wmnet
  • 14:33 sukhe: set routing-options static route 208.80.153.224/28 next-hop 10.192.49.7 [move static route for high-traffic1 to lvs2010]: T335777
  • 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P47405 and previous config saved to /var/cache/conftool/dbconfig/20230503-143005-ladsgroup.json
  • 14:26 ottomata: Grant IdempotentWrite Kafka Cluster ACL to User:ANONYMOUS in kafka main clusters - T334733
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T335838)', diff saved to https://phabricator.wikimedia.org/P47404 and previous config saved to /var/cache/conftool/dbconfig/20230503-142427-ladsgroup.json
  • 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T335838)', diff saved to https://phabricator.wikimedia.org/P47403 and previous config saved to /var/cache/conftool/dbconfig/20230503-141817-ladsgroup.json
  • 14:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 14:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47402 and previous config saved to /var/cache/conftool/dbconfig/20230503-141752-ladsgroup.json
  • 14:16 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767
  • 14:16 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 14:15 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 14:15 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 14:15 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 14:15 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T335838)', diff saved to https://phabricator.wikimedia.org/P47401 and previous config saved to /var/cache/conftool/dbconfig/20230503-141458-ladsgroup.json
  • 14:14 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 14:13 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:13 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 14:13 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 14:13 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 14:12 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:12 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:12 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:12 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:11 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 14:11 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:11 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 14:11 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:11 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:11 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 14:09 sukhe: stop pybal on lvs2007 to drain host for decommissioning
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47400 and previous config saved to /var/cache/conftool/dbconfig/20230503-140932-ladsgroup.json
  • 14:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:09 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Turn on experimental Parsoid Read Views support, except on commons & wikidata (T335157) (duration: 15m 27s)
  • 14:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T335838)', diff saved to https://phabricator.wikimedia.org/P47399 and previous config saved to /var/cache/conftool/dbconfig/20230503-140908-ladsgroup.json
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T335838)', diff saved to https://phabricator.wikimedia.org/P47398 and previous config saved to /var/cache/conftool/dbconfig/20230503-140540-ladsgroup.json
  • 14:04 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 14:04 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 14:03 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host kafkamon2003.codfw.wmnet
  • 14:03 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kafkamon2003.codfw.wmnet - herron@cumin1001"
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P47396 and previous config saved to /var/cache/conftool/dbconfig/20230503-140246-ladsgroup.json
  • 14:02 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kafkamon2003.codfw.wmnet - herron@cumin1001"
  • 13:55 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and cscott: Backport for Turn on experimental Parsoid Read Views support, except on commons & wikidata (T335157) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P47395 and previous config saved to /var/cache/conftool/dbconfig/20230503-135402-ladsgroup.json
  • 13:53 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Turn on experimental Parsoid Read Views support, except on commons & wikidata (T335157)
  • 13:52 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for wblistentityusage: Deprecate wbeu prefix, new output format (T300460 T196962) (duration: 27m 54s)
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P47394 and previous config saved to /var/cache/conftool/dbconfig/20230503-135034-ladsgroup.json
  • 13:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-codfw
  • 13:47 herron@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafkamon2003.codfw.wmnet on all recursors
  • 13:47 herron@cumin1001: START - Cookbook sre.dns.wipe-cache kafkamon2003.codfw.wmnet on all recursors
  • 13:47 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:47 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kafkamon2003.codfw.wmnet - herron@cumin1001"
  • 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P47393 and previous config saved to /var/cache/conftool/dbconfig/20230503-134740-ladsgroup.json
  • 13:46 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kafkamon2003.codfw.wmnet - herron@cumin1001"
  • 13:43 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 13:43 herron@cumin1001: START - Cookbook sre.ganeti.makevm for new host kafkamon2003.codfw.wmnet
  • 13:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm1001.wikimedia.org
  • 13:42 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for wblistentityusage: Deprecate wbeu prefix, new output format (T300460 T196962) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:40 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm1001.wikimedia.org
  • 13:40 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm2001.wikimedia.org
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P47392 and previous config saved to /var/cache/conftool/dbconfig/20230503-133855-ladsgroup.json
  • 13:36 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm2001.wikimedia.org
  • 13:36 slyngshede@cumin1001: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM idm-test1001.wikimedia.org
  • 13:35 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kafkamon1003.eqiad.wmnet
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P47391 and previous config saved to /var/cache/conftool/dbconfig/20230503-133528-ladsgroup.json
  • 13:34 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
  • 13:34 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM idm-test1001.wikimedia.org
  • 13:33 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
  • 13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47390 and previous config saved to /var/cache/conftool/dbconfig/20230503-133232-ladsgroup.json
  • 13:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47389 and previous config saved to /var/cache/conftool/dbconfig/20230503-133117-ladsgroup.json
  • 13:26 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon1003.eqiad.wmnet
  • 13:24 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for wblistentityusage: Deprecate wbeu prefix, new output format (T300460 T196962)
  • 13:24 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:24 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T335838)', diff saved to https://phabricator.wikimedia.org/P47388 and previous config saved to /var/cache/conftool/dbconfig/20230503-132349-ladsgroup.json
  • 13:20 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for testwikidatawiki: enable entity labels in parsed API edit summaries (T335098) (duration: 17m 55s)
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T335838)', diff saved to https://phabricator.wikimedia.org/P47387 and previous config saved to /var/cache/conftool/dbconfig/20230503-132022-ladsgroup.json
  • 13:19 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T335838)', diff saved to https://phabricator.wikimedia.org/P47386 and previous config saved to /var/cache/conftool/dbconfig/20230503-131736-ladsgroup.json
  • 13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T335838)', diff saved to https://phabricator.wikimedia.org/P47385 and previous config saved to /var/cache/conftool/dbconfig/20230503-131656-ladsgroup.json
  • 13:16 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:16 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P47384 and previous config saved to /var/cache/conftool/dbconfig/20230503-131611-ladsgroup.json
  • 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T335838)', diff saved to https://phabricator.wikimedia.org/P47383 and previous config saved to /var/cache/conftool/dbconfig/20230503-131414-ladsgroup.json
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T335838)', diff saved to https://phabricator.wikimedia.org/P47382 and previous config saved to /var/cache/conftool/dbconfig/20230503-131249-ladsgroup.json
  • 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T335838)', diff saved to https://phabricator.wikimedia.org/P47381 and previous config saved to /var/cache/conftool/dbconfig/20230503-131224-ladsgroup.json
  • 13:05 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and migr: Backport for testwikidatawiki: enable entity labels in parsed API edit summaries (T335098) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:02 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for testwikidatawiki: enable entity labels in parsed API edit summaries (T335098)
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P47380 and previous config saved to /var/cache/conftool/dbconfig/20230503-130149-ladsgroup.json
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P47379 and previous config saved to /var/cache/conftool/dbconfig/20230503-130105-ladsgroup.json
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P47378 and previous config saved to /var/cache/conftool/dbconfig/20230503-125718-ladsgroup.json
  • 12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P47377 and previous config saved to /var/cache/conftool/dbconfig/20230503-124643-ladsgroup.json
  • 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47376 and previous config saved to /var/cache/conftool/dbconfig/20230503-124558-ladsgroup.json
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P47375 and previous config saved to /var/cache/conftool/dbconfig/20230503-124212-ladsgroup.json
  • 12:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47374 and previous config saved to /var/cache/conftool/dbconfig/20230503-123837-ladsgroup.json
  • 12:37 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
  • 12:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47373 and previous config saved to /var/cache/conftool/dbconfig/20230503-123714-ladsgroup.json
  • 12:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T335838)', diff saved to https://phabricator.wikimedia.org/P47372 and previous config saved to /var/cache/conftool/dbconfig/20230503-123649-ladsgroup.json
  • 12:36 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
  • 12:35 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 12:31 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
  • 12:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T335838)', diff saved to https://phabricator.wikimedia.org/P47371 and previous config saved to /var/cache/conftool/dbconfig/20230503-123137-ladsgroup.json
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T335838)', diff saved to https://phabricator.wikimedia.org/P47370 and previous config saved to /var/cache/conftool/dbconfig/20230503-122705-ladsgroup.json
  • 12:25 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 12:24 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 12:24 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P47369 and previous config saved to /var/cache/conftool/dbconfig/20230503-122143-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T335838)', diff saved to https://phabricator.wikimedia.org/P47368 and previous config saved to /var/cache/conftool/dbconfig/20230503-122113-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T335838)', diff saved to https://phabricator.wikimedia.org/P47367 and previous config saved to /var/cache/conftool/dbconfig/20230503-122049-ladsgroup.json
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T335838)', diff saved to https://phabricator.wikimedia.org/P47366 and previous config saved to /var/cache/conftool/dbconfig/20230503-122040-ladsgroup.json
  • 12:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T335838)', diff saved to https://phabricator.wikimedia.org/P47365 and previous config saved to /var/cache/conftool/dbconfig/20230503-122000-ladsgroup.json
  • 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2002.wikimedia.org
  • 12:15 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet
  • 12:11 Amir1: Removing db1111 from zarcillo T335836
  • 12:09 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab2002.wikimedia.org
  • 12:09 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P47364 and previous config saved to /var/cache/conftool/dbconfig/20230503-120637-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1111.eqiad.wmnet
  • 12:06 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1111.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P47363 and previous config saved to /var/cache/conftool/dbconfig/20230503-120536-ladsgroup.json
  • 12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P47362 and previous config saved to /var/cache/conftool/dbconfig/20230503-120453-ladsgroup.json
  • 12:02 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1111.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 11:56 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 11:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1111.eqiad.wmnet
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T335838)', diff saved to https://phabricator.wikimedia.org/P47361 and previous config saved to /var/cache/conftool/dbconfig/20230503-115130-ladsgroup.json
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1110 from dbctl T335011', diff saved to https://phabricator.wikimedia.org/P47360 and previous config saved to /var/cache/conftool/dbconfig/20230503-115124-marostegui.json
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P47359 and previous config saved to /var/cache/conftool/dbconfig/20230503-115030-ladsgroup.json
  • 11:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P47358 and previous config saved to /var/cache/conftool/dbconfig/20230503-114947-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T335838)', diff saved to https://phabricator.wikimedia.org/P47357 and previous config saved to /var/cache/conftool/dbconfig/20230503-114426-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 11:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T335838)', diff saved to https://phabricator.wikimedia.org/P47356 and previous config saved to /var/cache/conftool/dbconfig/20230503-114335-ladsgroup.json
  • 11:40 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 11:38 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2015.codfw.wmnet
  • 11:38 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2015.codfw.wmnet
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T335838)', diff saved to https://phabricator.wikimedia.org/P47355 and previous config saved to /var/cache/conftool/dbconfig/20230503-113524-ladsgroup.json
  • 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T335838)', diff saved to https://phabricator.wikimedia.org/P47354 and previous config saved to /var/cache/conftool/dbconfig/20230503-113441-ladsgroup.json
  • 11:31 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 11:28 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P47353 and previous config saved to /var/cache/conftool/dbconfig/20230503-112828-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T335838)', diff saved to https://phabricator.wikimedia.org/P47352 and previous config saved to /var/cache/conftool/dbconfig/20230503-112819-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T335838)', diff saved to https://phabricator.wikimedia.org/P47351 and previous config saved to /var/cache/conftool/dbconfig/20230503-112819-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Remove db1111 from dbctl T335836', diff saved to https://phabricator.wikimedia.org/P47350 and previous config saved to /var/cache/conftool/dbconfig/20230503-112812-ladsgroup.json
  • 11:27 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 11:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 11:23 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47349 and previous config saved to /var/cache/conftool/dbconfig/20230503-112037-root.json
  • 11:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T335838)', diff saved to https://phabricator.wikimedia.org/P47348 and previous config saved to /var/cache/conftool/dbconfig/20230503-111910-ladsgroup.json
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Decom db1111 T335836', diff saved to https://phabricator.wikimedia.org/P47347 and previous config saved to /var/cache/conftool/dbconfig/20230503-111904-ladsgroup.json
  • 11:18 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy1017.eqiad.wmnet with reason: Upgrade
  • 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy1017.eqiad.wmnet with reason: Upgrade
  • 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy1016.eqiad.wmnet with reason: Upgrade
  • 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy1016.eqiad.wmnet with reason: Upgrade
  • 11:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy1015.eqiad.wmnet with reason: Upgrade
  • 11:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy1015.eqiad.wmnet with reason: Upgrade
  • 11:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy1014.eqiad.wmnet with reason: Upgrade
  • 11:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy1014.eqiad.wmnet with reason: Upgrade
  • 11:11 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P47346 and previous config saved to /var/cache/conftool/dbconfig/20230503-111145-ladsgroup.json
  • 11:08 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 11:06 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster1001.eqiad.wmnet
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47345 and previous config saved to /var/cache/conftool/dbconfig/20230503-110532-root.json
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P47344 and previous config saved to /var/cache/conftool/dbconfig/20230503-110357-ladsgroup.json
  • 11:02 marostegui: Reboot dbproxy200[1-4]
  • 11:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy[2001-2004].codfw.wmnet with reason: Reboot T335845
  • 11:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy[2001-2004].codfw.wmnet with reason: Reboot T335845
  • 10:57 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster1001.eqiad.wmnet
  • 10:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T335838)', diff saved to https://phabricator.wikimedia.org/P47343 and previous config saved to /var/cache/conftool/dbconfig/20230503-105639-ladsgroup.json
  • 10:52 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
  • 10:51 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
  • 10:51 claime: Migrating recommendation-api eqiad to mw-api-int-async - T334062
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 50%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47342 and previous config saved to /var/cache/conftool/dbconfig/20230503-105028-root.json
  • 10:50 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply
  • 10:50 claime: Migrating recommendation-api codfw to mw-api-int-async - T334062
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T335838)', diff saved to https://phabricator.wikimedia.org/P47341 and previous config saved to /var/cache/conftool/dbconfig/20230503-105004-ladsgroup.json
  • 10:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 10:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T335838)', diff saved to https://phabricator.wikimedia.org/P47340 and previous config saved to /var/cache/conftool/dbconfig/20230503-104939-ladsgroup.json
  • 10:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/recommendation-api: apply
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P47339 and previous config saved to /var/cache/conftool/dbconfig/20230503-104851-ladsgroup.json
  • 10:47 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
  • 10:45 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kubestagemaster2001.codfw.wmnet
  • 10:45 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
  • 10:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1005.eqiad.wmnet
  • 10:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2004.codfw.wmnet
  • 10:41 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 10:40 claime: Migrating recommendation-api staging to mw-api-int-async - T334062
  • 10:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2004.codfw.wmnet
  • 10:38 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1004.eqiad.wmnet
  • 10:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1005.eqiad.wmnet
  • 10:35 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for wblistentityusage: Deprecate wbeu prefix, new output format (T300460 T196962) (duration: 34m 53s)
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 25%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47338 and previous config saved to /var/cache/conftool/dbconfig/20230503-103523-root.json
  • 10:35 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P47337 and previous config saved to /var/cache/conftool/dbconfig/20230503-103433-ladsgroup.json
  • 10:34 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
  • 10:33 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kubestagemaster2001.codfw.wmnet
  • 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T335838)', diff saved to https://phabricator.wikimedia.org/P47336 and previous config saved to /var/cache/conftool/dbconfig/20230503-103345-ladsgroup.json
  • 10:33 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
  • 10:32 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1004.eqiad.wmnet
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T335838)', diff saved to https://phabricator.wikimedia.org/P47335 and previous config saved to /var/cache/conftool/dbconfig/20230503-102719-ladsgroup.json
  • 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T335838)', diff saved to https://phabricator.wikimedia.org/P47334 and previous config saved to /var/cache/conftool/dbconfig/20230503-102654-ladsgroup.json
  • 10:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2003.wikimedia.org
  • 10:21 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 10%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47333 and previous config saved to /var/cache/conftool/dbconfig/20230503-102018-root.json
  • 10:19 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab2003.wikimedia.org
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P47332 and previous config saved to /var/cache/conftool/dbconfig/20230503-101926-ladsgroup.json
  • 10:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 10:18 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 10:18 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and migr: Backport for wblistentityusage: Deprecate wbeu prefix, new output format (T300460 T196962) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 10:18 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host gitlab2003.wikimedia.org
  • 10:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp2001.codfw.wmnet
  • 10:16 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on aphlict1001.eqiad.wmnet with reason: aphlict1002 is now active
  • 10:16 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on aphlict1001.eqiad.wmnet with reason: aphlict1002 is now active
  • 10:13 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P47331 and previous config saved to /var/cache/conftool/dbconfig/20230503-101147-ladsgroup.json
  • 10:10 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host arclamp2001.codfw.wmnet
  • 10:10 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
  • 10:09 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
  • 10:09 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp1001.eqiad.wmnet
  • 10:07 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2004.codfw.wmnet
  • 10:07 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
  • 10:07 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 5%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47330 and previous config saved to /var/cache/conftool/dbconfig/20230503-100513-root.json
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T335838)', diff saved to https://phabricator.wikimedia.org/P47329 and previous config saved to /var/cache/conftool/dbconfig/20230503-100420-ladsgroup.json
  • 10:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
  • 10:02 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host arclamp1001.eqiad.wmnet
  • 10:00 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2004.codfw.wmnet
  • 10:00 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for wblistentityusage: Deprecate wbeu prefix, new output format (T300460 T196962)
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T335838)', diff saved to https://phabricator.wikimedia.org/P47328 and previous config saved to /var/cache/conftool/dbconfig/20230503-095901-ladsgroup.json
  • 09:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 09:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P47327 and previous config saved to /var/cache/conftool/dbconfig/20230503-095641-ladsgroup.json
  • 09:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 09:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: Cloning db1110 from db1217:3323 T335092
  • 09:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: Cloning db1110 from db1217:3323 T335092
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 3%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47325 and previous config saved to /var/cache/conftool/dbconfig/20230503-095008-root.json
  • 09:49 urbanecm@deploy1002: Finished scap: Backport for Personalized praise: Run convertNumber() before displaying numbers (T322443), Personalized praise: Run convertNumber() before displaying numbers (T322443) (duration: 06m 53s)
  • 09:47 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
  • 09:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
  • 09:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 4:00:00 on db1110.eqiad.wmnet with reason: Moving to m3 T335092
  • 09:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 4:00:00 on db1110.eqiad.wmnet with reason: Moving to m3 T335092
  • 09:42 urbanecm@deploy1002: Started scap: Backport for Personalized praise: Run convertNumber() before displaying numbers (T322443), Personalized praise: Run convertNumber() before displaying numbers (T322443)
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T335838)', diff saved to https://phabricator.wikimedia.org/P47324 and previous config saved to /var/cache/conftool/dbconfig/20230503-094135-ladsgroup.json
  • 09:36 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T335838)', diff saved to https://phabricator.wikimedia.org/P47323 and previous config saved to /var/cache/conftool/dbconfig/20230503-093606-ladsgroup.json
  • 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 1%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47322 and previous config saved to /var/cache/conftool/dbconfig/20230503-093503-root.json
  • 09:29 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.2 - volans@cumin1001
  • 09:29 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab2003.wikimedia.org
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 100%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47321 and previous config saved to /var/cache/conftool/dbconfig/20230503-092856-root.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 100%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47320 and previous config saved to /var/cache/conftool/dbconfig/20230503-092847-root.json
  • 09:28 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.2 - volans@cumin1001
  • 09:26 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2124', diff saved to https://phabricator.wikimedia.org/P47319 and previous config saved to /var/cache/conftool/dbconfig/20230503-092513-root.json
  • 09:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2124.codfw.wmnet with reason: Migrating to 10.6 and rebooting
  • 09:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet
  • 09:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2124.codfw.wmnet with reason: Migrating to 10.6 and rebooting
  • 09:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
  • 09:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet
  • 09:20 urbanecm@deploy1002: Finished scap: Backport for [Growth] Add GEMentorDashboardEnabledModules (T334630) (duration: 06m 56s)
  • 09:20 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test1001.eqiad.wmnet
  • 09:17 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
  • 09:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
  • 09:14 urbanecm@deploy1002: Started scap: Backport for [Growth] Add GEMentorDashboardEnabledModules (T334630)
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 75%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47318 and previous config saved to /var/cache/conftool/dbconfig/20230503-091352-root.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 75%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47317 and previous config saved to /var/cache/conftool/dbconfig/20230503-091342-root.json
  • 09:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief2001.codfw.wmnet
  • 09:11 urbanecm@deploy1002: sync-world aborted: Backport for Personalized praise: Let mentors to skip suggestions (T334300) (duration: 00m 06s)
  • 09:11 urbanecm@deploy1002: Started scap: Backport for Personalized praise: Let mentors to skip suggestions (T334300)
  • 09:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief2001.codfw.wmnet
  • 09:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1001.eqiad.wmnet
  • 09:02 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
  • 09:01 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host acmechief1001.eqiad.wmnet
  • 09:01 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
  • 09:00 urbanecm@deploy1002: Finished scap: Backport for Personalized praise: Let mentors to skip suggestions (T334300) (duration: 27m 39s)
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 50%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47316 and previous config saved to /var/cache/conftool/dbconfig/20230503-085847-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 50%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47315 and previous config saved to /var/cache/conftool/dbconfig/20230503-085837-root.json
  • 08:44 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 25%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47314 and previous config saved to /var/cache/conftool/dbconfig/20230503-084342-root.json
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 25%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47313 and previous config saved to /var/cache/conftool/dbconfig/20230503-084332-root.json
  • 08:39 marostegui: dbmaint deploy schema change on eqiad s3 with replication T335834
  • 08:39 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
  • 08:32 urbanecm@deploy1002: Started scap: Backport for Personalized praise: Let mentors to skip suggestions (T334300)
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 10%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47312 and previous config saved to /var/cache/conftool/dbconfig/20230503-082837-root.json
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 10%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47311 and previous config saved to /var/cache/conftool/dbconfig/20230503-082827-root.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 5%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47310 and previous config saved to /var/cache/conftool/dbconfig/20230503-081332-root.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 5%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47309 and previous config saved to /var/cache/conftool/dbconfig/20230503-081323-root.json
  • 08:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 4%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47308 and previous config saved to /var/cache/conftool/dbconfig/20230503-075828-root.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 4%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47307 and previous config saved to /var/cache/conftool/dbconfig/20230503-075818-root.json
  • 07:57 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
  • 07:48 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 3%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47306 and previous config saved to /var/cache/conftool/dbconfig/20230503-074323-root.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 3%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47305 and previous config saved to /var/cache/conftool/dbconfig/20230503-074313-root.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 T335011', diff saved to https://phabricator.wikimedia.org/P47304 and previous config saved to /var/cache/conftool/dbconfig/20230503-073602-root.json
  • 07:28 taavi@deploy1002: Finished scap: Backport for Remove duplicated diff-mode selector in save dialog (T324759) (duration: 10m 14s)
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 2%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47303 and previous config saved to /var/cache/conftool/dbconfig/20230503-072818-root.json
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 2%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47302 and previous config saved to /var/cache/conftool/dbconfig/20230503-072808-root.json
  • 07:26 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 07:20 taavi@deploy1002: taavi and samwilson: Backport for Remove duplicated diff-mode selector in save dialog (T324759) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 07:18 taavi@deploy1002: Started scap: Backport for Remove duplicated diff-mode selector in save dialog (T324759)
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 1%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47299 and previous config saved to /var/cache/conftool/dbconfig/20230503-071313-root.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 1%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47298 and previous config saved to /var/cache/conftool/dbconfig/20230503-071303-root.json
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1213 (s5,s6) to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P47297 and previous config saved to /var/cache/conftool/dbconfig/20230503-071046-marostegui.json
  • 07:09 moritzm: installing glibc bugfix updates from bullseye point release
  • 07:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1117.eqiad.wmnet
  • 07:02 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:02 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1117.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:01 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1117.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 06:56 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:50 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1117.eqiad.wmnet
  • 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 38 hosts with reason: Disconnecting codfw > eqiad T335267
  • 06:46 marostegui: Disconnect codfw -> eqiad replication on s1 T335267
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 38 hosts with reason: Disconnecting codfw > eqiad T335267
  • 06:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 06:28 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 34 hosts with reason: Disconnecting codfw > eqiad T335267
  • 06:14 marostegui: Disconnect codfw -> eqiad replication on s8 T335267
  • 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 34 hosts with reason: Disconnecting codfw > eqiad T335267
  • 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 35 hosts with reason: Disconnecting codfw > eqiad T335267
  • 06:09 marostegui: Disconnect codfw -> eqiad replication on s4 T335267
  • 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 35 hosts with reason: Disconnecting codfw > eqiad T335267
  • 06:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 28 hosts with reason: Disconnecting codfw > eqiad T335267
  • 06:06 marostegui: Disconnect codfw -> eqiad replication on s7 T335267
  • 06:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 28 hosts with reason: Disconnecting codfw > eqiad T335267
  • 06:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 24 hosts with reason: Disconnecting codfw > eqiad T335267
  • 06:01 marostegui: Disconnect codfw -> eqiad replication on s3 T335267
  • 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 24 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 26 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:59 marostegui: Disconnect codfw -> eqiad replication on s5 T335267
  • 05:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 26 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 27 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:57 marostegui: Disconnect codfw -> eqiad replication on s2 T335267
  • 05:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 27 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 27 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:54 marostegui: Disconnect codfw -> eqiad replication on s6 T335267
  • 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 27 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 6 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:51 marostegui: Disconnect codfw -> eqiad replication on es5 T335267
  • 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 6 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 6 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:48 marostegui: Disconnect codfw -> eqiad replication on es4 T335267
  • 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 6 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 10 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:44 marostegui: Disconnect codfw -> eqiad replication on x1 T335267
  • 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 10 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on pc2013.codfw.wmnet,pc1013.eqiad.wmnet with reason: Disconnecting codfw > eqiad T335267
  • 05:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on pc2013.codfw.wmnet,pc1013.eqiad.wmnet with reason: Disconnecting codfw > eqiad T335267
  • 05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Disconnecting codfw > eqiad T335267
  • 05:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Disconnecting codfw > eqiad T335267
  • 05:40 marostegui: Disconnect codfw -> eqiad replication on pc3 T335267
  • 05:40 marostegui: Disconnect codfw -> eqiad replication on pc2 T335267
  • 05:40 marostegui: Disconnect codfw -> eqiad replication on pc1 T335267
  • 05:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: Disconnecting codfw > eqiad T335267
  • 05:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: Disconnecting codfw > eqiad T335267
  • 01:24 eileen: civicrm upgraded from 09d2eefd to c6149ad2
  • 00:58 ejegg: civicrm upgraded from 8426761b to 09d2eefd
  • 00:47 sukhe: restart haproxy on cp2031: T334448
  • 00:42 eileen: civicrm upgraded from 8076995a to 8426761b

2023-05-02

  • 23:28 eileen: config revision changed from 18acbe1a to 69f60bb9
  • 22:56 eileen: config revision changed from 2eef4039 to 18acbe1a
  • 22:53 eileen: civicrm upgraded from a3d84de3 to 8076995a
  • 20:36 urbanecm@deploy1002: Finished scap: Backport for [Growth] Finish Personalized praise variable rename (T334630) (duration: 06m 55s)
  • 20:30 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 20:30 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 20:29 urbanecm@deploy1002: Started scap: Backport for [Growth] Finish Personalized praise variable rename (T334630)
  • 20:25 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@daf8c32]: bump mjolnir to v2.3.0 (duration: 00m 28s)
  • 20:24 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@daf8c32]: bump mjolnir to v2.3.0
  • 20:20 urbanecm@deploy1002: Finished scap: Backport for Switch on creating Babel categories in Russian Wiktionary (T335136) (duration: 15m 47s)
  • 20:06 urbanecm@deploy1002: urbanecm and iniquity: Backport for Switch on creating Babel categories in Russian Wiktionary (T335136) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:04 urbanecm@deploy1002: Started scap: Backport for Switch on creating Babel categories in Russian Wiktionary (T335136)
  • 19:40 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
  • 19:40 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
  • 19:28 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:28 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:18 bking@deploy1002: Finished deploy [wdqs/wdqs@0e051d8]: (no justification provided) (duration: 00m 05s)
  • 19:18 bking@deploy1002: Started deploy [wdqs/wdqs@0e051d8]: (no justification provided)
  • 19:13 bking@deploy1002: Finished deploy [wdqs/wdqs@0e051d8]: (no justification provided) (duration: 00m 16s)
  • 19:12 bking@deploy1002: Started deploy [wdqs/wdqs@0e051d8]: (no justification provided)
  • 19:12 bking@deploy1002: Finished deploy [wdqs/wdqs@0e051d8]: (no justification provided) (duration: 07m 13s)
  • 19:05 bking@deploy1002: Started deploy [wdqs/wdqs@0e051d8]: (no justification provided)
  • 19:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 19:04 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 18:56 bking@deploy1002: Finished deploy [wdqs/wdqs@0e051d8]: (no justification provided) (duration: 00m 03s)
  • 18:56 bking@deploy1002: Started deploy [wdqs/wdqs@0e051d8]: (no justification provided)
  • 18:56 bking@deploy1002: Finished deploy [wdqs/wdqs@0e051d8]: (no justification provided) (duration: 00m 19s)
  • 18:55 bking@deploy1002: Started deploy [wdqs/wdqs@0e051d8]: (no justification provided)
  • 18:50 bking@cumin1001: conftool action : set/pooled=inactive; selector: name=wdqs2022.codfw.wmnet
  • 18:38 ejegg: civicrm upgraded from e7904ea6 to a3d84de3
  • 18:19 milimetric@deploy1002: Finished deploy [analytics/refinery@c42021f] (thin): Regular analytics weekly train THIN [analytics/refinery@c42021f] (duration: 00m 07s)
  • 18:19 milimetric@deploy1002: Started deploy [analytics/refinery@c42021f] (thin): Regular analytics weekly train THIN [analytics/refinery@c42021f]
  • 18:11 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.7 refs T330213
  • 18:03 brennen: train 1.41.0-wmf.7 (T330213): (correction: group0 today)
  • 18:01 brennen: train 1.41.0-wmf.7 (T330213): no current blockers, rolling to group1 with `scap train`
  • 17:45 milimetric@deploy1002: Finished deploy [analytics/refinery@c42021f]: Regular analytics weekly train [analytics/refinery@c42021f] (duration: 06m 26s)
  • 17:39 milimetric@deploy1002: Started deploy [analytics/refinery@c42021f]: Regular analytics weekly train [analytics/refinery@c42021f]
  • 17:36 sukhe: cr*-eqiad: delete backup routes for ns0: delete routing-options static route 208.80.153.231/32: T330670
  • 17:34 sukhe: [correction] cr*-codfw: delete backup routes for ns0: delete routing-options static route 208.80.154.238/32: T330670
  • 17:32 sukhe: cr*-codfw: delete backup routes for ns1: delete routing-options static route 208.80.154.238/32: T330670
  • 17:28 sukhe: ns1 set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.77 208.80.153.111 208.80.153.10 ]: T330670
  • 17:25 sukhe: ns0 set routing-options static route 208.80.154.238/32 next-hop [ 208.80.154.10 208.80.155.108 208.80.154.134 ]: T330670
  • 17:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 16:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 16:38 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 16:24 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 16:16 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@bb96aca]: Add snappy dependency for kafka daemons (duration: 00m 26s)
  • 16:16 sukhe: ns0 backup routes: delete routing-options static route 208.80.154.238/32 next-hop 208.80.153.111, set to 208.80.153.77
  • 16:16 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@bb96aca]: Add snappy dependency for kafka daemons
  • 16:12 sukhe: ns1: delete routing-options static route 208.80.153.231/32 next-hop 208.80.153.111, set to 208.80.153.77
  • 16:11 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 16:10 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 16:08 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 16:06 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 15:39 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:38 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 15:36 claime: Re-running puppet on failed parse servers - T313227
  • 15:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 15:35 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:34 bking@cumin1001: START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 15:33 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 15:16 jiji@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: codfw row C switches upgrade - T334049
  • 15:13 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:12 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 15:09 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:04 claime: enabling puppet on parse2014
  • 15:04 claime: enabling puppet on parse2013
  • 15:02 akosiaris: enable puppet on parse1005
  • 15:00 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:00 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:59 jiji@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: codfw row C switches upgrade - T334049
  • 14:59 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:58 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:56 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:55 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:54 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:53 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 14:52 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:40 moritzm: installing intel-microcode security updates on bullseye servers
  • 14:40 akosiaris: emergency disabling of puppet on parse hosts
  • 14:33 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: sync
  • 14:33 claime: Merging new internal certs for api, jobrunner, appservers, parsoid - T313227
  • 14:29 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: sync
  • 14:27 denisse: sync prometheus3001 -> prometheus3002
  • 14:27 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 14:23 _joe_: also on contint1002, the current ci master
  • 14:22 _joe_: restarted zuul on contint2001
  • 14:07 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 13:51 sukhe: run authdns-update to repool codfw
  • 13:47 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid
  • 13:47 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=appserver
  • 13:47 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=api_appserver
  • 13:45 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 13:37 urbanecm@deploy1002: Finished scap: Backport for Enable Kartographer Nearby on mobile (T333137), Fix clearing wrong container when closing fullscreen map (T335648), Fix clearing wrong container when closing fullscreen map (T335648) (duration: 14m 54s)
  • 13:25 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 13:24 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2005.wikimedia.org
  • 13:24 urbanecm@deploy1002: wmde-fisch and urbanecm: Backport for Enable Kartographer Nearby on mobile (T333137), Fix clearing wrong container when closing fullscreen map (T335648), Fix clearing wrong container when closing fullscreen map (T335648) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 13:22 urbanecm@deploy1002: Started scap: Backport for Enable Kartographer Nearby on mobile (T333137), Fix clearing wrong container when closing fullscreen map (T335648), Fix clearing wrong container when closing fullscreen map (T335648)
  • 13:16 urbanecm@deploy1002: Sync cancelled.
  • 13:16 urbanecm@deploy1002: urbanecm and wmde-fisch: Backport for Enable Kartographer Nearby on mobile (T333137) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:07 urbanecm@deploy1002: Started scap: Backport for Enable Kartographer Nearby on mobile (T333137)
  • 13:05 XioNoX: rebooting asw-c-codfw for software upgrade - T334049
  • 13:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 185 hosts with reason: codfw row C upgrade
  • 13:01 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 185 hosts with reason: codfw row C upgrade
  • 12:54 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on 186 hosts with reason: codfw row C upgrade
  • 12:54 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 186 hosts with reason: codfw row C upgrade
  • 12:31 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check 2 services: maintenance
  • 12:31 eevans@cumin1001: START - Cookbook sre.discovery.service-route check 2 services: maintenance
  • 12:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:29 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:28 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 12:24 moritzm: installing Linux 5.10.178 on bullseye hosts
  • 12:20 sukhe: run authdns-update to depool codfw: T334049
  • 12:17 Amir1: stop slave on eqiad masters of s1, x1, s8 (T334049)
  • 12:10 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica2005.wikimedia.org
  • 12:05 Amir1: stop slave again on db1130 (eqiad master of s5) (T334049)
  • 12:03 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup1011
  • 12:03 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host backup1011
  • 11:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2003.codfw.wmnet
  • 11:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host poolcounter2003.codfw.wmnet
  • 11:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup1011
  • 11:52 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host backup1011
  • 11:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup1010
  • 11:52 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host backup1010
  • 11:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetmaster1006
  • 11:51 Amir1: stop slave on db1130 (eqiad master of s5) (T334049)
  • 11:51 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 11:50 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host puppetmaster1006
  • 11:50 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1003
  • 11:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2004.codfw.wmnet
  • 11:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 41 hosts with reason: Row c switch maint T334049
  • 11:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on 41 hosts with reason: Row c switch maint T334049
  • 11:49 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest1003
  • 11:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host poolcounter2004.codfw.wmnet
  • 11:32 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:30 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 10:52 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in codfw: codfw row C switches upgrade - T334049
  • 10:47 ladsgroup@deploy1002: Finished scap: Backport for Set externallinks migration to read new in testwiki (T335343) (duration: 13m 27s)
  • 10:35 ladsgroup@deploy1002: ladsgroup: Backport for Set externallinks migration to read new in testwiki (T335343) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 10:33 ladsgroup@deploy1002: Started scap: Backport for Set externallinks migration to read new in testwiki (T335343)
  • 10:00 ladsgroup@deploy1002: Finished scap: Backport for Remove 1024px and 1920px from pre-gen thumbsizes (T211661) (duration: 08m 40s)
  • 09:59 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in codfw: codfw row C switches upgrade - T334049
  • 09:53 ladsgroup@deploy1002: ladsgroup: Backport for Remove 1024px and 1920px from pre-gen thumbsizes (T211661) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 09:51 ladsgroup@deploy1002: Started scap: Backport for Remove 1024px and 1920px from pre-gen thumbsizes (T211661)
  • 09:21 eoghan@cumin1001: END (ERROR) - Cookbook sre.gitlab.failover (exit_code=97) Failover of gitlab from gitlab2002.wikimedia.org to gitlab1004.wikimedia.org
  • 09:13 ladsgroup@deploy1002: scap failed: CalledProcessError Command 'sudo -u mwbuilder /usr/local/bin/update-mediawiki-tools-release' returned non-zero exit status 1. (duration: 00m 05s)
  • 09:12 ladsgroup@deploy1002: Started scap: Backport for Remove 1024px and 1920px from pre-gen thumbsizes (T211661)
  • 09:10 eoghan@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab2002.wikimedia.org to gitlab1004.wikimedia.org
  • 08:51 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 08:44 vgutierrez: testing haproxy 2.6.12-1~bpo10+1+wmf1 in cp1077 and cp1085 - T334448
  • 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 08:28 moritzm: updated netboot image for Bullseye 11.7 T335575
  • 08:27 XioNoX: stage Junos 21 on asw-c-codfw - T334049
  • 08:07 godog: upgrade grafana to 9.3.13
  • 07:49 tgr_: UTC morning deploys done
  • 07:48 tgr@deploy1002: Finished scap: Backport for OAuth: Do not require approval for read-only grants on public wikis (T67750) (duration: 07m 39s)
  • 07:42 tgr@deploy1002: tgr: Backport for OAuth: Do not require approval for read-only grants on public wikis (T67750) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 07:40 tgr@deploy1002: Started scap: Backport for OAuth: Do not require approval for read-only grants on public wikis (T67750)
  • 07:38 tgr@deploy1002: Finished scap: Backport for [noop] Disable section image recommendations in production (T329276) (duration: 07m 29s)
  • 07:32 tgr@deploy1002: tgr: Backport for [noop] Disable section image recommendations in production (T329276) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:30 tgr@deploy1002: Started scap: Backport for [noop] Disable section image recommendations in production (T329276)
  • 07:11 taavi@deploy1002: Finished scap: Backport for Enable WikiLove extension on bnwikibooks (T335705) (duration: 07m 59s)
  • 07:05 taavi@deploy1002: taavi and mdsshakil: Backport for Enable WikiLove extension on bnwikibooks (T335705) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:03 taavi@deploy1002: Started scap: Backport for Enable WikiLove extension on bnwikibooks (T335705)
  • 06:57 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Cmjohnson out of all services on: 794 hosts
  • 06:57 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Cmjohnson out of all services on: 794 hosts
  • 06:57 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Cmjohnson out of all services on: 1274 hosts
  • 06:56 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Cmjohnson out of all services on: 1274 hosts
  • 06:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 136106
  • 06:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 136106
  • 06:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 293
  • 06:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 293
  • 06:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 48237
  • 06:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 48237
  • 06:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 132132
  • 06:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 132132
  • 06:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 10089
  • 06:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 10089
  • 05:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 17961
  • 05:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 17961
  • 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.5 (duration: 02m 17s)
  • 03:52 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.7 refs T330213 (duration: 49m 21s)
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.7 refs T330213

2023-05-01

  • 22:24 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 22:23 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 22:23 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 22:23 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 22:23 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 22:22 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 22:22 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 22:22 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 22:22 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 22:21 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 22:19 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 22:18 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 22:18 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 22:17 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 22:17 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 22:16 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 22:16 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 22:16 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 22:15 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 22:14 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 21:57 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 21:56 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 21:56 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 21:56 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 21:56 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 21:56 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 21:55 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 21:55 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 21:55 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 21:55 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 21:47 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 21:47 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 21:47 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 21:47 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 21:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 17 hosts with reason: T334049 maint
  • 21:15 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 17 hosts with reason: T334049 maint
  • 21:08 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic[2045-2048,2059,2065-2066,2071,2081-2083]* for row C switch upgrade - bking@cumin1001 - T334049
  • 21:08 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic[2045-2048,2059,2065-2066,2071,2081-2083]* for row C switch upgrade - bking@cumin1001 - T334049
  • 21:08 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic[2045-2048,2059,2065-2066,2071,2081-2083] for row C switch upgrade - bking@cumin1001 - T334049
  • 21:08 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic[2045-2048,2059,2065-2066,2071,2081-2083] for row C switch upgrade - bking@cumin1001 - T334049
  • 21:02 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in eqiad: maintenance
  • 20:57 eevans@cumin1001: START - Cookbook sre.discovery.service-route pool sessionstore in eqiad: maintenance
  • 20:47 urandom: upgrading sessionstore1003 to Cassandra 3.11.14 — T335383
  • 20:45 urandom: upgrading sessionstore1002 to Cassandra 3.11.14 — T335383
  • 20:42 urandom: upgrading sessionstore1001 to Cassandra 3.11.14 — T335383
  • 20:38 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in eqiad: maintenance
  • 20:33 eevans@cumin1001: START - Cookbook sre.discovery.service-route depool sessionstore in eqiad: maintenance
  • 20:13 taavi@deploy1002: Finished scap: Backport for Point SyntaxHighlight at /srv/app/pygmentize (T320848) (duration: 08m 12s)
  • 20:06 taavi@deploy1002: legoktm and taavi: Backport for Point SyntaxHighlight at /srv/app/pygmentize (T320848) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:05 taavi@deploy1002: Started scap: Backport for Point SyntaxHighlight at /srv/app/pygmentize (T320848)
  • 19:42 dancy@deploy1002: Installation of scap version "4.52.0" completed for 593 hosts
  • 19:41 dancy@deploy1002: Installing scap version "4.52.0" for 593 hosts
  • 19:06 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:06 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:58 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:58 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:56 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:56 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:32 sukhe: run authdns-update for CR 913966
  • 16:54 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in codfw: maintenance
  • 16:49 eevans@cumin1001: START - Cookbook sre.discovery.service-route pool sessionstore in codfw: maintenance
  • 16:44 urbanecm@deploy1002: Finished scap: Backport for dewiki: Deploy Growth features to 100% of newcomers (T335385) (duration: 07m 22s)
  • 16:38 urbanecm@deploy1002: urbanecm: Backport for dewiki: Deploy Growth features to 100% of newcomers (T335385) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 16:37 urbanecm@deploy1002: Started scap: Backport for dewiki: Deploy Growth features to 100% of newcomers (T335385)
  • 16:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
  • 16:26 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
  • 16:22 urandom: upgrading sessionstore2003 to Cassandra 3.11.14 — T335383
  • 16:19 urandom: upgrading sessionstore2002 to Cassandra 3.11.14 — T335383
  • 16:03 urandom: upgrading sessionstore2001 to Cassandra 3.11.14 — T335383
  • 15:59 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in codfw: maintenance
  • 15:54 eevans@cumin1001: START - Cookbook sre.discovery.service-route depool sessionstore in codfw: maintenance
  • 14:58 sukhe: restart haproxy on cp1077: T334448
  • 14:09 sukhe: move backup routes for ns0 from dns2001 to dns2002: T334049
  • 14:04 sukhe: move ns1 from dns2001 to dns2002: T334049
  • 13:19 taavi@deploy1002: Finished scap: Backport for [slwiki] Enable VisualEditor on Draft and Project namespaces (T335208), [frwikibooks] Change the logo for Vector legacy and add a wordmark for Vector 2022 (T335642), Close nawiki (T335674) (duration: 07m 59s)
  • 13:12 taavi@deploy1002: superpes and taavi: Backport for [slwiki] Enable VisualEditor on Draft and Project namespaces (T335208), [frwikibooks] Change the logo for Vector legacy and add a wordmark for Vector 2022 (T335642), Close nawiki (T335674) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:11 taavi@deploy1002: Started scap: Backport for [slwiki] Enable VisualEditor on Draft and Project namespaces (T335208), [frwikibooks] Change the logo for Vector legacy and add a wordmark for Vector 2022 (T335642), Close nawiki (T335674)
  • 11:41 zabe@deploy1002: Finished scap: Backport for Start writing to af_actor/afh_actor in group0 wikis (T334295) (duration: 21m 08s)
  • 11:31 zabe@deploy1002: zabe: Backport for Start writing to af_actor/afh_actor in group0 wikis (T334295) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 11:20 zabe@deploy1002: Started scap: Backport for Start writing to af_actor/afh_actor in group0 wikis (T334295)
  • 11:20 zabe@deploy1002: sync-world aborted: Backport for Start writing to af_actor/afh_actor in group0 wikis (T334295) (duration: 02m 33s)
  • 11:17 zabe@deploy1002: Started scap: Backport for Start writing to af_actor/afh_actor in group0 wikis (T334295)
