Server Admin Log/Archive 82

From Wikitech

2024-06-30

  • 23:25 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 23:25 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:17 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:15 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:14 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:14 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 23:14 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:13 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 23:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:11 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 23:11 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:11 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 23:11 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:09 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 23:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:05 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:03 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:56 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:53 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:53 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:51 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:08 _joe_: delete failing pod in eqiad for mw-api-ext, caused the backend errors page

2024-06-29

  • 01:24 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 01:12 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:10 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 00:04 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 00:04 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
  • 00:03 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
  • 00:00 jclark@cumin1002: START - Cookbook sre.dns.netbox

2024-06-28

  • 23:57 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:56 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 23:55 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:55 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
  • 23:54 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
  • 23:50 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 23:42 tzatziki: removing 1 image for legal compliance
  • 23:33 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephosd1039.eqiad.wmnet
  • 23:32 pt1979@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudcephosd1039.eqiad.wmnet
  • 23:32 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:32 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip address for cloudcephosd1039 - pt1979@cumin2002"
  • 23:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip address for cloudcephosd1039 - pt1979@cumin2002"
  • 23:29 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephosd1039.eqiad.wmnet
  • 23:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 23:23 pt1979@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudcephosd1039.eqiad.wmnet
  • 23:22 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephosd1039.eqiad.wmnet
  • 23:21 tzatziki: removing 1 image for legal compliance
  • 23:18 pt1979@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudcephosd1039.eqiad.wmnet
  • 23:18 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephosd1039.eqiad.wmnet
  • 23:16 tzatziki: removing 1 image for legal compliance
  • 22:50 pt1979@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudcephosd1039.eqiad.wmnet
  • 22:14 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1041']
  • 22:13 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1041']
  • 22:10 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1040']
  • 22:09 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1040']
  • 21:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 21:42 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:42 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1040 - jclark@cumin1002"
  • 21:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1041.eqiad.wmnet with OS bullseye
  • 21:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
  • 21:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1040 - jclark@cumin1002"
  • 21:38 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:38 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:36 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:35 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1041.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:34 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:33 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1039.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:30 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1040.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:22 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1041.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1039 - jclark@cumin1002"
  • 21:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1039 - jclark@cumin1002"
  • 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1040.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:17 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:17 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1039 - jclark@cumin1002"
  • 21:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1039.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:16 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1039 - jclark@cumin1002"
  • 21:11 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:05 sukhe: sudo cumin -b11 "A:cp-text" 'run-puppet-agent'
  • 20:29 sukhe: sudo cumin "A:cp-text" 'disable-puppet "CR 1050672"'
  • 20:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1028.eqiad.wmnet with OS bookworm
  • 20:20 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:19 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:18 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1029.eqiad.wmnet with OS bookworm
  • 20:18 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:15 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1028.eqiad.wmnet with reason: host reimage
  • 20:01 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on lists1001.wikimedia.org with reason: decomed
  • 20:01 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on lists1001.wikimedia.org with reason: decomed
  • 20:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1029.eqiad.wmnet with reason: host reimage
  • 19:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1028.eqiad.wmnet with reason: host reimage
  • 19:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1029.eqiad.wmnet with reason: host reimage
  • 19:46 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dbproxy1029.eqiad.wmnet with OS bookworm
  • 19:46 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dbproxy1028.eqiad.wmnet with OS bookworm
  • 19:37 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:37 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
  • 19:36 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1029
  • 19:36 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
  • 19:35 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1029
  • 19:34 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1028
  • 19:33 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1028
  • 19:31 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 19:31 jclark@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 19:31 sukhe: sudo cumin -b10 "A:cp-text" "run-puppet-agent --enable 'dont enable'": T368645
  • 19:30 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 18:22 sukhe: disable puppet on A:cp-text
  • 18:16 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 18:16 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 18:05 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 18:05 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 16:43 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 16:36 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:36 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:33 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:23 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:19 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 16:03 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on mw2300.codfw.wmnet with reason: Reimaging issues
  • 16:03 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on mw2300.codfw.wmnet with reason: Reimaging issues
  • 15:45 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 15:43 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:43 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for wikikube-worker2026 - cmooney@cumin1002"
  • 15:35 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for wikikube-worker2026 - cmooney@cumin1002"
  • 15:32 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:25 hnowlan@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2025.codfw.wmnet|wikikube-worker2027.codfw.wmnet|wikikube-worker2028.codfw.wmnet|wikikube-worker2029.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 15:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:21 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 15:21 andrewbogott: upgraded wikitech-static to 1_42 and php 8.3
  • 15:14 hnowlan: homer 'cr*codfw*' commit 'T351074'
  • 15:14 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2027.codfw.wmnet with OS bullseye
  • 15:12 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:11 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 15:11 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:10 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2028.codfw.wmnet with OS bullseye
  • 15:10 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1027.eqiad.wmnet|wikikube-worker1028.eqiad.wmnet|wikikube-worker1029.eqiad.wmnet|wikikube-worker1030.eqiad.wmnet|wikikube-worker1031.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 15:10 claime: Pooling and uncordoning wikikube-worker1027.eqiad.wmnet,wikikube-worker1028.eqiad.wmnet,wikikube-worker1029.eqiad.wmnet,wikikube-worker1030.eqiad.wmnet,wikikube-worker1031.eqiad.wmnet - T351074
  • 15:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2029.codfw.wmnet with OS bullseye
  • 15:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2025.codfw.wmnet with OS bullseye
  • 15:06 jhathaway: mx-in1001 postfix mx testing complete
  • 15:04 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 15:00 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:00 claime: homer 'cr*eqiad*' commit 'T351074'
  • 14:56 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2027.codfw.wmnet with reason: host reimage
  • 14:54 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 14:52 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2028.codfw.wmnet with reason: host reimage
  • 14:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2029.codfw.wmnet with reason: host reimage
  • 14:47 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2025.codfw.wmnet with reason: host reimage
  • 14:46 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2029.codfw.wmnet with reason: host reimage
  • 14:46 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2028.codfw.wmnet with reason: host reimage
  • 14:45 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2027.codfw.wmnet with reason: host reimage
  • 14:45 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2025.codfw.wmnet with reason: host reimage
  • 14:30 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2029.codfw.wmnet with OS bullseye
  • 14:30 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2028.codfw.wmnet with OS bullseye
  • 14:29 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2027.codfw.wmnet with OS bullseye
  • 14:28 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2025.codfw.wmnet with OS bullseye
  • 14:27 sukhe: sudo cumin "O:durum" "run-puppet-agent"
  • 14:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2330 to wikikube-worker2029
  • 14:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2029
  • 14:26 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2029
  • 14:26 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:26 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2330 to wikikube-worker2029 - hnowlan@cumin1002"
  • 14:25 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 14:25 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2330 to wikikube-worker2029 - hnowlan@cumin1002"
  • 14:24 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:23 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 14:23 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:22 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 14:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host krb1002.eqiad.wmnet with OS bookworm
  • 14:22 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:22 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2330 to wikikube-worker2029
  • 14:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2308 to wikikube-worker2028
  • 14:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2028
  • 14:22 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2028
  • 14:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:21 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2308 to wikikube-worker2028 - hnowlan@cumin1002"
  • 14:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:12 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2308 to wikikube-worker2028 - hnowlan@cumin1002"
  • 14:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host deploy1003.eqiad.wmnet with OS bullseye
  • 14:08 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 14:07 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2308 to wikikube-worker2028
  • 14:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2306 to wikikube-worker2027
  • 14:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2027
  • 14:07 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:07 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2027
  • 14:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:06 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2306 to wikikube-worker2027 - hnowlan@cumin1002"
  • 14:06 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx-in1001.wikimedia.org with reason: email testing
  • 14:05 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on mx-in1001.wikimedia.org with reason: email testing
  • 14:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 14:04 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:03 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2306 to wikikube-worker2027 - hnowlan@cumin1002"
  • 14:01 jhathaway: ingressing email on mx-in1001, initial test 1hr
  • 14:00 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 14:00 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2306 to wikikube-worker2027
  • 13:59 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2298 to wikikube-worker2025
  • 13:59 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2025
  • 13:56 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
  • 13:54 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2025
  • 13:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2298 to wikikube-worker2025 - hnowlan@cumin1002"
  • 13:53 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
  • 13:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb1002.eqiad.wmnet with reason: host reimage
  • 13:49 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2298 to wikikube-worker2025 - hnowlan@cumin1002"
  • 13:46 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw2300 to wikikube-worker2026
  • 13:46 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2300 to wikikube-worker2026
  • 13:46 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 13:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1027.eqiad.wmnet with reason: host reimage
  • 13:45 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2298 to wikikube-worker2025
  • 13:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on krb1002.eqiad.wmnet with reason: host reimage
  • 13:43 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1027.eqiad.wmnet with reason: host reimage
  • 13:42 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 13:42 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 13:42 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 13:42 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bullseye
  • 13:42 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 13:42 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 13:41 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 13:41 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host deploy1003.eqiad.wmnet with OS bookworm
  • 13:41 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1002"
  • 13:38 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 13:38 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 13:38 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 13:37 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host krb1002.eqiad.wmnet with OS bookworm
  • 13:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 13:29 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1002"
  • 13:28 hnowlan: running `decommission` on 5 codfw api appservers
  • 13:27 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1028.mgmt.eqiad.wmnet on all recursors
  • 13:27 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1028.mgmt.eqiad.wmnet on all recursors
  • 13:27 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1027.mgmt.eqiad.wmnet on all recursors
  • 13:27 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1027.mgmt.eqiad.wmnet on all recursors
  • 13:26 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 13:25 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:25 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for wikikube-worker102[7-8] - cmooney@cumin1002"
  • 13:24 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for wikikube-worker102[7-8] - cmooney@cumin1002"
  • 13:21 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 13:18 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1028.mgmt.eqiad.wmnet on all recursors
  • 13:18 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1028.mgmt.eqiad.wmnet on all recursors
  • 13:18 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1027.mgmt.eqiad.wmnet on all recursors
  • 13:18 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1027.mgmt.eqiad.wmnet on all recursors
  • 13:15 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
  • 13:12 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
  • 13:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1029.eqiad.wmnet with OS bullseye
  • 13:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1028.eqiad.wmnet with reason: mgmt ip issue
  • 13:05 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1028.eqiad.wmnet with reason: mgmt ip issue
  • 13:01 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bookworm
  • 12:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367856)', diff saved to https://phabricator.wikimedia.org/P65547 and previous config saved to /var/cache/conftool/dbconfig/20240628-125926-marostegui.json
  • 12:55 hashar@deploy1002: Finished deploy [gerrit/gerrit@0db053e]: Upgrade Gerrit 3.10.0-32-gf77960412e to 3.10.0-71-gf6e9431fff - T367029 T341291 (duration: 00m 09s)
  • 12:55 hashar@deploy1002: Started deploy [gerrit/gerrit@0db053e]: Upgrade Gerrit 3.10.0-32-gf77960412e to 3.10.0-71-gf6e9431fff - T367029 T341291
  • 12:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1029.eqiad.wmnet with reason: host reimage
  • 12:50 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1029.eqiad.wmnet with reason: host reimage
  • 12:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 12:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 12:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 12:44 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 12:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65546 and previous config saved to /var/cache/conftool/dbconfig/20240628-124419-marostegui.json
  • 12:44 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-conf1004
  • 12:44 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-conf1004
  • 12:21 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-conf1004
  • 12:18 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:18 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for an-conf1005,6 - jclark@cumin1002"
  • 12:17 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for an-conf1005,6 - jclark@cumin1002"
  • 12:15 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 12:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367856)', diff saved to https://phabricator.wikimedia.org/P65544 and previous config saved to /var/cache/conftool/dbconfig/20240628-121404-marostegui.json
  • 12:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1030.eqiad.wmnet with OS bullseye
  • 12:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1028.eqiad.wmnet with OS bullseye
  • 12:05 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 12:05 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 11:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1030.eqiad.wmnet with reason: host reimage
  • 11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1031.eqiad.wmnet with OS bullseye
  • 11:51 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1030.eqiad.wmnet with reason: host reimage
  • 11:50 Dreamy_Jazz: Finished run on `medium.dblist`
  • 11:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1028.eqiad.wmnet with reason: host reimage
  • 11:45 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 11:45 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 11:45 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 11:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1028.eqiad.wmnet with reason: host reimage
  • 11:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1029.eqiad.wmnet with OS bullseye
  • 11:44 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 11:44 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1029.eqiad.wmnet with OS bullseye
  • 11:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1030.eqiad.wmnet with OS bullseye
  • 11:38 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1030.eqiad.wmnet with OS bullseye
  • 11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1031.eqiad.wmnet with reason: host reimage
  • 11:31 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1031.eqiad.wmnet with reason: host reimage
  • 11:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 11:30 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 11:29 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided) (duration: 00m 44s)
  • 11:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1029.eqiad.wmnet with OS bullseye
  • 11:29 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1029.eqiad.wmnet with OS bullseye
  • 11:29 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided)
  • 11:26 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1028.eqiad.wmnet with OS bullseye
  • 11:23 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1028.eqiad.wmnet with OS bullseye
  • 11:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1031.eqiad.wmnet with OS bullseye
  • 11:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1031.eqiad.wmnet on all recursors
  • 11:18 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1031.eqiad.wmnet on all recursors
  • 11:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1450 to wikikube-worker1031
  • 11:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1031
  • 11:16 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1031
  • 11:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1450 to wikikube-worker1031 - cgoubert@cumin1002"
  • 11:16 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:15 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1028.eqiad.wmnet with OS bullseye
  • 11:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1028.eqiad.wmnet with OS bullseye
  • 11:14 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1450 to wikikube-worker1031 - cgoubert@cumin1002"
  • 11:13 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided) (duration: 00m 25s)
  • 11:13 Dreamy_Jazz: Running `foreachwikiindblist medium.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200` for T366781. `medium.dblist` does not include `loginwiki` or `metawiki` (which are to be done later).
  • 11:12 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided)
  • 11:11 Dreamy_Jazz: `foreachwikiindblist group1-wikipedia.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200` finished running
  • 11:11 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1030.eqiad.wmnet with OS bullseye
  • 11:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1030.eqiad.wmnet on all recursors
  • 11:11 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1030.eqiad.wmnet on all recursors
  • 11:09 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 11:09 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1450 to wikikube-worker1031
  • 11:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1418 to wikikube-worker1030
  • 11:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1030
  • 11:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 11:07 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 11:06 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 11:04 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1030
  • 11:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1418 to wikikube-worker1030 - cgoubert@cumin1002"
  • 11:02 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1418 to wikikube-worker1030 - cgoubert@cumin1002"
  • 11:01 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 11:00 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 11:00 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1418 to wikikube-worker1030
  • 10:59 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1029.eqiad.wmnet with OS bullseye
  • 10:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1029.eqiad.wmnet on all recursors
  • 10:59 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1029.eqiad.wmnet on all recursors
  • 10:58 Dreamy_Jazz: Running `foreachwikiindblist group1-wikipedia.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200`
  • 10:58 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 10:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1417 to wikikube-worker1029
  • 10:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1029
  • 10:56 Dreamy_Jazz: Stopped running script at `cawiki`
  • 10:56 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1029
  • 10:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1417 to wikikube-worker1029 - cgoubert@cumin1002"
  • 10:54 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1417 to wikikube-worker1029 - cgoubert@cumin1002"
  • 10:51 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:51 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1417 to wikikube-worker1029
  • 10:51 Dreamy_Jazz: Running `foreachwikiindblist group1.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200`
  • 10:51 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1028.eqiad.wmnet with OS bullseye
  • 10:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1028.eqiad.wmnet on all recursors
  • 10:51 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1028.eqiad.wmnet on all recursors
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1413 to wikikube-worker1028
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1028
  • 10:49 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1028
  • 10:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1413 to wikikube-worker1028 - cgoubert@cumin1002"
  • 10:48 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1413 to wikikube-worker1028 - cgoubert@cumin1002"
  • 10:46 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:46 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1413 to wikikube-worker1028
  • 10:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 10:45 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1027.eqiad.wmnet on all recursors
  • 10:45 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1027.eqiad.wmnet on all recursors
  • 10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1412 to wikikube-worker1027
  • 10:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1027
  • 10:44 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:44 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:44 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1027
  • 10:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1412 to wikikube-worker1027 - cgoubert@cumin1002"
  • 10:42 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1412 to wikikube-worker1027 - cgoubert@cumin1002"
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1412 to wikikube-worker1027
  • 10:35 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 10:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:22 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 10:22 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 10:18 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 10:17 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 10:16 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 10:12 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:57 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-serve2007.codfw.wmnet
  • 09:57 klausman@cumin2002: START - Cookbook sre.hosts.remove-downtime for ml-serve2007.codfw.wmnet
  • 09:37 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:37 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:33 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:34 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 08:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T367856)', diff saved to https://phabricator.wikimedia.org/P65543 and previous config saved to /var/cache/conftool/dbconfig/20240628-082946-marostegui.json
  • 08:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 08:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 07:54 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet,service=s4
  • 02:30 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 02:30 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 02:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 02:23 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 02:23 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 02:16 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 02:14 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 02:04 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 02:00 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 02:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 02:00 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 02:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 01:44 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 01:44 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 01:39 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 01:39 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 01:37 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 01:37 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 00:52 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 00:51 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 00:45 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5024.eqsin.wmnet
  • 00:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5024.eqsin.wmnet with OS bullseye
  • 00:36 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 00:36 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 00:33 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 00:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 00:33 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 00:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 00:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 00:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 00:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5024.eqsin.wmnet with reason: host reimage
  • 00:06 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 00:06 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 00:06 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5024.eqsin.wmnet with reason: host reimage

2024-06-27

  • 23:55 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 23:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:33 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS bullseye
  • 23:33 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5024.eqsin.wmnet with OS bullseye
  • 23:32 eileen: civicrm upgraded from 76c6fed8 to f9782670
  • 23:24 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 23:24 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:19 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 23:18 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 23:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 23:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367856)', diff saved to https://phabricator.wikimedia.org/P65542 and previous config saved to /var/cache/conftool/dbconfig/20240627-231703-marostegui.json
  • 23:05 Dreamy_Jazz: Running `foreachwikiindblist group0.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php` for T366781
  • 23:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65541 and previous config saved to /var/cache/conftool/dbconfig/20240627-230156-marostegui.json
  • 22:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65540 and previous config saved to /var/cache/conftool/dbconfig/20240627-224649-marostegui.json
  • 22:44 eileen: civicrm upgraded from 7747a290 to 76c6fed8
  • 22:43 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS bullseye
  • 22:43 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:42 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:41 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:41 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:39 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:37 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:37 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:34 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:34 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:33 eileen: civicrm upgraded from 3af41401 to 7747a290
  • 22:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367856)', diff saved to https://phabricator.wikimedia.org/P65539 and previous config saved to /var/cache/conftool/dbconfig/20240627-223142-marostegui.json
  • 22:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:21 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:19 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5024.eqsin.wmnet
  • 22:13 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:13 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:09 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:05 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet
  • 22:05 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5023.eqsin.wmnet with OS bullseye
  • 21:58 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:58 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:54 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:54 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:53 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:53 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:50 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:50 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:50 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:50 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:32 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:31 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5023.eqsin.wmnet with reason: host reimage
  • 21:25 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5023.eqsin.wmnet with reason: host reimage
  • 21:22 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:22 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 20:53 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-conf1005
  • 20:51 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-conf1005
  • 20:50 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:50 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-conf1005 - vriley@cumin1002"
  • 20:50 jhuneidi@deploy1002: Finished scap: Backport for testwiki: Enable QuickSurveys (T368459) (duration: 14m 33s)
  • 20:49 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-conf1005 - vriley@cumin1002"
  • 20:47 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 20:44 jhuneidi@deploy1002: kharlan, jhuneidi: Continuing with sync
  • 20:40 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:40 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:39 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5023.eqsin.wmnet with OS bullseye
  • 20:38 jhuneidi@deploy1002: kharlan, jhuneidi: Backport for testwiki: Enable QuickSurveys (T368459) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:37 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:37 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:35 jhuneidi@deploy1002: Started scap: Backport for testwiki: Enable QuickSurveys (T368459)
  • 20:34 jhuneidi@deploy1002: Finished scap: Backport for QuickSurveys: Add testing survey configuration (T368459) (duration: 14m 45s)
  • 20:29 jhuneidi@deploy1002: kharlan, jhuneidi: Continuing with sync
  • 20:24 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:24 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:22 jhuneidi@deploy1002: kharlan, jhuneidi: Backport for QuickSurveys: Add testing survey configuration (T368459) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:20 jhuneidi@deploy1002: Started scap: Backport for QuickSurveys: Add testing survey configuration (T368459)
  • 20:17 jhuneidi@deploy1002: Finished scap: Backport for Enable DiscussionTools permalinks on enwiki (T365974) (duration: 11m 09s)
  • 20:16 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5023.eqsin.wmnet
  • 20:11 jhuneidi@deploy1002: jhuneidi, kemayo: Continuing with sync
  • 20:08 jhuneidi@deploy1002: jhuneidi, kemayo: Backport for Enable DiscussionTools permalinks on enwiki (T365974) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:06 jhuneidi@deploy1002: Started scap: Backport for Enable DiscussionTools permalinks on enwiki (T365974)
  • 20:03 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:03 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:55 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5022.eqsin.wmnet
  • 19:53 ottomata: deleted mw-page-content-change-enrich stuck jobmanager pod: kubectl -n mw-page-content-change-enrich delete pod flink-app-main-859d98c57b-zrgwk - T368667
  • 19:51 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1059.eqiad.wmnet with OS bookworm
  • 19:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5022.eqsin.wmnet with OS bullseye
  • 19:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 19:27 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
  • 19:23 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
  • 19:14 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5022.eqsin.wmnet with reason: host reimage
  • 19:10 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 19:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5022.eqsin.wmnet with reason: host reimage
  • 19:07 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1059.eqiad.wmnet with OS bookworm
  • 19:07 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 19:00 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 18:52 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 18:36 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5022.eqsin.wmnet with OS bullseye
  • 18:36 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5022.eqsin.wmnet with OS bullseye
  • 18:19 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 18:19 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 18:19 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5022.eqsin.wmnet with OS bullseye
  • 18:19 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
  • 18:18 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
  • 18:15 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 18:14 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1067.eqiad.wmnet with OS bookworm
  • 18:12 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 18:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 18:12 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.11 refs T366956
  • 18:12 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1065.eqiad.wmnet with OS bookworm
  • 18:11 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 18:11 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 18:10 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5022.eqsin.wmnet
  • 18:08 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1066.eqiad.wmnet with OS bookworm
  • 18:08 ejegg: fundraising civicrm upgraded from 13a13f3a to 43fc2c89
  • 18:04 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1058.eqiad.wmnet with OS bookworm
  • 17:59 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1057.eqiad.wmnet with OS bookworm
  • 17:51 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet
  • 17:50 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
  • 17:47 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
  • 17:45 swfrench@deploy1002: Finished scap: Deploying securityContext changes for T362978 to main release (duration: 04m 09s)
  • 17:43 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
  • 17:41 swfrench@deploy1002: Started scap: Deploying securityContext changes for T362978 to main release
  • 17:39 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
  • 17:37 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:37 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:36 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
  • 17:35 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:35 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:34 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
  • 17:34 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1003.eqiad.wmnet with OS bookworm
  • 17:34 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
  • 17:34 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
  • 17:33 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
  • 17:33 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:33 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
  • 17:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:33 swfrench-wmf: canary deployments are healthy, slow-logs still produced, continuing with main deployments for T362978
  • 17:33 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:32 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:32 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:26 hashar@deploy1002: Finished deploy [gerrit/gerrit@7659481]: Revert "Add image-diff JavaScript plugin (take 2)" (duration: 00m 07s)
  • 17:26 hashar@deploy1002: Started deploy [gerrit/gerrit@7659481]: Revert "Add image-diff JavaScript plugin (take 2)"
  • 17:26 hashar@deploy1002: deploy aborted: Revert Add image-diff JavaScript plugin (take 2) (duration: 00m 00s)
  • 17:26 hashar@deploy1002: Started deploy [gerrit/gerrit@7659481]: Revert Add image-diff JavaScript plugin (take 2)
  • 17:23 swfrench@deploy1002: Finished scap: (no justification provided) (duration: 08m 03s)
  • 17:19 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bookworm
  • 17:19 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1066.eqiad.wmnet with OS bookworm
  • 17:19 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bookworm
  • 17:18 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1058.eqiad.wmnet with OS bookworm
  • 17:17 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1057.eqiad.wmnet with OS bookworm
  • 17:14 swfrench@deploy1002: Started scap: (no justification provided)
  • 17:13 hashar@deploy1002: Finished deploy [gerrit/gerrit@8c6ae73]: Add image-diff JavaScript plugin (take 2) - T341291 (duration: 00m 07s)
  • 17:13 hashar@deploy1002: Started deploy [gerrit/gerrit@8c6ae73]: Add image-diff JavaScript plugin (take 2) - T341291
  • 17:09 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
  • 17:06 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
  • 17:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5021.eqsin.wmnet with OS bullseye
  • 16:55 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 16:54 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 16:50 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1003.eqiad.wmnet with OS bookworm
  • 16:36 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 100%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65537 and previous config saved to /var/cache/conftool/dbconfig/20240627-163635-arnaudb.json
  • 16:35 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1022.eqiad.wmnet|wikikube-worker1023.eqiad.wmnet|wikikube-worker1024.eqiad.wmnet|wikikube-worker1025.eqiad.wmnet|wikikube-worker1026.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 16:35 claime: Pooling and uncordoning wikikube-worker1022.eqiad.wmnet,wikikube-worker1023.eqiad.wmnet,wikikube-worker1024.eqiad.wmnet,wikikube-worker1025.eqiad.wmnet,wikikube-worker1026.eqiad.wmnet - T351074
  • 16:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
  • 16:29 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
  • 16:27 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1002.eqiad.wmnet with OS bookworm
  • 16:21 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 75%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65536 and previous config saved to /var/cache/conftool/dbconfig/20240627-162129-arnaudb.json
  • 16:18 claime: homer 'cr*eqiad*' commit 'T351074'
  • 16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1026.eqiad.wmnet with OS bullseye
  • 16:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1025.eqiad.wmnet with OS bullseye
  • 16:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1023.eqiad.wmnet with OS bullseye
  • 16:06 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 50%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65535 and previous config saved to /var/cache/conftool/dbconfig/20240627-160624-arnaudb.json
  • 16:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1024.eqiad.wmnet with OS bullseye
  • 16:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1022.eqiad.wmnet with OS bullseye
  • 16:01 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
  • 16:00 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: sync
  • 16:00 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: sync
  • 15:58 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
  • 15:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
  • 15:56 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5021.eqsin.wmnet with OS bullseye
  • 15:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy2005
  • 15:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy2005
  • 15:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1026.eqiad.wmnet with reason: host reimage
  • 15:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1025.eqiad.wmnet with reason: host reimage
  • 15:51 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 25%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65534 and previous config saved to /var/cache/conftool/dbconfig/20240627-155118-arnaudb.json
  • 15:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1023.eqiad.wmnet with reason: host reimage
  • 15:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1024.eqiad.wmnet with reason: host reimage
  • 15:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1025.eqiad.wmnet with reason: host reimage
  • 15:43 hnowlan: restarted ferm on 8 failing k8s workers
  • 15:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1022.eqiad.wmnet with reason: host reimage
  • 15:42 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1026.eqiad.wmnet with reason: host reimage
  • 15:42 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1024.eqiad.wmnet with reason: host reimage
  • 15:41 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1002.eqiad.wmnet with OS bookworm
  • 15:41 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1023.eqiad.wmnet with reason: host reimage
  • 15:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1022.eqiad.wmnet with reason: host reimage
  • 15:36 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 10%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65533 and previous config saved to /var/cache/conftool/dbconfig/20240627-153613-arnaudb.json
  • 15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1025.eqiad.wmnet with OS bullseye
  • 15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1025.eqiad.wmnet on all recursors
  • 15:29 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1025.eqiad.wmnet on all recursors
  • 15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1373 to wikikube-worker1025
  • 15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1025
  • 15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1026.eqiad.wmnet with OS bullseye
  • 15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1026.eqiad.wmnet on all recursors
  • 15:28 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1026.eqiad.wmnet on all recursors
  • 15:27 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1025
  • 15:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1373 to wikikube-worker1025 - cgoubert@cumin1002"
  • 15:27 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1024.eqiad.wmnet with OS bullseye
  • 15:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1024.eqiad.wmnet on all recursors
  • 15:27 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1024.eqiad.wmnet on all recursors
  • 15:26 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1023.eqiad.wmnet with OS bullseye
  • 15:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1023.eqiad.wmnet on all recursors
  • 15:26 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1023.eqiad.wmnet on all recursors
  • 15:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1022.eqiad.wmnet with OS bullseye
  • 15:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1022.eqiad.wmnet on all recursors
  • 15:25 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1022.eqiad.wmnet on all recursors
  • 15:25 pmiazga: T367901 mwmaint1002: Ran `mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=rowiki --logwiki=metawiki 'Rui_Filipe_Fernandes' '44_Gabriel'`
  • 15:24 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1373 to wikikube-worker1025 - cgoubert@cumin1002"
  • 15:23 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1007.eqiad.wmnet
  • 15:21 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:21 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1373 to wikikube-worker1025
  • 15:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1366 to wikikube-worker1024
  • 15:21 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 5%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65532 and previous config saved to /var/cache/conftool/dbconfig/20240627-152107-arnaudb.json
  • 15:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1024
  • 15:20 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1024
  • 15:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1366 to wikikube-worker1024 - cgoubert@cumin1002"
  • 15:19 pmiazga: T368451 mwmaint1002: Ran `mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Agustín_Antonio_Cardozo' 'Agustín_Cardozo_Cabrera'`
  • 15:18 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1366 to wikikube-worker1024 - cgoubert@cumin1002"
  • 15:17 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1007.eqiad.wmnet
  • 15:16 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:16 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1366 to wikikube-worker1024
  • 15:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1404 to wikikube-worker1026
  • 15:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1026
  • 15:14 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1026
  • 15:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1404 to wikikube-worker1026 - cgoubert@cumin1002"
  • 15:12 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1404 to wikikube-worker1026 - cgoubert@cumin1002"
  • 15:10 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:10 cgoubert@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 15:10 hashar@deploy1002: Finished deploy [gerrit/gerrit@8c94fee]: Revert "Add image-diff JavaScript plugin" (duration: 00m 07s)
  • 15:09 hashar@deploy1002: Started deploy [gerrit/gerrit@8c94fee]: Revert "Add image-diff JavaScript plugin"
  • 15:09 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1373.eqiad.wmnet
  • 15:09 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1373.eqiad.wmnet
  • 15:08 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:08 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1404 to wikikube-worker1026
  • 15:04 hashar@deploy1002: Finished deploy [gerrit/gerrit@9652bc3]: Add image-diff JavaScript plugin - T341291 (duration: 00m 07s)
  • 15:04 hashar@deploy1002: Started deploy [gerrit/gerrit@9652bc3]: Add image-diff JavaScript plugin - T341291
  • 15:03 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1007.eqiad.wmnet with OS bullseye
  • 15:02 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@483e8c3] (codfw): Bump kartotherian src to latest master (duration: 02m 49s)
  • 15:00 topranks: rebooting lsw1-e7-eqiad to upgrade JunOS on switch T365988
  • 15:00 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1366.eqiad.wmnet
  • 15:00 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1366.eqiad.wmnet
  • 15:00 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@483e8c3] (codfw): Bump kartotherian src to latest master
  • 14:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@483e8c3] (eqiad): Bump kartotherian src to latest master (duration: 03m 10s)
  • 14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on an-worker[1163-1165].eqiad.wmnet,es1037.eqiad.wmnet,ms-be1078.eqiad.wmnet with reason: JunOS upgrade lsw1-e7-eqiad
  • 14:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on an-worker[1163-1165].eqiad.wmnet,es1037.eqiad.wmnet,ms-be1078.eqiad.wmnet with reason: JunOS upgrade lsw1-e7-eqiad
  • 14:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1365 to wikikube-worker1023
  • 14:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1023
  • 14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e7-eqiad,lsw1-e7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e7-eqiad
  • 14:57 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
  • 14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e7-eqiad,lsw1-e7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e7-eqiad
  • 14:56 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@483e8c3] (eqiad): Bump kartotherian src to latest master
  • 14:56 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1023
  • 14:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1365 to wikikube-worker1023 - cgoubert@cumin1002"
  • 14:54 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1365 to wikikube-worker1023 - cgoubert@cumin1002"
  • 14:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host deploy1003.eqiad.wmnet with OS bullseye
  • 14:52 brennen@deploy1002: Finished deploy [phabricator/deployment@0df351e]: deploy phab1004 for minor update (duration: 00m 32s)
  • 14:52 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:52 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1365 to wikikube-worker1023
  • 14:52 brennen@deploy1002: Started deploy [phabricator/deployment@0df351e]: deploy phab1004 for minor update
  • 14:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1359 to wikikube-worker1022
  • 14:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1022
  • 14:51 brennen@deploy1002: Finished deploy [phabricator/deployment@0df351e]: test deploy phab2002 (duration: 00m 34s)
  • 14:50 brennen@deploy1002: Started deploy [phabricator/deployment@0df351e]: test deploy phab2002
  • 14:50 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1022
  • 14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1359 to wikikube-worker1022 - cgoubert@cumin1002"
  • 14:48 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1359 to wikikube-worker1022 - cgoubert@cumin1002"
  • 14:46 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e7-eqiad
  • 14:46 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1007.eqiad.wmnet with reason: host reimage
  • 14:46 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e7-eqiad
  • 14:46 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:46 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1359 to wikikube-worker1022
  • 14:43 dcaro@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1007.eqiad.wmnet with reason: host reimage
  • 14:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1037.eqiad.wmnet with reason: T365988
  • 14:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on es1037.eqiad.wmnet with reason: T365988
  • 14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'T365988 - depool es1037', diff saved to https://phabricator.wikimedia.org/P65531 and previous config saved to /var/cache/conftool/dbconfig/20240627-143741-arnaudb.json
  • 14:15 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet,service=s4
  • 14:12 dcaro@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1007.eqiad.wmnet with OS bullseye
  • 13:54 urbanecm@deploy1002: Finished scap: Backport for CommonSettings: Mark REL1_42 as stable (T359850) (duration: 08m 10s)
  • 13:48 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:46 urbanecm@deploy1002: Started scap: Backport for CommonSettings: Mark REL1_42 as stable (T359850)
  • 13:46 urbanecm@deploy1002: Finished scap: Backport for ptwiki: Enable CommunityConfiguration (T368310) (duration: 08m 58s)
  • 13:42 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1001.eqiad.wmnet with OS bookworm
  • 13:41 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 13:41 urbanecm: Run `mwscript extensions/GrowthExperiments/maintenance/migrateCommunityConfig.php --wiki=ptwiki --force` via mwdebug1001 (T368310)
  • 13:40 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host deploy1003
  • 13:39 urbanecm@deploy1002: urbanecm: Backport for ptwiki: Enable CommunityConfiguration (T368310) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:38 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host deploy1003
  • 13:37 urbanecm@deploy1002: Started scap: Backport for ptwiki: Enable CommunityConfiguration (T368310)
  • 13:36 urbanecm@deploy1002: Finished scap: Backport for Enable local uploads for Gilaki Wikipedia (T364673), [noop] Remove $wgRedirectScript, not used since MediaWiki 1.22, CommunityConfiguration: Log info and higher (duration: 10m 22s)
  • 13:35 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:31 urbanecm@deploy1002: urbanecm, tgr, nmw03: Continuing with sync
  • 13:28 urbanecm@deploy1002: urbanecm, tgr, nmw03: Backport for Enable local uploads for Gilaki Wikipedia (T364673), [noop] Remove $wgRedirectScript, not used since MediaWiki 1.22, CommunityConfiguration: Log info and higher synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:25 urbanecm@deploy1002: Started scap: Backport for Enable local uploads for Gilaki Wikipedia (T364673), [noop] Remove $wgRedirectScript, not used since MediaWiki 1.22, CommunityConfiguration: Log info and higher
  • 13:24 urbanecm@deploy1002: Finished scap: Backport for [CheckUser] Stop writing old for event tables migration on all wikis (T360685), testwiki: use shellbox-video for scaling video (T356241), Add VK namespace alias to Azerbaijani Wikibooks (T368237) (duration: 16m 48s)
  • 13:19 urbanecm@deploy1002: urbanecm, dreamrimmer, hnowlan, dreamyjazz: Continuing with sync
  • 13:12 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 13:10 urbanecm@deploy1002: urbanecm, dreamrimmer, hnowlan, dreamyjazz: Backport for [CheckUser] Stop writing old for event tables migration on all wikis (T360685), testwiki: use shellbox-video for scaling video (T356241), Add VK namespace alias to Azerbaijani Wikibooks (T368237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:08 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 13:08 urbanecm@deploy1002: Started scap: Backport for [CheckUser] Stop writing old for event tables migration on all wikis (T360685), testwiki: use shellbox-video for scaling video (T356241), Add VK namespace alias to Azerbaijani Wikibooks (T368237)
  • 13:02 sukhe: A:dnsbox: remove 10.3.0.2/32 from /e/n/i
  • 12:52 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bookworm
  • 12:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T367856)', diff saved to https://phabricator.wikimedia.org/P65529 and previous config saved to /var/cache/conftool/dbconfig/20240627-125019-marostegui.json
  • 12:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 12:50 sukhe: sudo cumin 'A:dnsbox' 'rm /var/lib/dnsbox/ntp.state': remove obsolete ntp.state file
  • 12:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T367856)', diff saved to https://phabricator.wikimedia.org/P65528 and previous config saved to /var/cache/conftool/dbconfig/20240627-124957-marostegui.json
  • 12:48 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ml-serve2007.codfw.wmnet with reason: Hardware maintenance for memory errors
  • 12:48 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ml-serve2007.codfw.wmnet with reason: Hardware maintenance for memory errors
  • 12:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bullseye
  • 12:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 12:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 12:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T364069)', diff saved to https://phabricator.wikimedia.org/P65527 and previous config saved to /var/cache/conftool/dbconfig/20240627-123805-marostegui.json
  • 12:22 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 12:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P65524 and previous config saved to /var/cache/conftool/dbconfig/20240627-121942-marostegui.json
  • 12:17 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 12:15 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 12:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:12 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 12:12 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 12:12 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 12:12 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 12:11 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 12:10 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 12:10 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 12:10 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 12:08 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 12:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P65523 and previous config saved to /var/cache/conftool/dbconfig/20240627-120751-marostegui.json
  • 12:07 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 12:07 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 12:07 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 12:07 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 12:07 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T367856)', diff saved to https://phabricator.wikimedia.org/P65522 and previous config saved to /var/cache/conftool/dbconfig/20240627-120435-marostegui.json
  • 12:00 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 12:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 11:59 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 11:59 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 11:59 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 11:58 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 11:53 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T364069)', diff saved to https://phabricator.wikimedia.org/P65521 and previous config saved to /var/cache/conftool/dbconfig/20240627-115244-marostegui.json
  • 11:51 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 11:51 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 11:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 11:49 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 11:48 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 11:47 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 11:47 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 11:46 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 11:46 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 11:46 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 11:46 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 11:42 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 11:42 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:41 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 11:41 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 11:41 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 11:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 11:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 11:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:39 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:39 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 11:39 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 11:39 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 11:38 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 11:38 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 11:38 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 11:36 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 11:36 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 11:35 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 11:35 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 11:35 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 11:35 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 11:35 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 11:34 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:34 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:34 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:34 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 11:33 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 11:33 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 11:32 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 11:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 11:30 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 11:29 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 11:29 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 11:28 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 11:28 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 11:28 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:28 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:28 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 11:27 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 11:27 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 11:27 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 11:27 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 11:26 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 11:25 cgoubert@deploy1002: Finished scap: Deploy new prometheus-php-fpm-exporter, prometheus-apache-exporter - T283861 (duration: 06m 17s)
  • 11:24 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test1004.wikimedia.org with OS bookworm
  • 11:19 cgoubert@deploy1002: Started scap: Deploy new prometheus-php-fpm-exporter, prometheus-apache-exporter - T283861
  • 11:17 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:17 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:14 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:14 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:12 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:07 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test1004.wikimedia.org with reason: host reimage
  • 11:03 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test1004.wikimedia.org with reason: host reimage
  • 11:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:00 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:51 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:51 claime: Deploying new prometheus-php-fpm-exporter, prometheus-apache-exporter to mw-on-k8s and shellbox - T283861
  • 10:49 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test1004.wikimedia.org with OS bookworm
  • 10:48 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp-test2004.wikimedia.org
  • 10:48 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test2004.wikimedia.org with OS bookworm
  • 10:43 fabfur: re-enabling puppet on A:cp-text_ulsfo (reverted https://gerrit.wikimedia.org/r/c/operations/puppet/+/1050297) (T365718)
  • 10:38 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 10:30 fabfur: correcting previous statement: puppet disabled just on A:cp-text_ulsfo
  • 10:28 fabfur: disable puppet on A:cp-ulsfo to apply selectively https://gerrit.wikimedia.org/r/c/operations/puppet/+/1050258 (T365718)
  • 10:24 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:24 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:20 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:04 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 10:04 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:42 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:41 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:40 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test2004.wikimedia.org with reason: host reimage
  • 09:38 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test2004.wikimedia.org with reason: host reimage
  • 09:21 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test2004.wikimedia.org with OS bookworm
  • 09:14 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2004.wikimedia.org - slyngshede@cumin1002"
  • 09:13 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2004.wikimedia.org - slyngshede@cumin1002"
  • 09:13 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp-test2004.wikimedia.org on all recursors
  • 09:13 slyngshede@cumin1002: START - Cookbook sre.dns.wipe-cache idp-test2004.wikimedia.org on all recursors
  • 09:13 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:13 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2004.wikimedia.org - slyngshede@cumin1002"
  • 09:12 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2004.wikimedia.org - slyngshede@cumin1002"
  • 09:09 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
  • 09:09 slyngshede@cumin1002: START - Cookbook sre.ganeti.makevm for new host idp-test2004.wikimedia.org
  • 09:04 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti1019.eqiad.wmnet
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1019.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 09:01 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1019.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1019.eqiad.wmnet
  • 08:48 slyngshede@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host idp-test1004.wikimedia.org
  • 08:48 slyngshede@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test1004.wikimedia.org with OS bookworm
  • 08:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T364069)', diff saved to https://phabricator.wikimedia.org/P65518 and previous config saved to /var/cache/conftool/dbconfig/20240627-084043-marostegui.json
  • 08:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 08:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc1001.wikimedia.org
  • 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:36 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:33 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:27 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc1001.wikimedia.org
  • 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc2001.wikimedia.org
  • 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:20 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 08:10 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 100% weight T363812', diff saved to https://phabricator.wikimedia.org/P65517 and previous config saved to /var/cache/conftool/dbconfig/20240627-081044-jynus.json
  • 08:10 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1022 at 100% weight T363812', diff saved to https://phabricator.wikimedia.org/P65516 and previous config saved to /var/cache/conftool/dbconfig/20240627-081016-jynus.json
  • 08:08 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:04 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 07:59 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 50% weight T363812', diff saved to https://phabricator.wikimedia.org/P65515 and previous config saved to /var/cache/conftool/dbconfig/20240627-075944-jynus.json
  • 07:57 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 07:57 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 07:56 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1022 at 50% weight T363812', diff saved to https://phabricator.wikimedia.org/P65514 and previous config saved to /var/cache/conftool/dbconfig/20240627-075620-jynus.json
  • 07:54 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 10% weight T363812', diff saved to https://phabricator.wikimedia.org/P65513 and previous config saved to /var/cache/conftool/dbconfig/20240627-075447-jynus.json
  • 07:50 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 07:45 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1022 at 10% weight T363812', diff saved to https://phabricator.wikimedia.org/P65512 and previous config saved to /var/cache/conftool/dbconfig/20240627-074542-jynus.json
  • 07:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:36 kartik@deploy1002: Finished scap: Backport for Add Metrics Platform stream configuration and registration for MinT for Wikipedia Readers feature by Language and Product Localization team. (T368028) (duration: 08m 42s)
  • 07:36 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test1004.wikimedia.org with OS bookworm
  • 07:35 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test1004.wikimedia.org - slyngshede@cumin1002"
  • 07:34 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test1004.wikimedia.org - slyngshede@cumin1002"
  • 07:34 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp-test1004.wikimedia.org on all recursors
  • 07:34 slyngshede@cumin1002: START - Cookbook sre.dns.wipe-cache idp-test1004.wikimedia.org on all recursors
  • 07:33 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:33 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test1004.wikimedia.org - slyngshede@cumin1002"
  • 07:33 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc2001.wikimedia.org
  • 07:32 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test1004.wikimedia.org - slyngshede@cumin1002"
  • 07:31 kartik@deploy1002: kcvelaga, kartik: Continuing with sync
  • 07:30 kartik@deploy1002: kcvelaga, kartik: Backport for Add Metrics Platform stream configuration and registration for MinT for Wikipedia Readers feature by Language and Product Localization team. (T368028) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:27 kartik@deploy1002: Started scap: Backport for Add Metrics Platform stream configuration and registration for MinT for Wikipedia Readers feature by Language and Product Localization team. (T368028)
  • 07:24 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
  • 07:24 slyngshede@cumin1002: START - Cookbook sre.ganeti.makevm for new host idp-test1004.wikimedia.org
  • 07:18 kartik@deploy1002: Finished scap: Backport for Enable MinT for Wikipedia readers MVP on a set of pilot wikis (T363465) (duration: 14m 19s)
  • 07:13 kartik@deploy1002: kartik: Continuing with sync
  • 07:06 kartik@deploy1002: kartik: Backport for Enable MinT for Wikipedia readers MVP on a set of pilot wikis (T363465) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:04 kartik@deploy1002: Started scap: Backport for Enable MinT for Wikipedia readers MVP on a set of pilot wikis (T363465)
  • 06:45 arnaudb@cumin1002: dbctl commit (dc=all): 'weight es1038 T368401', diff saved to https://phabricator.wikimedia.org/P65510 and previous config saved to /var/cache/conftool/dbconfig/20240627-064506-arnaudb.json
  • 06:40 arnaudb@deploy1002: Finished scap: Backport for Revert "mariadb: disable writes on es6" (duration: 07m 43s)
  • 06:35 arnaudb@deploy1002: arnaudb: Continuing with sync
  • 06:35 arnaudb@deploy1002: arnaudb: Backport for Revert "mariadb: disable writes on es6" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:32 arnaudb@deploy1002: Started scap: Backport for Revert "mariadb: disable writes on es6"
  • 06:23 arnaudb@cumin1002: dbctl commit (dc=all): 'weight es1037 T368401', diff saved to https://phabricator.wikimedia.org/P65509 and previous config saved to /var/cache/conftool/dbconfig/20240627-062338-arnaudb.json
  • 06:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote es1038 to es6 primary T368401', diff saved to https://phabricator.wikimedia.org/P65508 and previous config saved to /var/cache/conftool/dbconfig/20240627-061639-arnaudb.json
  • 06:15 arnaudb: Starting es6 eqiad failover from es1037 to es1038 - T368401
  • 06:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Set es1038 with weight 0 T368401', diff saved to https://phabricator.wikimedia.org/P65507 and previous config saved to /var/cache/conftool/dbconfig/20240627-061055-arnaudb.json
  • 06:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es6 T368401
  • 06:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es6 T368401
  • 06:09 arnaudb@deploy1002: Finished scap: Backport for mariadb: disable writes on es6 (T368401) (duration: 08m 00s)
  • 06:04 arnaudb@deploy1002: arnaudb: Continuing with sync
  • 06:04 arnaudb@deploy1002: arnaudb: Backport for mariadb: disable writes on es6 (T368401) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:01 arnaudb@deploy1002: Started scap: Backport for mariadb: disable writes on es6 (T368401)
  • 03:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 03:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 03:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T364069)', diff saved to https://phabricator.wikimedia.org/P65506 and previous config saved to /var/cache/conftool/dbconfig/20240627-035544-marostegui.json
  • 03:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P65505 and previous config saved to /var/cache/conftool/dbconfig/20240627-034037-marostegui.json
  • 03:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P65504 and previous config saved to /var/cache/conftool/dbconfig/20240627-032530-marostegui.json
  • 03:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T364069)', diff saved to https://phabricator.wikimedia.org/P65503 and previous config saved to /var/cache/conftool/dbconfig/20240627-031023-marostegui.json
  • 00:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T367856)', diff saved to https://phabricator.wikimedia.org/P65502 and previous config saved to /var/cache/conftool/dbconfig/20240627-005613-marostegui.json
  • 00:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 00:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 00:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367856)', diff saved to https://phabricator.wikimedia.org/P65501 and previous config saved to /var/cache/conftool/dbconfig/20240627-005549-marostegui.json
  • 00:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65500 and previous config saved to /var/cache/conftool/dbconfig/20240627-004042-marostegui.json
  • 00:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65499 and previous config saved to /var/cache/conftool/dbconfig/20240627-002535-marostegui.json
  • 00:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367856)', diff saved to https://phabricator.wikimedia.org/P65498 and previous config saved to /var/cache/conftool/dbconfig/20240627-001028-marostegui.json

2024-06-26

  • 23:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5021.eqsin.wmnet with OS bullseye
  • 23:26 mutante: people1004 - stopped confd which logs every 3 seconds that it can't find any templates (T356296)
  • 23:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
  • 23:20 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
  • 23:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T364069)', diff saved to https://phabricator.wikimedia.org/P65497 and previous config saved to /var/cache/conftool/dbconfig/20240626-231020-marostegui.json
  • 23:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 23:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 23:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T364069)', diff saved to https://phabricator.wikimedia.org/P65496 and previous config saved to /var/cache/conftool/dbconfig/20240626-230958-marostegui.json
  • 22:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P65495 and previous config saved to /var/cache/conftool/dbconfig/20240626-225451-marostegui.json
  • 22:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
  • 22:41 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5021.eqsin.wmnet
  • 22:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P65494 and previous config saved to /var/cache/conftool/dbconfig/20240626-223944-marostegui.json
  • 22:26 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5020.eqsin.wmnet
  • 22:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T364069)', diff saved to https://phabricator.wikimedia.org/P65493 and previous config saved to /var/cache/conftool/dbconfig/20240626-222434-marostegui.json
  • 22:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5020.eqsin.wmnet with OS bullseye
  • 21:50 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
  • 21:46 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
  • 21:40 cjming: end of UTC late backport window
  • 21:38 cjming@deploy1002: Finished scap: Backport for Homepage: don't load yesterdays edits on desktop (T368405) (duration: 08m 48s)
  • 21:33 cjming@deploy1002: cjming, migr: Continuing with sync
  • 21:32 cjming@deploy1002: cjming, migr: Backport for Homepage: don't load yesterdays edits on desktop (T368405) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:29 cjming@deploy1002: Started scap: Backport for Homepage: don't load yesterdays edits on desktop (T368405)
  • 21:29 hashar: restarting CI Jenkins
  • 21:13 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
  • 21:13 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
  • 21:05 cjming@deploy1002: Finished scap: Backport for Homepage: log rendering time for each module and each wiki (T368405) (duration: 14m 01s)
  • 20:59 eileen: config revision changed from 0b822cd3 to 994e7b81
  • 20:57 cjming@deploy1002: cjming, migr: Continuing with sync
  • 20:55 cjming@deploy1002: cjming, migr: Backport for Homepage: log rendering time for each module and each wiki (T368405) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:51 cjming@deploy1002: Started scap: Backport for Homepage: log rendering time for each module and each wiki (T368405)
  • 20:50 jdrewniak@deploy1002: Finished scap: Backport for Enable user pages and select special pages in dark mode (1.43.0-wmf.11) (T366364 T366375 T367375 T367581 T367582 T367583) (duration: 08m 09s)
  • 20:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
  • 20:45 jdrewniak@deploy1002: jdlrobson, jdrewniak: Continuing with sync
  • 20:45 jdrewniak@deploy1002: jdlrobson, jdrewniak: Backport for Enable user pages and select special pages in dark mode (1.43.0-wmf.11) (T366364 T366375 T367375 T367581 T367582 T367583) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:42 jdrewniak@deploy1002: Started scap: Backport for Enable user pages and select special pages in dark mode (1.43.0-wmf.11) (T366364 T366375 T367375 T367581 T367582 T367583)
  • 20:40 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 58s)
  • 20:33 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 27s)
  • 20:28 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5020.eqsin.wmnet
  • 20:21 cjming@deploy1002: Finished scap: Backport for Update QuickSurvey coverage rate for Automoderator patroller workstream survey (T362969) (duration: 08m 46s)
  • 20:15 cjming@deploy1002: cjming, kgraessle: Continuing with sync
  • 20:14 cjming@deploy1002: cjming, kgraessle: Backport for Update QuickSurvey coverage rate for Automoderator patroller workstream survey (T362969) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:12 cjming@deploy1002: Started scap: Backport for Update QuickSurvey coverage rate for Automoderator patroller workstream survey (T362969)
  • 20:08 mutante: lists1001:/lib/systemd/system# rm wmf_auto_restart_apache2.* ; systemctl reset-failed - reaction to monitoring alert "FIRING: SystemdUnitFailed: wmf_auto_restart_apache2.service on lists1001:9100"
  • 20:08 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet
  • 20:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5019.eqsin.wmnet with OS bullseye
  • 19:48 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.11 refs T366956
  • 19:40 jhathaway@deploy1002: Finished scap: (no justification provided) (duration: 02m 38s)
  • 19:39 jhathaway@deploy1002: Started scap: (no justification provided)
  • 19:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5019.eqsin.wmnet with reason: host reimage
  • 19:28 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5019.eqsin.wmnet with reason: host reimage
  • 19:18 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:16 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 19:11 ottomata: re-enabling varnishkafka-eventlogging and varnish /beacon/event handling on cache text nodes. /beacon/event/ redirects which breaks the MediaWikiPingback usage - T238230
  • 19:02 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.11 refs T366956
  • 18:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS bullseye
  • 18:55 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5019.eqsin.wmnet with OS bullseye
  • 18:26 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:26 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove ntp.anycast.wmnet - sukhe@cumin1002"
  • 18:25 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove ntp.anycast.wmnet - sukhe@cumin1002"
  • 18:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T364069)', diff saved to https://phabricator.wikimedia.org/P65490 and previous config saved to /var/cache/conftool/dbconfig/20240626-182355-marostegui.json
  • 18:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T364069)', diff saved to https://phabricator.wikimedia.org/P65489 and previous config saved to /var/cache/conftool/dbconfig/20240626-182333-marostegui.json
  • 18:23 sukhe@cumin1002: START - Cookbook sre.dns.netbox
  • 18:19 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS bullseye
  • 18:17 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.11 refs T366956
  • 18:14 sukhe: # etcdctl --username root --endpoints https://conf1007.eqiad.wmnet:4001 rmdir /conftool/v1/pools/${site}/dnsbox/ntp: T366360
  • 18:12 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5019.eqsin.wmnet
  • 18:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P65488 and previous config saved to /var/cache/conftool/dbconfig/20240626-180824-marostegui.json
  • 18:07 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@5121748]: Deploying latest DAGs to analytics Airflow instance. (duration: 00m 39s)
  • 18:06 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@5121748]: Deploying latest DAGs to analytics Airflow instance.
  • 17:59 sukhe: sudo cumin -b10 "A:cp-text" "run-puppet-agent"
  • 17:58 sukhe: sudo cumin -b1 -s30 "A:cp-text" "run-puppet-agent"
  • 17:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P65487 and previous config saved to /var/cache/conftool/dbconfig/20240626-175317-marostegui.json
  • 17:51 ottomata: disabling varnishkafka-eventlogging and varnish /beacon/event handling on cache text nodes. Puppet is disabled on all cache text, will test a few at a time first. - T238230
  • 17:46 sukhe: disable puppet in A:cp-text
  • 17:43 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5018.eqsin.wmnet
  • 17:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5018.eqsin.wmnet with OS bullseye
  • 17:39 sukhe: sudo cumin "A:dnsbox" "run-puppet-agent"
  • 17:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T364069)', diff saved to https://phabricator.wikimedia.org/P65486 and previous config saved to /var/cache/conftool/dbconfig/20240626-173810-marostegui.json
  • 17:37 mnz@deploy1002: Finished deploy [airflow-dags/research@5121748]: (no justification provided) (duration: 00m 11s)
  • 17:37 mnz@deploy1002: Started deploy [airflow-dags/research@5121748]: (no justification provided)
  • 17:29 xcollazo@deploy1002: Finished deploy [analytics/refinery@ca1acb3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ca1acb34] (duration: 02m 54s)
  • 17:26 xcollazo@deploy1002: Started deploy [analytics/refinery@ca1acb3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ca1acb34]
  • 17:26 xcollazo@deploy1002: Finished deploy [analytics/refinery@ca1acb3] (thin): Regular analytics weekly train THIN [analytics/refinery@ca1acb34] (duration: 04m 12s)
  • 17:22 xcollazo@deploy1002: Started deploy [analytics/refinery@ca1acb3] (thin): Regular analytics weekly train THIN [analytics/refinery@ca1acb34]
  • 17:17 mnz@deploy1002: Finished deploy [airflow-dags/research@1996a7a]: (no justification provided) (duration: 00m 03s)
  • 17:17 mnz@deploy1002: Started deploy [airflow-dags/research@1996a7a]: (no justification provided)
  • 17:16 sukhe: re-enable puppet on A:cp-text
  • 17:14 ladsgroup@deploy1002: Finished scap: Backport for Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098), Skip failing ForeignResourceStructureTest (T362425), Skip failing ForeignResourceStructureTest (T362425), Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098) (duration: 08m 52s)
  • 17:09 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 17:08 ladsgroup@deploy1002: ladsgroup: Backport for Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098), Skip failing ForeignResourceStructureTest (T362425), Skip failing ForeignResourceStructureTest (T362425), Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwd
  • 17:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
  • 17:06 ladsgroup@deploy1002: Started scap: Backport for Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098), Skip failing ForeignResourceStructureTest (T362425), Skip failing ForeignResourceStructureTest (T362425), Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098)
  • 17:03 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
  • 17:01 xcollazo@deploy1002: Finished deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34] (duration: 09m 16s)
  • 16:52 xcollazo@deploy1002: Started deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34]
  • 16:52 sukhe: disable puppet on A:cp-text
  • 16:50 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
  • 16:50 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
  • 16:44 mnz@deploy1002: Finished deploy [airflow-dags/research@1996a7a]: (no justification provided) (duration: 00m 03s)
  • 16:44 mnz@deploy1002: Started deploy [airflow-dags/research@1996a7a]: (no justification provided)
  • 16:39 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 16:38 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 16:30 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5018.eqsin.wmnet with OS bullseye
  • 16:27 xcollazo@deploy1002: Finished deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34] (duration: 00m 29s)
  • 16:27 xcollazo@deploy1002: Started deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34]
  • 16:25 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5018.eqsin.wmnet
  • 16:15 mnz@deploy1002: Finished deploy [airflow-dags/research@1996a7a]: (no justification provided) (duration: 00m 33s)
  • 16:14 mnz@deploy1002: Started deploy [airflow-dags/research@1996a7a]: (no justification provided)
  • 15:58 sukhe: sudo cumin -b1 -s120 "A:dnsbox and not P{dns6001*}" "run-puppet-agent --enable 'rolling out CR 1048064'"
  • 15:58 sukhe: sudo cumin -b1 -s120 "A:dnsbox and not P{dns6001*}" "run-puppet-agent --enable 'rolling out CR 1049969'"
  • 15:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
  • 15:38 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt2003-dev.codfw.wmnet
  • 15:38 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:38 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
  • 15:36 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 15:35 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
  • 15:32 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on logstash1024.eqiad.wmnet with reason: Temporary stop to migrate the VM away from the ganeti node
  • 15:32 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on logstash1024.eqiad.wmnet with reason: Temporary stop to migrate the VM away from the ganeti node
  • 15:32 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt2003-dev.codfw.wmnet
  • 15:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 15:27 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 15:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 15:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 15:25 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 15:25 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 15:20 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on logstash1023.eqiad.wmnet with reason: Temporary stop to migrate the VM away from the ganeti node
  • 15:20 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on logstash1023.eqiad.wmnet with reason: Temporary stop to migrate the VM away from the ganeti node
  • 15:16 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt2003-dev.codfw.wmnet
  • 15:16 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:16 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2003-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 15:15 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1006.eqiad.wmnet with OS bullseye
  • 15:14 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2003-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 15:14 sukhe: sudo cumin "A:dnsbox" 'disable-puppet "rolling out CR 1048064"'
  • 15:12 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 15:08 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt2003-dev.codfw.wmnet
  • 15:08 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt2002-dev.codfw.wmnet
  • 15:07 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:07 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2002-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 15:06 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2002-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 15:04 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 15:00 taavi@cumin1002: END (ERROR) - Cookbook sre.puppet.renew-cert (exit_code=97) for cloudcephosd1006.eqiad.wmnet: Renew puppet certificate - taavi@cumin1002
  • 14:59 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt2002-dev.codfw.wmnet
  • 14:58 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt2001-dev.codfw.wmnet
  • 14:58 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:58 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 14:58 taavi@cumin1002: START - Cookbook sre.puppet.renew-cert for cloudcephosd1006.eqiad.wmnet: Renew puppet certificate - taavi@cumin1002
  • 14:57 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 14:54 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 14:48 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt2001-dev.codfw.wmnet
  • 14:40 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbproxy2007 to codfw - jhancock@cumin2002"
  • 14:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbproxy2007 to codfw - jhancock@cumin2002"
  • 14:33 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:29 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Add shellbox-video vars/config, enable on beta (T356241) (duration: 08m 22s)
  • 14:24 logmsgbot: lucaswerkmeister-wmde@deploy1002 hnowlan, lucaswerkmeister-wmde: Continuing with sync
  • 14:24 logmsgbot: lucaswerkmeister-wmde@deploy1002 hnowlan, lucaswerkmeister-wmde: Backport for Add shellbox-video vars/config, enable on beta (T356241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:21 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Add shellbox-video vars/config, enable on beta (T356241)
  • 14:21 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore[1004-1006].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 14:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for wikidatawiki: Add namespace 640 (EntitySchema) to $wgContentNamespaces (T368010) (duration: 07m 57s)
  • 14:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 14:14 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for wikidatawiki: Add namespace 640 (EntitySchema) to $wgContentNamespaces (T368010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:13 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:13 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:11 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for wikidatawiki: Add namespace 640 (EntitySchema) to $wgContentNamespaces (T368010)
  • 14:09 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:09 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
  • 14:08 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:07 jforrester@deploy1002: Finished scap: Backport for CodeEditor.vue: add watcher for disabled state (T368504) (duration: 08m 00s)
  • 14:07 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:06 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:06 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:06 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
  • 14:05 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:05 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:04 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:04 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:04 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:02 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:02 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore[1004-1006].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 14:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:02 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore[2005-2006].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 14:01 jforrester@deploy1002: jforrester: Continuing with sync
  • 14:01 jforrester@deploy1002: jforrester: Backport for CodeEditor.vue: add watcher for disabled state (T368504) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:01 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:01 claime: Deploying statsd-exporter for mw-api-int - T365265
  • 14:01 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database btmwiki (T368066)
  • 14:01 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:59 jforrester@deploy1002: Started scap: Backport for CodeEditor.vue: add watcher for disabled state (T368504)
  • 13:56 Lucas_WMDE: UTC afternoon backport+config window done (I might deploy a few more patches later out-of-window)
  • 13:55 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [u4cwiki] Enable importing from dewiki/enwiki/metawiki (T368522), [arbcom_itwiki] Change the logo and a new wordmark and a favicon (T368532) (duration: 08m 49s)
  • 13:50 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, superpes: Continuing with sync
  • 13:49 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore[2005-2006].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 13:49 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, superpes: Backport for [u4cwiki] Enable importing from dewiki/enwiki/metawiki (T368522), [arbcom_itwiki] Change the logo and a new wordmark and a favicon (T368532) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:46 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [u4cwiki] Enable importing from dewiki/enwiki/metawiki (T368522), [arbcom_itwiki] Change the logo and a new wordmark and a favicon (T368532)
  • 13:42 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [ltwiki] Add a new 'rollbacker' usergroup (T367993) (duration: 08m 48s)
  • 13:37 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, superpes: Continuing with sync
  • 13:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, superpes: Backport for [ltwiki] Add a new 'rollbacker' usergroup (T367993) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database btmwiki (T368066)
  • 13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [ltwiki] Add a new 'rollbacker' usergroup (T367993)
  • 13:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T364069)', diff saved to https://phabricator.wikimedia.org/P65481 and previous config saved to /var/cache/conftool/dbconfig/20240626-133239-marostegui.json
  • 13:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 13:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 13:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65480 and previous config saved to /var/cache/conftool/dbconfig/20240626-133216-marostegui.json
  • 13:31 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database btmwiki (T368066)
  • 13:30 Lucas_WMDE: lucaswerkmeister-wmde@deploy1002 /srv/mediawiki-staging (master $ u=) $ mwscript-k8s namespaceDupes maiwiki -- --fix # T363667, 0 pages/links to fix, i.e. no-op
  • 13:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Meta-Wiki: restrict unfuzzy rights to autoconfirmed (T368416), maiwiki: Remove 'CA' namespace alias (T363667) (duration: 10m 50s)
  • 13:28 elukey@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
  • 13:23 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Continuing with sync
  • 13:23 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bullseye
  • 13:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Backport for Meta-Wiki: restrict unfuzzy rights to autoconfirmed (T368416), maiwiki: Remove 'CA' namespace alias (T363667) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:19 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database btmwiki (T368066)
  • 13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Meta-Wiki: restrict unfuzzy rights to autoconfirmed (T368416), maiwiki: Remove 'CA' namespace alias (T363667)
  • 13:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P65479 and previous config saved to /var/cache/conftool/dbconfig/20240626-131709-marostegui.json
  • 13:15 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [CheckUser] Stop writing old for event tables migration on group1 (T360685) (duration: 12m 09s)
  • 13:13 elukey: reload nginx on registry* nodes (Docker registry) to pick up new logging changes
  • 13:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamyjazz: Continuing with sync
  • 13:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamyjazz: Backport for [CheckUser] Stop writing old for event tables migration on group1 (T360685) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:03 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [CheckUser] Stop writing old for event tables migration on group1 (T360685)
  • 13:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P65478 and previous config saved to /var/cache/conftool/dbconfig/20240626-130201-marostegui.json
  • 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65476 and previous config saved to /var/cache/conftool/dbconfig/20240626-124654-marostegui.json
  • 12:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 12:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T367856)', diff saved to https://phabricator.wikimedia.org/P65471 and previous config saved to /var/cache/conftool/dbconfig/20240626-121158-marostegui.json
  • 12:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367856)', diff saved to https://phabricator.wikimedia.org/P65470 and previous config saved to /var/cache/conftool/dbconfig/20240626-121136-marostegui.json
  • 11:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65469 and previous config saved to /var/cache/conftool/dbconfig/20240626-115628-marostegui.json
  • 11:55 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:55 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:54 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:54 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:54 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:54 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:52 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:52 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:51 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:51 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:44 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:43 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65468 and previous config saved to /var/cache/conftool/dbconfig/20240626-114121-marostegui.json
  • 11:41 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:40 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:39 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:39 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:35 moritzm: installing emacs security updates
  • 11:27 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:26 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:26 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:26 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367856)', diff saved to https://phabricator.wikimedia.org/P65467 and previous config saved to /var/cache/conftool/dbconfig/20240626-112614-marostegui.json
  • 11:24 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bullseye
  • 11:24 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:23 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:19 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2022 fully T363812', diff saved to https://phabricator.wikimedia.org/P65466 and previous config saved to /var/cache/conftool/dbconfig/20240626-111934-jynus.json
  • 11:14 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:13 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:12 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:07 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:07 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:06 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:39 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2022 at 50% T363812', diff saved to https://phabricator.wikimedia.org/P65465 and previous config saved to /var/cache/conftool/dbconfig/20240626-103933-jynus.json
  • 10:25 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2022 after backup T363812', diff saved to https://phabricator.wikimedia.org/P65464 and previous config saved to /var/cache/conftool/dbconfig/20240626-102523-jynus.json
  • 10:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:13 claime: enabling puppet on cp-text - T367949
  • 10:04 claime: enabling puppet on cp4037 - T367949
  • 10:02 claime: disabling puppet on cp-text - T367949
  • 09:59 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:55 slyngs: Update idp.wikimedia.org to CAS 6.6.15.2 (T368503)
  • 09:50 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 09:48 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 09:46 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 09:44 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:38 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 09:01 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test1002.wikimedia.org with OS bookworm
  • 08:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136 T365805', diff saved to https://phabricator.wikimedia.org/P65463 and previous config saved to /var/cache/conftool/dbconfig/20240626-085511-root.json
  • 08:44 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for puppetmaster1003.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
  • 08:42 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for puppetmaster1003.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
  • 08:40 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test1002.wikimedia.org with reason: host reimage
  • 08:39 hashar@deploy1002: Finished deploy [gerrit/gerrit@2fc2b03]: Gerrit to 3.10 on gerrit1003 # T367419 (duration: 00m 43s)
  • 08:39 hashar@deploy1002: Started deploy [gerrit/gerrit@2fc2b03]: Gerrit to 3.10 on gerrit1003 # T367419
  • 08:38 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test1002.wikimedia.org with reason: host reimage
  • 08:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65462 and previous config saved to /var/cache/conftool/dbconfig/20240626-083733-marostegui.json
  • 08:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 08:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 08:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T364069)', diff saved to https://phabricator.wikimedia.org/P65461 and previous config saved to /var/cache/conftool/dbconfig/20240626-083711-marostegui.json
  • 08:32 hashar@deploy1002: Finished deploy [gerrit/gerrit@2fc2b03]: Gerrit to 3.10 on gerrit2002 # T367419 (duration: 00m 48s)
  • 08:31 hashar@deploy1002: Started deploy [gerrit/gerrit@2fc2b03]: Gerrit to 3.10 on gerrit2002 # T367419
  • 08:25 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test1002.wikimedia.org with OS bookworm
  • 08:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P65460 and previous config saved to /var/cache/conftool/dbconfig/20240626-082204-marostegui.json
  • 08:11 jynus@cumin1002: dbctl commit (dc=all): 'Depool es1025 for backups T363812', diff saved to https://phabricator.wikimedia.org/P65458 and previous config saved to /var/cache/conftool/dbconfig/20240626-081130-jynus.json
  • 08:10 marostegui@cumin1002: dbctl commit (dc=all): 'Set es1023 as es5 master - this is a NOOP', diff saved to https://phabricator.wikimedia.org/P65457 and previous config saved to /var/cache/conftool/dbconfig/20240626-081014-marostegui.json
  • 08:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P65456 and previous config saved to /var/cache/conftool/dbconfig/20240626-080657-marostegui.json
  • 08:06 marostegui@cumin1002: dbctl commit (dc=all): 'Fix weights for es2021 and es2024', diff saved to https://phabricator.wikimedia.org/P65455 and previous config saved to /var/cache/conftool/dbconfig/20240626-080649-marostegui.json
  • 07:59 jynus@cumin1002: dbctl commit (dc=all): 'Depool es1022 for backups T363812', diff saved to https://phabricator.wikimedia.org/P65454 and previous config saved to /var/cache/conftool/dbconfig/20240626-075946-jynus.json
  • 07:54 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2025 at 100% load', diff saved to https://phabricator.wikimedia.org/P65453 and previous config saved to /var/cache/conftool/dbconfig/20240626-075428-jynus.json
  • 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T364069)', diff saved to https://phabricator.wikimedia.org/P65451 and previous config saved to /var/cache/conftool/dbconfig/20240626-075043-marostegui.json
  • 07:44 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 07:33 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2025 at 50% load', diff saved to https://phabricator.wikimedia.org/P65449 and previous config saved to /var/cache/conftool/dbconfig/20240626-073304-jynus.json
  • 07:28 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2025 with low load for warmup', diff saved to https://phabricator.wikimedia.org/P65448 and previous config saved to /var/cache/conftool/dbconfig/20240626-072810-jynus.json
  • 07:03 moritzm: installing emacs security updates
  • 06:56 marostegui@cumin1002: dbctl commit (dc=all): 'Pool db2136 - running 10.11 with minimum weight T365805', diff saved to https://phabricator.wikimedia.org/P65447 and previous config saved to /var/cache/conftool/dbconfig/20240626-065636-marostegui.json
  • 06:52 marostegui: Enable slow query log on db2136 running 10.11 T365805
  • 06:39 marostegui: Install mariadb 10.11 on s4 db2136 (depooled for now) T365805
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136 T365805', diff saved to https://phabricator.wikimedia.org/P65446 and previous config saved to /var/cache/conftool/dbconfig/20240626-063109-root.json
  • 06:01 marostegui: dbmaint eqiad Drop ipblocks in s1 T367632
  • 05:59 marostegui: dbmaint eqiad Drop ipblocks in s3 T367632
  • 05:57 marostegui: dbmaint eqiad Drop ipblocks in s4 T367632
  • 05:39 ryankemper: [Elastic] `curl -s -X POST https://search.svc.eqiad.wmnet:9243/_cluster/reroute?retry_failed=true` did the trick. Shard initializing, cluster should be back to green soon enough
  • 05:36 ryankemper: [Elastic] One unassigned shard; cluster status yellow. Not a big deal, looks like `shard has exceeded the maximum number of retries [5] on failed allocation attempts`, I'll try a manual `/_cluster/reroute?retry_failed=true`
  • 05:01 marostegui: dbmaint eqiad Drop ipblocks in s5 T367632
  • 04:53 marostegui: dbmaint eqiad Drop ipblocks in s2 T367632
  • 04:51 marostegui: dbmaint eqiad Drop ipblocks in s8 T367632
  • 03:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1193 (T364069)', diff saved to https://phabricator.wikimedia.org/P65445 and previous config saved to /var/cache/conftool/dbconfig/20240626-033955-marostegui.json
  • 03:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 03:39 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 03:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T364069)', diff saved to https://phabricator.wikimedia.org/P65444 and previous config saved to /var/cache/conftool/dbconfig/20240626-033933-marostegui.json
  • 03:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P65443 and previous config saved to /var/cache/conftool/dbconfig/20240626-032426-marostegui.json
  • 03:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P65442 and previous config saved to /var/cache/conftool/dbconfig/20240626-030919-marostegui.json
  • 02:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T364069)', diff saved to https://phabricator.wikimedia.org/P65441 and previous config saved to /var/cache/conftool/dbconfig/20240626-025412-marostegui.json
  • 00:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T367856)', diff saved to https://phabricator.wikimedia.org/P65440 and previous config saved to /var/cache/conftool/dbconfig/20240626-002103-marostegui.json
  • 00:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 00:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 00:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367856)', diff saved to https://phabricator.wikimedia.org/P65439 and previous config saved to /var/cache/conftool/dbconfig/20240626-002041-marostegui.json
  • 00:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65438 and previous config saved to /var/cache/conftool/dbconfig/20240626-000534-marostegui.json
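
Note on the 05:36 and 05:39 [Elastic] entries above: the fix was a POST to `/_cluster/reroute?retry_failed=true`, which asks Elasticsearch to retry allocation of shards that had used up their allocation attempts. A minimal Python sketch of the same request follows. The endpoint is copied from the log entry; the function name and the use of `urllib` are illustrative assumptions, not the tooling actually used, and the request is only built here, not sent.

```python
import urllib.request

# Endpoint taken from the 05:39 log entry; everything else is illustrative.
CLUSTER_URL = "https://search.svc.eqiad.wmnet:9243"


def build_retry_failed_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /_cluster/reroute?retry_failed=true."""
    url = base_url.rstrip("/") + "/_cluster/reroute?retry_failed=true"
    return urllib.request.Request(url, method="POST")


if __name__ == "__main__":
    req = build_retry_failed_request(CLUSTER_URL)
    print(req.get_method(), req.full_url)
    # Sending it requires network access to the cluster:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode())
```

Building the request separately from sending it makes it possible to check the method and URL before touching a production cluster.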

2024-06-25

  • 23:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65437 and previous config saved to /var/cache/conftool/dbconfig/20240625-235027-marostegui.json
  • 23:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367856)', diff saved to https://phabricator.wikimedia.org/P65436 and previous config saved to /var/cache/conftool/dbconfig/20240625-233520-marostegui.json
  • 23:27 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
  • 23:05 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
  • 23:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
  • 22:44 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
  • 22:43 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
  • 22:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T364069)', diff saved to https://phabricator.wikimedia.org/P65435 and previous config saved to /var/cache/conftool/dbconfig/20240625-224249-marostegui.json
  • 22:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 22:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 22:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T364069)', diff saved to https://phabricator.wikimedia.org/P65434 and previous config saved to /var/cache/conftool/dbconfig/20240625-224226-marostegui.json
  • 22:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
  • 22:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P65433 and previous config saved to /var/cache/conftool/dbconfig/20240625-222719-marostegui.json
  • 22:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P65432 and previous config saved to /var/cache/conftool/dbconfig/20240625-221212-marostegui.json
  • 22:10 bvibber: a webVideoTranscode job reported 'No space left on device' from a failed ffmpeg run on mw1446 recently
  • 22:09 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
  • 22:05 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
  • 21:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T364069)', diff saved to https://phabricator.wikimedia.org/P65431 and previous config saved to /var/cache/conftool/dbconfig/20240625-215705-marostegui.json
  • 21:47 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
  • 20:44 cjming: end of UTC late backport window
  • 20:41 cjming@deploy1002: Finished scap: Backport for Cleanup: Remove wgNavigationTimingSurveyName (T367128) (duration: 08m 29s)
  • 20:36 cjming@deploy1002: jdlrobson, cjming: Continuing with sync
  • 20:35 cjming@deploy1002: jdlrobson, cjming: Backport for Cleanup: Remove wgNavigationTimingSurveyName (T367128) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:32 cjming@deploy1002: Started scap: Backport for Cleanup: Remove wgNavigationTimingSurveyName (T367128)
  • 20:31 cjming@deploy1002: Finished scap: Backport for Enable dark mode on more pages (T366378 T367374 T366373 T366520 T366373) (duration: 15m 04s)
  • 20:26 cjming@deploy1002: jdlrobson, cjming: Continuing with sync
  • 20:19 cjming@deploy1002: jdlrobson, cjming: Backport for Enable dark mode on more pages (T366378 T367374 T366373 T366520 T366373) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:16 cjming@deploy1002: Started scap: Backport for Enable dark mode on more pages (T366378 T367374 T366373 T366520 T366373)
  • 20:14 cjming@deploy1002: Finished scap: Backport for Temporarily disable '4K' 2160p and mid 1440p transcodes (T368433) (duration: 08m 36s)
  • 20:11 Emperor: restart swift-proxy on ms-fe2010 ms-fe1011 T360913
  • 20:09 cjming@deploy1002: cjming, bvibber: Continuing with sync
  • 20:08 cjming@deploy1002: cjming, bvibber: Backport for Temporarily disable '4K' 2160p and mid 1440p transcodes (T368433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:05 cjming@deploy1002: Started scap: Backport for Temporarily disable '4K' 2160p and mid 1440p transcodes (T368433)
  • 20:03 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5017.eqsin.wmnet with OS bullseye
  • 20:01 hashar@deploy1002: Finished deploy [integration/docroot@1eb5f4c]: remove CollaborationKit T368092 (duration: 00m 07s)
  • 20:01 hashar@deploy1002: Started deploy [integration/docroot@1eb5f4c]: remove CollaborationKit T368092
  • 19:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T367856)', diff saved to https://phabricator.wikimedia.org/P65430 and previous config saved to /var/cache/conftool/dbconfig/20240625-192947-marostegui.json
  • 19:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 19:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 19:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 19:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 19:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367856)', diff saved to https://phabricator.wikimedia.org/P65429 and previous config saved to /var/cache/conftool/dbconfig/20240625-192910-marostegui.json
  • 19:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
  • 19:25 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
  • 19:23 sukhe: re-enable puppet on lvs2011
  • 19:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65428 and previous config saved to /var/cache/conftool/dbconfig/20240625-191403-marostegui.json
  • 18:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65426 and previous config saved to /var/cache/conftool/dbconfig/20240625-185856-marostegui.json
  • 18:49 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
  • 18:49 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5017.eqsin.wmnet with OS bullseye
  • 18:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367856)', diff saved to https://phabricator.wikimedia.org/P65425 and previous config saved to /var/cache/conftool/dbconfig/20240625-184349-marostegui.json
  • 18:31 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
  • 18:28 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
  • 18:22 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
  • 18:14 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.11 refs T366956
  • 18:06 topranks: bringing up link from ssw1-a1-codfw to ssw1-d1-codfw T364095
  • 17:57 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
  • 17:55 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
  • 17:51 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2004.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 17:44 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2004.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 17:43 brett: Re-re-pooling lvs2011 - T368165
  • 17:37 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
  • 17:36 brett: Depooling lvs2011 due to elevated socket/tcp errors - T368165
  • 17:28 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
  • 17:28 brett: Pooling lvs2011 - T368165
  • 17:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T364069)', diff saved to https://phabricator.wikimedia.org/P65424 and previous config saved to /var/cache/conftool/dbconfig/20240625-172502-marostegui.json
  • 17:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 17:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 17:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T364069)', diff saved to https://phabricator.wikimedia.org/P65423 and previous config saved to /var/cache/conftool/dbconfig/20240625-172440-marostegui.json
  • 17:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P65422 and previous config saved to /var/cache/conftool/dbconfig/20240625-170933-marostegui.json
  • 17:06 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 17:04 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
  • 17:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
  • 17:01 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 16:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P65421 and previous config saved to /var/cache/conftool/dbconfig/20240625-165426-marostegui.json
  • 16:49 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 16:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
  • 16:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T364069)', diff saved to https://phabricator.wikimedia.org/P65420 and previous config saved to /var/cache/conftool/dbconfig/20240625-163919-marostegui.json
  • 16:37 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 100%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65419 and previous config saved to /var/cache/conftool/dbconfig/20240625-163330-arnaudb.json
  • 16:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1437.eqiad.wmnet
  • 16:31 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1437.eqiad.wmnet
  • 16:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw1437.eqiad.wmnet with reason: Resizing disk
  • 16:27 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on mw1437.eqiad.wmnet with reason: Resizing disk
  • 16:23 bvibber: running requeueTranscodes for missing audio files on commons (mwmaint1002) cf T368364
  • 16:23 claime: depooling mw1437
  • 16:19 claime: cleaning up shellbox leftover files on mw1437.eqiad.wmnet
  • 16:19 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 16:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 75%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65418 and previous config saved to /var/cache/conftool/dbconfig/20240625-161824-arnaudb.json
  • 16:15 claime: Extending vg-srv on mw1437
  • 16:10 brennen@deploy1002: Finished deploy [phabricator/deployment@72ad841]: deploy phab1004 for T368392 - followup T364728 (duration: 00m 39s)
  • 16:10 brennen@deploy1002: Started deploy [phabricator/deployment@72ad841]: deploy phab1004 for T368392 - followup T364728
  • 16:09 brennen@deploy1002: Finished deploy [phabricator/deployment@72ad841]: deploy phab2002 for T368392 - followup T364728 (duration: 00m 33s)
  • 16:08 brennen@deploy1002: Started deploy [phabricator/deployment@72ad841]: deploy phab2002 for T368392 - followup T364728
  • 16:05 brennen: silencing phabricator hosts prior to deploy
  • 16:03 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 50%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65417 and previous config saved to /var/cache/conftool/dbconfig/20240625-160318-arnaudb.json
  • 15:33 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 15:33 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs[1011-1021].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 15:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 10%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65415 and previous config saved to /var/cache/conftool/dbconfig/20240625-153307-arnaudb.json
  • 15:31 Dreamy_Jazz: Ran `mwscript extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --wiki=testwiki` for T366781
  • 15:22 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 15:21 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 15:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 15:20 claime: Deploying statsd to mw-api-ext - T365265
  • 15:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 15:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 5%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65414 and previous config saved to /var/cache/conftool/dbconfig/20240625-151802-arnaudb.json
  • 15:06 brennen@deploy1002: Finished deploy [phabricator/deployment@f58dd50]: deploy phab1004 for T368392 (duration: 00m 50s)
  • 15:05 brennen@deploy1002: Started deploy [phabricator/deployment@f58dd50]: deploy phab1004 for T368392
  • 15:05 brennen@deploy1002: Finished deploy [phabricator/deployment@f58dd50]: deploy phab2002 for T368392 (duration: 00m 33s)
  • 15:04 brennen@deploy1002: Started deploy [phabricator/deployment@f58dd50]: deploy phab2002 for T368392
  • 15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 15:00 topranks: rebooting lsw1-e5-eqiad to upgrade JunOS on switch T365986
  • 14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on 7 hosts with reason: JunOS upgrade lsw1-e5-eqiad
  • 14:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on 7 hosts with reason: JunOS upgrade lsw1-e5-eqiad
  • 14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e5-eqiad,lsw1-e5-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e5-eqiad
  • 14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e5-eqiad,lsw1-e5-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e5-eqiad
  • 14:56 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on es1035.eqiad.wmnet with reason: T365986
  • 14:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:45:00 on es1035.eqiad.wmnet with reason: T365986
  • 14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'T365986 - depool es1035', diff saved to https://phabricator.wikimedia.org/P65413 and previous config saved to /var/cache/conftool/dbconfig/20240625-145558-arnaudb.json
  • 14:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e5-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e5-eqiad
  • 14:55 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e5-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e5-eqiad
  • 14:50 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:49 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:45 urbanecm@deploy1002: Finished scap: Backport for WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275) (duration: 11m 45s)
  • 14:40 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 14:40 urbanecm@deploy1002: urbanecm: Backport for WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:36 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbproxy2005 to codfw - jhancock@cumin2002"
  • 14:35 sukhe: sudo cumin -b1 -s900 "A:dnsbox" "run-puppet-agent --enable 'rolling out CR 1049165' && systemctl restart ntp.service"
  • 14:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbproxy2005 to codfw - jhancock@cumin2002"
  • 14:33 vgutierrez: rolling upgrade of fifo-log-demux on A:cp-eqiad - T364383
  • 14:33 urbanecm@deploy1002: Started scap: Backport for WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275)
  • 14:30 vgutierrez: disable puppet on A:cp-eqiad before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049570 - T364383
  • 14:24 dcausse: re-indexing all wikidata entity schemas (T368010)
  • 14:23 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:17 urbanecm@deploy1002: Finished scap: Backport for Add change tag "Community Configuration" (T366989), Add change tag "Community Configuration" (T366989), WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275) (duration: 58m 28s)
  • 14:15 sukhe: sudo cumin "A:dnsbox" 'disable-puppet "rolling out CR 1049165"'
  • 14:12 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs[1011-1021].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 14:11 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
  • 14:10 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
  • 14:09 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 14:05 sukhe: restart pybal on lvs2014
  • 14:02 eevans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 14:02 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 14:01 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 14:01 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 14:00 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs
  • 14:00 urbanecm@deploy1002: urbanecm: Backport for Add change tag "Community Configuration" (T366989), Add change tag "Community Configuration" (T366989), WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:59 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 13:59 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 13:54 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs
  • 13:51 sukhe: restart pybal on lvs1020
  • 13:44 sukhe: disable puppet on A:lvs and A:codfw for CR 1049560
  • 13:43 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:43 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for krb1002 - jclark@cumin1002"
  • 13:42 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for krb1002 - jclark@cumin1002"
  • 13:39 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 13:37 mvernon@cumin2002: conftool action : set/pooled=yes; selector: cluster=apus
  • 13:36 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 13:36 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 13:29 vgutierrez: IPIP encapsulation enabled on ldap-ro.eqiad.wikimedia.org - T367861
  • 13:26 vgutierrez: rolling restart of pybal on lvs1020 and lvs1018 - T367861
  • 13:18 urbanecm@deploy1002: Started scap: Backport for Add change tag "Community Configuration" (T366989), Add change tag "Community Configuration" (T366989), WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275)
  • 13:07 fabfur: temporarily disabled puppet on cp4037 to test benthos configuration (T367756)
  • 12:51 cgoubert@deploy1002: Finished scap: Deploy udp2log rate-limiting - T365655 - T368098 (duration: 05m 49s)
  • 12:46 cgoubert@deploy1002: Started scap: Deploy udp2log rate-limiting - T365655 - T368098
  • 12:44 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 12:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 12:42 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 12:42 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 12:12 XioNoX: push NTP changes on pfw3
  • 12:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T364069)', diff saved to https://phabricator.wikimedia.org/P65411 and previous config saved to /var/cache/conftool/dbconfig/20240625-120926-marostegui.json
  • 12:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 12:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:58 vgutierrez: rolling upgrade of fifo-log-demux on A:cp-esams - T364383
  • 11:56 vgutierrez: disable puppet on A:cp-esams before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049529 - T364383
  • 11:55 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:53 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:45 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:45 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:40 marostegui: m2 dbmaint eqiad Stop db1217:3322 to clone db1228 T368374
  • 10:12 jmm@deploy1002: Finished scap: (no justification provided) (duration: 03m 30s)
  • 10:11 jmm@deploy1002: Started scap: (no justification provided)
  • 09:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on 11 hosts with reason: Turning down appserver clusters
  • 09:53 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on 11 hosts with reason: Turning down appserver clusters
  • 09:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on 25 hosts with reason: Turning down appserver clusters
  • 09:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on 25 hosts with reason: Turning down appserver clusters
  • 09:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db[1217,1228].eqiad.wmnet with reason: Cloning
  • 09:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db[1217,1228].eqiad.wmnet with reason: Cloning
  • 09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1228 from dbctl T368374', diff saved to https://phabricator.wikimedia.org/P65409 and previous config saved to /var/cache/conftool/dbconfig/20240625-093454-marostegui.json
  • 09:34 slyngs: Switching idp-test.wikimedia.org to CAS 7
  • 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1228 T368374', diff saved to https://phabricator.wikimedia.org/P65408 and previous config saved to /var/cache/conftool/dbconfig/20240625-093221-root.json
  • 08:45 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: full dump
  • 08:45 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: full dump
  • 08:32 jynus@cumin1002: dbctl commit (dc=all): 'Depool es2025', diff saved to https://phabricator.wikimedia.org/P65407 and previous config saved to /var/cache/conftool/dbconfig/20240625-083216-jynus.json
  • 08:31 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2022.codfw.wmnet with reason: full dump
  • 08:31 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2022.codfw.wmnet with reason: full dump
  • 08:26 jynus@cumin1002: dbctl commit (dc=all): 'Depool es2022', diff saved to https://phabricator.wikimedia.org/P65406 and previous config saved to /var/cache/conftool/dbconfig/20240625-082649-jynus.json
  • 07:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
  • 07:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
  • 07:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65405 and previous config saved to /var/cache/conftool/dbconfig/20240625-071855-marostegui.json
  • 07:14 marostegui: Optimize pagelinks on old s8 codfw master db2165 dbmaint T364069
  • 07:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Long schema change
  • 07:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Long schema change
  • 07:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P65404 and previous config saved to /var/cache/conftool/dbconfig/20240625-070348-marostegui.json
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2165 T368355', diff saved to https://phabricator.wikimedia.org/P65403 and previous config saved to /var/cache/conftool/dbconfig/20240625-070252-marostegui.json
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2161 to s8 primary T368355', diff saved to https://phabricator.wikimedia.org/P65402 and previous config saved to /var/cache/conftool/dbconfig/20240625-070127-marostegui.json
  • 07:01 marostegui: Starting s8 codfw failover from db2165 to db2161 - T368355
  • 07:00 arnaudb@deploy1002: Finished scap: Backport for Revert "dbconfig: temporary disable writes on es7" (duration: 07m 47s)
  • 06:55 arnaudb@deploy1002: arnaudb: Continuing with sync
  • 06:55 arnaudb@deploy1002: arnaudb: Backport for Revert "dbconfig: temporary disable writes on es7" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:54 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 06:52 arnaudb@deploy1002: Started scap: Backport for Revert "dbconfig: temporary disable writes on es7"
  • 06:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P65401 and previous config saved to /var/cache/conftool/dbconfig/20240625-064841-marostegui.json
  • 06:45 arnaudb@deploy1002: Sync cancelled.
  • 06:45 arnaudb@deploy1002: arnaudb: Backport for Revert "dbconfig: temporary disable writes on es7" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:42 arnaudb@deploy1002: Started scap: Backport for Revert "dbconfig: temporary disable writes on es7"
  • 06:40 arnaudb@cumin1002: dbctl commit (dc=all): 'T368020', diff saved to https://phabricator.wikimedia.org/P65400 and previous config saved to /var/cache/conftool/dbconfig/20240625-064000-arnaudb.json
  • 06:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s8 T368355
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2161 with weight 0 T368355', diff saved to https://phabricator.wikimedia.org/P65399 and previous config saved to /var/cache/conftool/dbconfig/20240625-063908-root.json
  • 06:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s8 T368355
  • 06:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote es1039 to es7 primary T368020', diff saved to https://phabricator.wikimedia.org/P65398 and previous config saved to /var/cache/conftool/dbconfig/20240625-063453-arnaudb.json
  • 06:33 arnaudb: Starting es7 eqiad failover from es1035 to es1039 - T368020
  • 06:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65397 and previous config saved to /var/cache/conftool/dbconfig/20240625-063334-marostegui.json
  • 06:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Set es1039 with weight 0 T368020', diff saved to https://phabricator.wikimedia.org/P65396 and previous config saved to /var/cache/conftool/dbconfig/20240625-062640-arnaudb.json
  • 06:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es7 T368020
  • 06:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es7 T368020
  • 06:24 arnaudb@deploy1002: Finished scap: Backport for dbconfig: temporary disable writes on es7 (T368020) (duration: 18m 47s)
  • 06:19 arnaudb@deploy1002: arnaudb: Continuing with sync
  • 06:17 arnaudb@deploy1002: arnaudb: Backport for dbconfig: temporary disable writes on es7 (T368020) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:11 marostegui: Drop ipblocks from s7 T367632
  • 06:05 arnaudb@deploy1002: Started scap: Backport for dbconfig: temporary disable writes on es7 (T368020)
  • 06:02 marostegui: Drop ipblocks from s6 T367632
  • 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65395 and previous config saved to /var/cache/conftool/dbconfig/20240625-053312-marostegui.json
  • 05:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 05:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 05:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 05:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 05:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T367856)', diff saved to https://phabricator.wikimedia.org/P65394 and previous config saved to /var/cache/conftool/dbconfig/20240625-053239-marostegui.json
  • 05:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 05:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.8 (duration: 00m 55s)
  • 03:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.11 refs T366956 (duration: 52m 19s)
  • 03:03 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.11 refs T366956
  • 01:48 brett: Running authdns-update on dns1004 to pool eqsin - T365763
  • 01:43 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=cache_text,dc=eqsin
  • 01:40 brett: Removing downtime for cp[5017-5024] as nvme drives are installed and hosts back online - T365763
  • 00:43 sukhe: [correction of command] sudo pkill ffmpeg: mw1438, high CPU usage, ffmpeg processes
  • 00:43 sukhe: sudo pkill mpeg: mw1438, high CPU usage, ffmpeg processes
  • 00:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: T365763
  • 00:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on 8 hosts with reason: T365763

2024-06-24

  • 23:02 brett: Running authdns-update on dns1004 to depool eqsin - T365763
  • 23:00 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2003.codfw.wmnet
  • 23:00 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:00 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 22:57 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 22:53 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 22:46 cwhite@cumin2002: START - Cookbook sre.hosts.decommission for hosts logstash2003.codfw.wmnet
  • 22:46 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2002.codfw.wmnet
  • 22:46 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:46 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 22:41 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 22:38 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 22:26 cwhite@cumin2002: START - Cookbook sre.hosts.decommission for hosts logstash2002.codfw.wmnet
  • 22:26 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2001.codfw.wmnet
  • 22:26 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:26 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 22:24 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 22:18 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 22:11 cwhite@cumin2002: START - Cookbook sre.hosts.decommission for hosts logstash2001.codfw.wmnet
  • 21:34 inflatador: bking@alert1001 uninstall deb pkg `ripgrep` T368107
  • 21:03 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-redacteddb1001.eqiad.wmnet
  • 21:00 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 20:53 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet
  • 20:36 inflatador: bking@alert1001 install `ripgrep` deb pkg T368107
  • 20:22 ladsgroup@deploy1002: Synchronized php-1.43.0-wmf.10/includes/libs/rdbms/loadbalancer/LoadBalancer.php: (no justification provided) (duration: 11m 04s)
  • 20:21 mutante: snapshot1017 - systemctl mask commonsrdf-dump ; systemctl mask commonsjson-dump T368098
  • 20:18 taavi: taavi@snapshot1017 ~ $ sudo systemctl stop commons*.service
  • 20:01 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1056.eqiad.wmnet with OS bookworm
  • 19:35 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
  • 19:32 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
  • 19:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:08 mutante: LDAP - added daphnesmit to group 'wmf' - Phabricator: added dsmit-wmf to WMF-NDA group T368140
  • 19:02 sukhe: ms-fe1009: restart swift-proxy: T360913
  • 18:59 mutante: ms-fe1011 - restarted swift-proxy
  • 18:53 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 18:52 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 15 hosts
  • 18:52 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for 15 hosts
  • 18:50 eevans@cumin1002: END (ERROR) - Cookbook sre.cassandra.roll-restart (exit_code=97) for nodes matching A:restbase-eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 18:50 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 18:50 sukhe: sudo cumin -s1 -b60 'ms-fe1010*,ms-fe1013*' 'systemctl restart swift-proxy'
  • 18:50 mutante: ms-fe1010,ms-fe1013 - restart swift-proxy - T360913
  • 18:48 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: Rotate ChronologyProtector secret (duration: 11m 33s)
  • 18:46 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "d"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 18:43 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 18:41 ladsgroup@deploy1002: ladsgroup: Rotate ChronologyProtector secret synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:17 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1055.eqiad.wmnet with OS bookworm
  • 18:16 mutante: ms-fe1012:~] $ sudo systemctl restart swift-proxy T360913
  • 18:16 mutante: ms-fe1012:~] $ sudo systemctl restart swift-proxy T360931
  • 18:07 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 18:06 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 18:06 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 18:05 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 18:04 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1020.eqiad.wmnet
  • 18:04 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs1020.eqiad.wmnet
  • 18:03 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "d"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 18:02 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "b"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 17:57 sukhe: restart pybal on lvs1019
  • 17:56 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
  • 17:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=apus,dc=eqiad
  • 17:50 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
  • 17:50 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
  • 17:49 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
  • 17:48 sbassett@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:48 sbassett@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 17:48 sbassett@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:48 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs
  • 17:47 sbassett@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:47 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 17:47 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs
  • 17:47 sbassett@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:47 sbassett@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 17:46 sbassett@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:46 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:46 sbassett@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:46 sbassett@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 17:46 sbassett@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 17:45 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:44 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:44 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 17:43 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:34 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=apus,dc=codfw
  • 17:33 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:32 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:28 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1055.eqiad.wmnet with OS bookworm
  • 17:28 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:27 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:23 sukhe: restart pybal on lvs2013
  • 17:20 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 17:19 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 17:18 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "b"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 17:13 sukhe: restart pybal on lvs1020 and lvs1019
  • 17:09 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:08 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 16:55 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 16:51 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 16:49 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 16:48 sukhe: restart pybal on lvs1020
  • 16:47 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 16:44 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 16:41 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 16:33 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase[1031,1034-1036].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 16:27 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1054.eqiad.wmnet with OS bookworm
  • 16:20 sukhe: restart pybal on lvs1020
  • 16:01 dancy@deploy1002: Installation of scap version "4.89.0" completed for 1 hosts
  • 16:00 dancy@deploy1002: Installing scap version "4.89.0" for 1 hosts
  • 15:59 sukhe: restart pybal on lvs1020
  • 15:59 dancy@deploy1002: Installing scap version "4.89.0" for 248 hosts
  • 15:57 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase[1031,1034-1036].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 15:50 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs1010.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 15:49 sukhe: restart pybal on lvs1020
  • 15:43 vgutierrez: updated termination_state cache haproxy metrics, expect higher CD and CR rates - T367963
  • 15:42 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs1010.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 15:29 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
  • 15:29 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
  • 15:20 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync
  • 15:20 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: sync
  • 15:17 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 15:16 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync
  • 15:16 elukey@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: sync
  • 15:15 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 15:15 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 15:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 15:11 mvernon@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs (T279621)
  • 15:11 claime: Enabling statsd-exporter on mw-jobrunner - T365265
  • 15:11 mvernon@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs (T279621)
  • 15:09 vgutierrez: rolling upgrade of fifo-log-demux on A:cp-drmrs - T364383
  • 15:08 Emperor: enable/run puppet on eqiad lvs for apus LVS rollout T279621
  • 15:08 Dreamy_Jazz: Afternoon UTC backport window done
  • 15:08 dreamyjazz@deploy1002: Finished scap: Backport for extension-list: Add IPReputation (T360067) (duration: 30m 37s)
  • 15:07 vgutierrez: [fixed url] disable puppet on A:cp-drmrs before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049198 - T364383
  • 15:06 vgutierrez: disable puppet on A:cp-drmrs before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049104 - T364383
  • 15:02 sukhe: restart pybal on lvs2014
  • 15:01 mvernon@cumin1002: END (ERROR) - Cookbook sre.loadbalancer.restart-pybal (exit_code=97) rolling-restart of pybal on A:lvs-secondary-codfw or A:lvs-low-traffic-codfw and A:lvs (T279621)
  • 15:00 dreamyjazz@deploy1002: kharlan, dreamyjazz: Continuing with sync
  • 14:57 dreamyjazz@deploy1002: kharlan, dreamyjazz: Backport for extension-list: Add IPReputation (T360067) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:56 mvernon@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw or A:lvs-low-traffic-codfw and A:lvs (T279621)
  • 14:53 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
  • 14:52 Emperor: enable/run puppet on codfw lvs for apus LVS rollout T279621
  • 14:49 Emperor: stop puppet on eqiad/codfw lvs prior to apus LVS rollout T279621
  • 14:48 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on cp4052.ulsfo.wmnet with reason: Upgrade glibc
  • 14:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on cp4052.ulsfo.wmnet with reason: Upgrade glibc
  • 14:47 elukey: depool cp4052 to deploy a new version of glibc - T367978
  • 14:47 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet
  • 14:37 dreamyjazz@deploy1002: Started scap: Backport for extension-list: Add IPReputation (T360067)
  • 14:34 urbanecm@deploy1002: Finished scap: Backport for ptwiki: Undeploy CommunityConfiguration (T368121) (duration: 07m 31s)
  • 14:32 mnz@deploy1002: Finished deploy [airflow-dags/research@b682892]: (no justification provided) (duration: 00m 31s)
  • 14:31 mnz@deploy1002: Started deploy [airflow-dags/research@b682892]: (no justification provided)
  • 14:27 urbanecm@deploy1002: Started scap: Backport for ptwiki: Undeploy CommunityConfiguration (T368121)
  • 14:27 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 14:17 mvernon@cumin1002: conftool action : set/pooled=yes:weight=40; selector: cluster=apus
  • 14:13 urbanecm@deploy1002: Finished scap: Backport for Growth: Enable CommunityConfiguration at idwiki (T366629), Growth: Enable CommunityConfiguration on round 1 wikis (T368121), AX Language selector entrypoint: Fix AX URL (T363183) (duration: 25m 37s)
  • 14:07 urbanecm@deploy1002: kartik, urbanecm: Continuing with sync
  • 14:03 sukhe: running homer in cr*{eqiad*,codfw*} to remove ntp.anycast.wmnet from policies/cr-labs: T366360
  • 13:57 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 13:56 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 13:49 urbanecm@deploy1002: kartik, urbanecm: Backport for Growth: Enable CommunityConfiguration at idwiki (T366629), Growth: Enable CommunityConfiguration on round 1 wikis (T368121), AX Language selector entrypoint: Fix AX URL (T363183) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:47 urbanecm@deploy1002: Started scap: Backport for Growth: Enable CommunityConfiguration at idwiki (T366629), Growth: Enable CommunityConfiguration on round 1 wikis (T368121), AX Language selector entrypoint: Fix AX URL (T363183)
  • 13:47 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 13:44 urbanecm@deploy1002: Finished scap: Backport for mediawiki.org: Sync xml/export-*.xsd files with MW core (T343622), CommonSettings: Restore the original behaviour of Reference Previews (T366419), [MediaModeration] Update 'From' email address to wiki@wikimedia.org (T368258) (duration: 35m 30s)
  • 13:41 vgutierrez: disable puppet on A:cp-magru before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049178 - T364383
  • 13:40 vgutierrez: rolling upgrade of fifo-log-demux on A:cp-magru - T364383
  • 13:40 elukey: [correction] depool cp4037 to deploy a new version of glibc - T367978
  • 13:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on cp4037.ulsfo.wmnet with reason: Upgrade glibc
  • 13:39 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on cp4037.ulsfo.wmnet with reason: Upgrade glibc
  • 13:39 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 13:37 urbanecm@deploy1002: urbanecm, func, dreamyjazz: Continuing with sync
  • 13:35 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 13:35 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 13:32 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4034.ulsfo.wmnet
  • 13:31 elukey: depool cp4034 to deploy a new version of glibc - T367978
  • 13:29 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 13:29 elukey: uploaded debmonitor-client_0.4.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia
  • 13:29 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:24 urbanecm@deploy1002: urbanecm, func, dreamyjazz: Backport for mediawiki.org: Sync xml/export-*.xsd files with MW core (T343622), CommonSettings: Restore the original behaviour of Reference Previews (T366419), [MediaModeration] Update 'From' email address to wiki@wikimedia.org (T368258) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:22 mnz@deploy1002: Finished deploy [airflow-dags/research@1996a7a]: (no justification provided) (duration: 00m 32s)
  • 13:22 mnz@deploy1002: Started deploy [airflow-dags/research@1996a7a]: (no justification provided)
  • 13:08 urbanecm@deploy1002: Started scap: Backport for mediawiki.org: Sync xml/export-*.xsd files with MW core (T343622), CommonSettings: Restore the original behaviour of Reference Previews (T366419), [MediaModeration] Update 'From' email address to wiki@wikimedia.org (T368258)
  • 12:53 vgutierrez: IPIP encapsulation enabled on ldap-ro.codfw.wikimedia.org. - T367861
  • 12:50 vgutierrez: rolling restart of pybal on lvs2014 and lvs2012 - T367861
  • 12:23 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 12:21 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 12:17 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:16 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,name=mw1364.eqiad.wmnet
  • 12:16 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: dc=codfw,cluster=appserver,name=mw2276.codfw.wmnet
  • 12:15 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=api_appserver,name=mw1398.eqiad.wmnet
  • 12:15 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: dc=codfw,cluster=api_appserver,name=mw2299.codfw.wmnet
  • 12:13 moritzm: installing pymysql security updates
  • 12:11 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 12:10 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mwdebug[2001-2002].codfw.wmnet,mwdebug[1001-1002].eqiad.wmnet
  • 12:10 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mwdebug[2001-2002].codfw.wmnet,mwdebug[1001-1002].eqiad.wmnet
  • 12:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on 31 hosts with reason: Waiting for reimage to kubernetes
  • 12:10 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:09 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on 31 hosts with reason: Waiting for reimage to kubernetes
  • 12:09 claime: Downtiming all legacy api_appserver and appserver - T368058
  • 12:07 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: cluster=api_appserver
  • 12:07 claime: Setting all legacy api_appservers to inactive - T368058
  • 12:07 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: cluster=appserver
  • 12:06 claime: Setting all legacy appservers to inactive - T368058
  • 12:05 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:04 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 12:01 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 11:59 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 11:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 11:55 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2360.codfw.wmnet
  • 11:52 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2360.codfw.wmnet
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2358.codfw.wmnet
  • 11:52 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2358.codfw.wmnet
  • 11:52 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2339.codfw.wmnet
  • 11:52 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2339.codfw.wmnet
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw1406.eqiad.wmnet
  • 11:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw1406.eqiad.wmnet
  • 11:51 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw1403.eqiad.wmnet
  • 11:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw1403.eqiad.wmnet
  • 11:51 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
  • 11:50 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
  • 11:49 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
  • 11:49 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 11:46 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 11:45 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 11:44 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 11:44 moritzm: installing php8.2 security updates
  • 11:42 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 11:26 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 11:13 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:13 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove AAAA records from an-redacteddb1001 - btullis@cumin1002"
  • 11:01 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove AAAA records from an-redacteddb1001 - btullis@cumin1002"
  • 10:58 btullis@cumin1002: START - Cookbook sre.dns.netbox
  • 10:51 cgoubert@cumin1002: conftool action : set/pooled=yes:weight=10; selector: name=(mw1420.eqiad.wmnet|mw1407.eqiad.wmnet),dc=eqiad,cluster=jobrunner
  • 10:50 cgoubert@cumin1002: conftool action : set/pooled=no:weight=10; selector: name=(mw1420.eqiad.wmnet|mw1407.eqiad.wmnet),dc=eqiad,cluster=jobrunner
  • 10:41 cgoubert@cumin1002: conftool action : set/pooled=no:weight=10; selector: name=(mw1420.eqiad.wmnet|mw1407.eqiad.wmnet),dc=eqiad,cluster=videoscaler
  • 10:41 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1420.eqiad.wmnet
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1420.eqiad.wmnet
  • 10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1407.eqiad.wmnet
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1407.eqiad.wmnet
  • 10:39 claime: pooling mw1420.eqiad.wmnet,mw1407.eqiad.wmnet as videoscalers - T368058
  • 10:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1407.eqiad.wmnet with OS buster
  • 10:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1420.eqiad.wmnet with OS buster
  • 10:14 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:10 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:04 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 10:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1407.eqiad.wmnet with reason: host reimage
  • 09:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1420.eqiad.wmnet with reason: host reimage
  • 09:57 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 09:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1407.eqiad.wmnet with reason: host reimage
  • 09:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1420.eqiad.wmnet with reason: host reimage
  • 09:46 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 09:45 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 09:44 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 09:44 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 09:44 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 09:44 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 09:43 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 09:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1420.eqiad.wmnet with OS buster
  • 09:42 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 09:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 09:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1407.eqiad.wmnet with OS buster
  • 09:39 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 09:34 claime: Reimaging scap::proxies, mediawiki deployments may be unavailable - T368058
  • 09:33 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 09:33 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 09:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:20 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for checker.tools.wmflabs.org
  • 09:20 taavi@cumin1002: START - Cookbook sre.hosts.remove-downtime for checker.tools.wmflabs.org
  • 09:19 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:17 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on checker.tools.wmflabs.org with reason: rebooting the toolschecker VM
  • 09:17 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on checker.tools.wmflabs.org with reason: rebooting the toolschecker VM
  • 09:10 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:10 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 05:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 05:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367856)', diff saved to https://phabricator.wikimedia.org/P65388 and previous config saved to /var/cache/conftool/dbconfig/20240624-050309-marostegui.json
  • 04:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P65387 and previous config saved to /var/cache/conftool/dbconfig/20240624-044802-marostegui.json
  • 04:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P65386 and previous config saved to /var/cache/conftool/dbconfig/20240624-043254-marostegui.json
  • 04:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367856)', diff saved to https://phabricator.wikimedia.org/P65385 and previous config saved to /var/cache/conftool/dbconfig/20240624-041747-marostegui.json
  • 01:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T367856)', diff saved to https://phabricator.wikimedia.org/P65384 and previous config saved to /var/cache/conftool/dbconfig/20240624-015859-marostegui.json
  • 01:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 01:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 01:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367856)', diff saved to https://phabricator.wikimedia.org/P65383 and previous config saved to /var/cache/conftool/dbconfig/20240624-015836-marostegui.json
  • 01:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P65382 and previous config saved to /var/cache/conftool/dbconfig/20240624-014329-marostegui.json
  • 01:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P65381 and previous config saved to /var/cache/conftool/dbconfig/20240624-012822-marostegui.json
  • 01:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367856)', diff saved to https://phabricator.wikimedia.org/P65380 and previous config saved to /var/cache/conftool/dbconfig/20240624-011315-marostegui.json
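
Cookbook runs in the entries above are logged as a START line followed later by an END line, so the elapsed time of a run can be read off by pairing the two. The following is a minimal sketch of that pairing, not a tool used by the authors of the log. The regular expressions and the function name `cookbook_durations` are assumptions made for illustration. The sample lines are copied from the 2024-06-24 entries. Runs that cross midnight are not handled.

```python
import re

# Assumed entry layout, inferred from the log: "• HH:MM actor: message".
ENTRY_RE = re.compile(r"^\s*•?\s*(?P<time>\d{2}:\d{2}) (?P<actor>[^:]+): (?P<message>.*)$")
# Cookbook messages start with "START - Cookbook ..." or "END (PASS) - Cookbook ...".
COOKBOOK_RE = re.compile(r"^(?P<phase>START|END) (?:\(\w+\) )?- Cookbook (?P<rest>.*)$")
# END lines carry an extra " (exit_code=N)" that the START line lacks.
EXIT_RE = re.compile(r" \(exit_code=\d+\)")


def cookbook_durations(lines):
    """Return {(actor, run description): minutes} for START/END pairs within one day."""
    starts = {}
    durations = {}
    # The log lists the newest entry first, so walk it backwards to meet START before END.
    for line in reversed(lines):
        entry = ENTRY_RE.match(line)
        if entry is None:
            continue
        cookbook = COOKBOOK_RE.match(entry["message"])
        if cookbook is None:
            continue
        key = (entry["actor"], EXIT_RE.sub("", cookbook["rest"]))
        hours, minutes = entry["time"].split(":")
        minute_of_day = int(hours) * 60 + int(minutes)
        if cookbook["phase"] == "START":
            starts[key] = minute_of_day
        elif key in starts:
            durations[key] = minute_of_day - starts.pop(key)
    return durations


SAMPLE = [
    "  • 10:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1407.eqiad.wmnet with OS buster",
    "  • 10:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1420.eqiad.wmnet with OS buster",
    "  • 09:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1420.eqiad.wmnet with OS buster",
    "  • 09:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1407.eqiad.wmnet with OS buster",
]

durations = cookbook_durations(SAMPLE)
for (actor, run), minutes in durations.items():
    print(f"{minutes:3d} min  {actor}  {run}")
```

Keying on the actor plus the run description keeps concurrent runs on different hosts apart, which matters here because the two reimages overlap.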

2024-06-23

  • 22:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T367856)', diff saved to https://phabricator.wikimedia.org/P65379 and previous config saved to /var/cache/conftool/dbconfig/20240623-225008-marostegui.json
  • 22:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 22:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 22:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367856)', diff saved to https://phabricator.wikimedia.org/P65378 and previous config saved to /var/cache/conftool/dbconfig/20240623-224946-marostegui.json
  • 22:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P65377 and previous config saved to /var/cache/conftool/dbconfig/20240623-223439-marostegui.json
  • 22:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P65376 and previous config saved to /var/cache/conftool/dbconfig/20240623-221932-marostegui.json
  • 22:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367856)', diff saved to https://phabricator.wikimedia.org/P65375 and previous config saved to /var/cache/conftool/dbconfig/20240623-220426-marostegui.json
  • 19:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T367856)', diff saved to https://phabricator.wikimedia.org/P65374 and previous config saved to /var/cache/conftool/dbconfig/20240623-193306-marostegui.json
  • 19:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 19:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 19:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65373 and previous config saved to /var/cache/conftool/dbconfig/20240623-193244-marostegui.json
  • 19:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P65372 and previous config saved to /var/cache/conftool/dbconfig/20240623-191737-marostegui.json
  • 19:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P65371 and previous config saved to /var/cache/conftool/dbconfig/20240623-190230-marostegui.json
  • 18:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65370 and previous config saved to /var/cache/conftool/dbconfig/20240623-184722-marostegui.json
  • 16:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65369 and previous config saved to /var/cache/conftool/dbconfig/20240623-161243-marostegui.json
  • 16:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 16:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 16:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T367856)', diff saved to https://phabricator.wikimedia.org/P65368 and previous config saved to /var/cache/conftool/dbconfig/20240623-161221-marostegui.json
  • 15:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P65367 and previous config saved to /var/cache/conftool/dbconfig/20240623-155714-marostegui.json
  • 15:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P65366 and previous config saved to /var/cache/conftool/dbconfig/20240623-154207-marostegui.json
  • 15:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T367856)', diff saved to https://phabricator.wikimedia.org/P65365 and previous config saved to /var/cache/conftool/dbconfig/20240623-152700-marostegui.json
  • 12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T367856)', diff saved to https://phabricator.wikimedia.org/P65364 and previous config saved to /var/cache/conftool/dbconfig/20240623-124522-marostegui.json
  • 12:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 12:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367856)', diff saved to https://phabricator.wikimedia.org/P65363 and previous config saved to /var/cache/conftool/dbconfig/20240623-124459-marostegui.json
  • 12:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65362 and previous config saved to /var/cache/conftool/dbconfig/20240623-122952-marostegui.json
  • 12:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65361 and previous config saved to /var/cache/conftool/dbconfig/20240623-121445-marostegui.json
  • 11:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367856)', diff saved to https://phabricator.wikimedia.org/P65360 and previous config saved to /var/cache/conftool/dbconfig/20240623-115938-marostegui.json
  • 11:06 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync
  • 11:06 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: sync
  • 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T367856)', diff saved to https://phabricator.wikimedia.org/P65359 and previous config saved to /var/cache/conftool/dbconfig/20240623-092833-marostegui.json
  • 09:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 09:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367856)', diff saved to https://phabricator.wikimedia.org/P65358 and previous config saved to /var/cache/conftool/dbconfig/20240623-092811-marostegui.json
  • 09:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P65357 and previous config saved to /var/cache/conftool/dbconfig/20240623-091304-marostegui.json
  • 08:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P65356 and previous config saved to /var/cache/conftool/dbconfig/20240623-085757-marostegui.json
  • 08:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367856)', diff saved to https://phabricator.wikimedia.org/P65355 and previous config saved to /var/cache/conftool/dbconfig/20240623-084250-marostegui.json
  • 06:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T367856)', diff saved to https://phabricator.wikimedia.org/P65354 and previous config saved to /var/cache/conftool/dbconfig/20240623-060520-marostegui.json
  • 06:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 06:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
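
In the entries above, each database host is depooled once and then brought back with four "Repooling after maintenance" commits spaced roughly 15 minutes apart. The log does not say what each commit changes, only that there are four. The sketch below, under those assumptions, groups the repool commits by host and reports the gaps between them. `DBCTL_RE`, `repool_steps` and `gaps` are hypothetical names introduced for illustration, and the sample lines are the db1168 entries copied from the log.

```python
import re
from collections import defaultdict

# Assumed message layout, inferred from the log entries for dbctl commits.
DBCTL_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) \S+: dbctl commit \(dc=\w+\): "
    r"'(?P<action>Depooling|Repooling after maintenance) (?P<host>db\d+)"
)


def repool_steps(lines):
    """Return {host: [minute of day of each repool commit]}, oldest commit first."""
    steps = defaultdict(list)
    # Entries are newest first, so reverse them to get chronological order.
    for line in reversed(lines):
        match = DBCTL_RE.search(line)
        if match is None or not match["action"].startswith("Repooling"):
            continue  # skip depool commits and unrelated entries
        hours, minutes = match["time"].split(":")
        steps[match["host"]].append(int(hours) * 60 + int(minutes))
    return dict(steps)


def gaps(minutes):
    """Return the differences, in minutes, between consecutive commit times."""
    return [later - earlier for earlier, later in zip(minutes, minutes[1:])]


SAMPLE = [
    "  • 12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367856)', diff saved to https://phabricator.wikimedia.org/P65363 and previous config saved to /var/cache/conftool/dbconfig/20240623-124459-marostegui.json",
    "  • 12:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65362 and previous config saved to /var/cache/conftool/dbconfig/20240623-122952-marostegui.json",
    "  • 12:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65361 and previous config saved to /var/cache/conftool/dbconfig/20240623-121445-marostegui.json",
    "  • 11:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367856)', diff saved to https://phabricator.wikimedia.org/P65360 and previous config saved to /var/cache/conftool/dbconfig/20240623-115938-marostegui.json",
    "  • 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T367856)', diff saved to https://phabricator.wikimedia.org/P65359 and previous config saved to /var/cache/conftool/dbconfig/20240623-092833-marostegui.json",
]

steps = repool_steps(SAMPLE)
for host, minutes in steps.items():
    print(host, len(minutes), "repool commits, gaps in minutes:", gaps(minutes))
```

The times are rounded to the minute in the log, so a gap of 15 or 16 minutes here reflects the same interval; the second-level timestamps in the saved config filenames would give a finer reading.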

2024-06-22

  • 20:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 20:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 16:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 16:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 16:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65353 and previous config saved to /var/cache/conftool/dbconfig/20240622-161841-marostegui.json
  • 16:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P65352 and previous config saved to /var/cache/conftool/dbconfig/20240622-160333-marostegui.json
  • 15:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P65351 and previous config saved to /var/cache/conftool/dbconfig/20240622-154826-marostegui.json
  • 15:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65350 and previous config saved to /var/cache/conftool/dbconfig/20240622-153318-marostegui.json
  • 12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65349 and previous config saved to /var/cache/conftool/dbconfig/20240622-120437-marostegui.json
  • 12:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 12:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65348 and previous config saved to /var/cache/conftool/dbconfig/20240622-120404-marostegui.json
  • 11:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P65347 and previous config saved to /var/cache/conftool/dbconfig/20240622-114857-marostegui.json
  • 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P65346 and previous config saved to /var/cache/conftool/dbconfig/20240622-113350-marostegui.json
  • 11:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65345 and previous config saved to /var/cache/conftool/dbconfig/20240622-111842-marostegui.json
  • 06:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65344 and previous config saved to /var/cache/conftool/dbconfig/20240622-064802-marostegui.json
  • 06:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 06:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65343 and previous config saved to /var/cache/conftool/dbconfig/20240622-064739-marostegui.json
  • 06:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P65342 and previous config saved to /var/cache/conftool/dbconfig/20240622-063232-marostegui.json
  • 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P65341 and previous config saved to /var/cache/conftool/dbconfig/20240622-061725-marostegui.json
  • 06:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65340 and previous config saved to /var/cache/conftool/dbconfig/20240622-060216-marostegui.json
  • 05:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2197.codfw.wmnet with reason: Long schema change
  • 05:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2197.codfw.wmnet with reason: Long schema change
  • 01:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65339 and previous config saved to /var/cache/conftool/dbconfig/20240622-015020-marostegui.json
  • 01:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 01:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 01:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T364069)', diff saved to https://phabricator.wikimedia.org/P65338 and previous config saved to /var/cache/conftool/dbconfig/20240622-014958-marostegui.json
  • 01:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P65337 and previous config saved to /var/cache/conftool/dbconfig/20240622-013451-marostegui.json
  • 01:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P65336 and previous config saved to /var/cache/conftool/dbconfig/20240622-011943-marostegui.json
  • 01:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T364069)', diff saved to https://phabricator.wikimedia.org/P65335 and previous config saved to /var/cache/conftool/dbconfig/20240622-010436-marostegui.json

2024-06-21

  • 23:54 cwhite: delete remaining 2024.03 log indexes to make room on logstash eqiad and codfw T368180
  • 23:43 brett@puppetmaster1001: dbctl commit (dc=all): 'set db1206 s1 weight to 1 - T368098', diff saved to https://phabricator.wikimedia.org/P65334 and previous config saved to /var/cache/conftool/dbconfig/20240621-234328-brett.json
  • 23:28 brett: # dbctl instance db1206 set-weight 10 --section s1
  • 21:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367856)', diff saved to https://phabricator.wikimedia.org/P65333 and previous config saved to /var/cache/conftool/dbconfig/20240621-213503-marostegui.json
  • 21:31 cwhite: restart apache2 on gerrit1003
  • 21:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P65332 and previous config saved to /var/cache/conftool/dbconfig/20240621-211956-marostegui.json
  • 21:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P65331 and previous config saved to /var/cache/conftool/dbconfig/20240621-210448-marostegui.json
  • 20:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367856)', diff saved to https://phabricator.wikimedia.org/P65330 and previous config saved to /var/cache/conftool/dbconfig/20240621-204941-marostegui.json
  • 20:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T364069)', diff saved to https://phabricator.wikimedia.org/P65329 and previous config saved to /var/cache/conftool/dbconfig/20240621-203659-marostegui.json
  • 20:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 20:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 20:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T364069)', diff saved to https://phabricator.wikimedia.org/P65328 and previous config saved to /var/cache/conftool/dbconfig/20240621-203636-marostegui.json
  • 20:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P65327 and previous config saved to /var/cache/conftool/dbconfig/20240621-202129-marostegui.json
  • 20:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P65326 and previous config saved to /var/cache/conftool/dbconfig/20240621-200622-marostegui.json
  • 19:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T364069)', diff saved to https://phabricator.wikimedia.org/P65325 and previous config saved to /var/cache/conftool/dbconfig/20240621-195115-marostegui.json
  • 19:43 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 16:00 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:59 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:41 elukey@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 15:41 elukey@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 15:40 elukey@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 15:39 elukey@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 15:37 elukey@cumin1002: END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling restart_daemons on A:netbox-canary
  • 15:37 elukey@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 15:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T364069)', diff saved to https://phabricator.wikimedia.org/P65322 and previous config saved to /var/cache/conftool/dbconfig/20240621-152038-marostegui.json
  • 15:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T364069)', diff saved to https://phabricator.wikimedia.org/P65321 and previous config saved to /var/cache/conftool/dbconfig/20240621-152011-marostegui.json
  • 15:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P65319 and previous config saved to /var/cache/conftool/dbconfig/20240621-150504-marostegui.json
  • 15:01 ejegg: fundraising civicrm upgraded from 8a0b5bea to 13a13f3a
  • 14:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P65318 and previous config saved to /var/cache/conftool/dbconfig/20240621-144957-marostegui.json
  • 14:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T364069)', diff saved to https://phabricator.wikimedia.org/P65317 and previous config saved to /var/cache/conftool/dbconfig/20240621-143450-marostegui.json
  • 14:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T367856)', diff saved to https://phabricator.wikimedia.org/P65314 and previous config saved to /var/cache/conftool/dbconfig/20240621-141050-marostegui.json
  • 14:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 14:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 14:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T367856)', diff saved to https://phabricator.wikimedia.org/P65313 and previous config saved to /var/cache/conftool/dbconfig/20240621-141028-marostegui.json
  • 13:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P65312 and previous config saved to /var/cache/conftool/dbconfig/20240621-135521-marostegui.json
  • 13:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P65309 and previous config saved to /var/cache/conftool/dbconfig/20240621-134013-marostegui.json
  • 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T367856)', diff saved to https://phabricator.wikimedia.org/P65306 and previous config saved to /var/cache/conftool/dbconfig/20240621-132506-marostegui.json
  • 13:21 btullis@deploy1002: Finished deploy [performance/asoranking@febfb9f]: (no justification provided) (duration: 00m 04s)
  • 13:21 btullis@deploy1002: Started deploy [performance/asoranking@febfb9f]: (no justification provided)
  • 13:08 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 13:07 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 13:07 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 13:07 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 13:07 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 13:06 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 11:37 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) shellbox-video.discovery.wmnet on all recursors
  • 11:37 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache shellbox-video.discovery.wmnet on all recursors
  • 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T367856)', diff saved to https://phabricator.wikimedia.org/P65303 and previous config saved to /var/cache/conftool/dbconfig/20240621-110638-marostegui.json
  • 11:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 11:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 10:57 Emperor: restart swift-proxy on ms-fe2011 ms-fe2012 T360913
  • 10:56 Emperor: restart swift-proxy on ms-fe1010 T360913
  • 10:36 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2002.codfw.wmnet
  • 10:36 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2001.codfw.wmnet
  • 10:28 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 10:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T364069)', diff saved to https://phabricator.wikimedia.org/P65302 and previous config saved to /var/cache/conftool/dbconfig/20240621-100554-marostegui.json
  • 10:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 10:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 10:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T364069)', diff saved to https://phabricator.wikimedia.org/P65301 and previous config saved to /var/cache/conftool/dbconfig/20240621-100531-marostegui.json
  • 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P65300 and previous config saved to /var/cache/conftool/dbconfig/20240621-095024-marostegui.json
  • 09:45 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on karapace[1001-1002].eqiad.wmnet with reason: The hosts are soon to be decommissioned
  • 09:45 brouberol@cumin2002: START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on karapace[1001-1002].eqiad.wmnet with reason: The hosts are soon to be decommissioned
  • 09:41 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bookworm
  • 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P65299 and previous config saved to /var/cache/conftool/dbconfig/20240621-093517-marostegui.json
  • 09:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603/ using stat1009.eqiad.wmnet)
  • 09:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T364069)', diff saved to https://phabricator.wikimedia.org/P65298 and previous config saved to /var/cache/conftool/dbconfig/20240621-092009-marostegui.json
  • 09:16 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 09:14 aborrero@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 09:02 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
  • 08:57 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
  • 08:56 aborrero@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bookworm
  • 08:47 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1053.eqiad.wmnet
  • 08:41 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 08:39 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudvirt1053.eqiad.wmnet
  • 08:14 vgutierrez: restarting logrotate.service on cp[3068,3070-3071].esams.wmnet
  • 08:04 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 08:04 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 08:03 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 08:03 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 08:00 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 08:00 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 07:54 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65297 and previous config saved to /var/cache/conftool/dbconfig/20240621-075404-arnaudb.json
  • 07:38 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65296 and previous config saved to /var/cache/conftool/dbconfig/20240621-073858-arnaudb.json
  • 07:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65295 and previous config saved to /var/cache/conftool/dbconfig/20240621-072353-arnaudb.json
  • 07:08 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65294 and previous config saved to /var/cache/conftool/dbconfig/20240621-070847-arnaudb.json
  • 07:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool for debugging T368098', diff saved to https://phabricator.wikimedia.org/P65293 and previous config saved to /var/cache/conftool/dbconfig/20240621-070358-arnaudb.json
  • 04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T364069)', diff saved to https://phabricator.wikimedia.org/P65292 and previous config saved to /var/cache/conftool/dbconfig/20240621-045107-marostegui.json
  • 04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 04:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 04:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T364069)', diff saved to https://phabricator.wikimedia.org/P65291 and previous config saved to /var/cache/conftool/dbconfig/20240621-045044-marostegui.json
  • 04:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 04:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367856)', diff saved to https://phabricator.wikimedia.org/P65290 and previous config saved to /var/cache/conftool/dbconfig/20240621-044455-marostegui.json
  • 04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P65289 and previous config saved to /var/cache/conftool/dbconfig/20240621-043537-marostegui.json
  • 04:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P65288 and previous config saved to /var/cache/conftool/dbconfig/20240621-042948-marostegui.json
  • 04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P65287 and previous config saved to /var/cache/conftool/dbconfig/20240621-042030-marostegui.json
  • 04:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P65286 and previous config saved to /var/cache/conftool/dbconfig/20240621-041441-marostegui.json
  • 04:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T364069)', diff saved to https://phabricator.wikimedia.org/P65285 and previous config saved to /var/cache/conftool/dbconfig/20240621-040523-marostegui.json
  • 03:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367856)', diff saved to https://phabricator.wikimedia.org/P65284 and previous config saved to /var/cache/conftool/dbconfig/20240621-035934-marostegui.json
  • 03:04 ejegg: fundraising civicrm upgraded from 2e1db811 to 8a0b5bea
  • 01:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T367856)', diff saved to https://phabricator.wikimedia.org/P65283 and previous config saved to /var/cache/conftool/dbconfig/20240621-014545-marostegui.json
  • 01:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 01:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 01:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65282 and previous config saved to /var/cache/conftool/dbconfig/20240621-014523-marostegui.json
  • 01:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P65281 and previous config saved to /var/cache/conftool/dbconfig/20240621-013016-marostegui.json
  • 01:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P65280 and previous config saved to /var/cache/conftool/dbconfig/20240621-011509-marostegui.json
  • 01:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65279 and previous config saved to /var/cache/conftool/dbconfig/20240621-010002-marostegui.json
  • 00:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T352010)', diff saved to https://phabricator.wikimedia.org/P65278 and previous config saved to /var/cache/conftool/dbconfig/20240621-005237-ladsgroup.json
  • 00:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P65277 and previous config saved to /var/cache/conftool/dbconfig/20240621-003730-ladsgroup.json
  • 00:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P65276 and previous config saved to /var/cache/conftool/dbconfig/20240621-002223-ladsgroup.json
  • 00:08 mutante: [cp3072:~] $ sudo systemctl start varnishkafka-webrequest.service
  • 00:08 mutante: [cp3067:~] $ sudo systemctl start logrotate
  • 00:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T352010)', diff saved to https://phabricator.wikimedia.org/P65275 and previous config saved to /var/cache/conftool/dbconfig/20240621-000716-ladsgroup.json
  • 00:00 sukhe: restarting haproxy on cp3068 and cp3072

2024-06-20

  • 23:47 zabe@deploy1002: Finished scap: Update interwiki cache (duration: 10m 12s)
  • 23:36 zabe@deploy1002: Started scap: Update interwiki cache
  • 23:35 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=btmwiki --cluster=all 2>&1 | tee /tmp/btmwiki.UpdateSearchIndexConfig.log # T368038
  • 23:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T364069)', diff saved to https://phabricator.wikimedia.org/P65274 and previous config saved to /var/cache/conftool/dbconfig/20240620-233346-marostegui.json
  • 23:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 23:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 23:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T364069)', diff saved to https://phabricator.wikimedia.org/P65273 and previous config saved to /var/cache/conftool/dbconfig/20240620-233324-marostegui.json
  • 23:33 zabe@deploy1002: Finished scap: Creating btmwiki (T368038) (duration: 12m 20s)
  • 23:20 zabe@deploy1002: Started scap: Creating btmwiki (T368038)
  • 23:20 zabe: create Wikipedia Mandailing # T368038
  • 23:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P65272 and previous config saved to /var/cache/conftool/dbconfig/20240620-231817-marostegui.json
  • 23:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P65271 and previous config saved to /var/cache/conftool/dbconfig/20240620-230310-marostegui.json
  • 22:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T364069)', diff saved to https://phabricator.wikimedia.org/P65270 and previous config saved to /var/cache/conftool/dbconfig/20240620-224803-marostegui.json
  • 22:39 mutante: aphlict1002/aphlict2001 - systemctl stop aphlict_logrotate.timer (and .service); systemctl disable aphlict_logrotate.timer (and .service); systemctl daemon-reload; systemctl reset-failed T367960
  • 22:33 zabe@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T361041 T363825 T366649 (duration: 09m 55s)
  • 22:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65269 and previous config saved to /var/cache/conftool/dbconfig/20240620-222909-marostegui.json
  • 22:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 22:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 22:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367856)', diff saved to https://phabricator.wikimedia.org/P65268 and previous config saved to /var/cache/conftool/dbconfig/20240620-222847-marostegui.json
  • 22:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P65267 and previous config saved to /var/cache/conftool/dbconfig/20240620-221340-marostegui.json
  • 21:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P65266 and previous config saved to /var/cache/conftool/dbconfig/20240620-215833-marostegui.json
  • 21:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367856)', diff saved to https://phabricator.wikimedia.org/P65265 and previous config saved to /var/cache/conftool/dbconfig/20240620-214326-marostegui.json
  • 21:12 ebernhardson@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:12 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 21:12 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 21:11 ebernhardson@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:10 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 21:09 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 21:09 brett: Include ncmonitor 1.0.0 in wikimedia-bookworm apt repo
  • 21:09 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:08 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:08 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:08 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 21:07 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 21:07 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:06 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 21:06 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 21:05 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 21:04 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 21:03 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 21:03 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 20:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on elastic1105.eqiad.wmnet with reason: T348977
  • 20:53 bking@cumin2002: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on elastic1105.eqiad.wmnet with reason: T348977
  • 20:44 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 20:44 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 20:43 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 20:42 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 20:40 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 20:40 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 20:39 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 20:38 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 20:36 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 20:36 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 20:34 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 20:33 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 20:28 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 20:27 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 20:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 20:27 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 20:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 20:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 20:26 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 20:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 20:25 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 20:25 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 20:25 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 20:24 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 19:58 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 19:58 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 19:57 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:57 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 19:57 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 19:57 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 19:57 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 19:57 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:56 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 19:55 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 19:54 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 19:52 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 19:51 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 19:18 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1105* for T348977 - bking@cumin2002
  • 19:18 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1105* for T348977 - bking@cumin2002
  • 19:18 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic1105 for T348977 - bking@cumin2002
  • 19:18 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1105 for T348977 - bking@cumin2002
  • 19:04 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host elastic2088.codfw.wmnet
  • 19:01 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 18:58 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 18:21 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bookworm
  • 18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T364069)', diff saved to https://phabricator.wikimedia.org/P65263 and previous config saved to /var/cache/conftool/dbconfig/20240620-181635-marostegui.json
  • 18:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 18:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65262 and previous config saved to /var/cache/conftool/dbconfig/20240620-181613-marostegui.json
  • 18:06 inflatador: bking@an-airflow1007 install `ripgrep` deb pkg
  • 18:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P65261 and previous config saved to /var/cache/conftool/dbconfig/20240620-180104-marostegui.json
  • 17:51 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 17:48 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 17:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P65260 and previous config saved to /var/cache/conftool/dbconfig/20240620-174557-marostegui.json
  • 17:44 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic2088.codfw.wmnet
  • 17:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T352010)', diff saved to https://phabricator.wikimedia.org/P65259 and previous config saved to /var/cache/conftool/dbconfig/20240620-174125-ladsgroup.json
  • 17:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 17:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 17:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65258 and previous config saved to /var/cache/conftool/dbconfig/20240620-173050-marostegui.json
  • 17:30 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bookworm
  • 17:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 17:15 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
  • 17:14 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
  • 16:34 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 75%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65256 and previous config saved to /var/cache/conftool/dbconfig/20240620-163348-arnaudb.json
  • 16:30 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bookworm
  • 16:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 50%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65254 and previous config saved to /var/cache/conftool/dbconfig/20240620-161842-arnaudb.json
  • 16:18 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Fix Special:Notifications (T368029) (duration: 12m 21s)
  • 16:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, urbanecm: Continuing with sync
  • 16:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, urbanecm: Backport for Fix Special:Notifications (T368029) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T357309)
  • 16:06 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test glibc updates - bking@cumin2002 - T367978
  • 16:06 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
  • 16:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Fix Special:Notifications (T368029)
  • 16:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1028.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 16:03 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 25%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65253 and previous config saved to /var/cache/conftool/dbconfig/20240620-160337-arnaudb.json
  • 16:03 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
  • 16:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2282.codfw.wmnet
  • 16:01 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw2282.codfw.wmnet
  • 16:01 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=mw2282.codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 16:00 claime: Repooling and uncordoning mw2282.codfw.wmnet following move - T361856
  • 15:59 hnowlan@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T357309)
  • 15:59 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:58 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:57 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2019.codfw.wmnet|wikikube-worker2020.codfw.wmnet|wikikube-worker2021.codfw.wmnet|wikikube-worker2022.codfw.wmnet|wikikube-worker2023.codfw.wmnet|wikikube-worker2024.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 15:57 claime: Pooling and uncordoning wikikube-worker2019.codfw.wmnet,wikikube-worker2020.codfw.wmnet,wikikube-worker2021.codfw.wmnet,wikikube-worker2022.codfw.wmnet,wikikube-worker2023.codfw.wmnet,wikikube-worker2024.codfw.wmnet - T351074
  • 15:55 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1028.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 15:55 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:55 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T357309)
  • 15:52 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test glibc updates - bking@cumin2002 - T367978
  • 15:48 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 10%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65252 and previous config saved to /var/cache/conftool/dbconfig/20240620-154831-arnaudb.json
  • 15:46 hnowlan@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T357309)
  • 15:46 claime: homer 'cr*codfw*' commit 'T351074'
  • 15:45 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host wikikube-ctrl2002.codfw.wmnet
  • 15:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bookworm
  • 15:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2019.codfw.wmnet with OS bullseye
  • 15:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2020.codfw.wmnet with OS bullseye
  • 15:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2022.codfw.wmnet with OS bullseye
  • 15:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:33 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 5%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65251 and previous config saved to /var/cache/conftool/dbconfig/20240620-153326-arnaudb.json
  • 15:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2024.codfw.wmnet with OS bullseye
  • 15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2023.codfw.wmnet with OS bullseye
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2405.codfw.wmnet
  • 15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2405.codfw.wmnet
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2404.codfw.wmnet
  • 15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2404.codfw.wmnet
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2403.codfw.wmnet
  • 15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2403.codfw.wmnet
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2400.codfw.wmnet
  • 15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2400.codfw.wmnet
  • 15:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2021.codfw.wmnet with OS bullseye
  • 15:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2019.codfw.wmnet with reason: host reimage
  • 15:23 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2020.codfw.wmnet with reason: host reimage
  • 15:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 2%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65249 and previous config saved to /var/cache/conftool/dbconfig/20240620-151820-arnaudb.json
  • 15:18 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2022.codfw.wmnet with reason: host reimage
  • 15:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2024.codfw.wmnet with reason: host reimage
  • 15:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2023.codfw.wmnet with reason: host reimage
  • 15:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2021.codfw.wmnet with reason: host reimage
  • 15:06 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:05 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:04 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore100[4-6].eqiad.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2024.codfw.wmnet with reason: host reimage
  • 15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2023.codfw.wmnet with reason: host reimage
  • 15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2022.codfw.wmnet with reason: host reimage
  • 15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2019.codfw.wmnet with reason: host reimage
  • 15:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2020.codfw.wmnet with reason: host reimage
  • 15:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2021.codfw.wmnet with reason: host reimage
  • 15:02 jhathaway@deploy1002: Finished scap: (no justification provided) (duration: 04m 15s)
  • 15:01 topranks: rebooting lsw1-e6-eqiad to upgrade JunOS on switch T365987
  • 15:01 jhathaway@deploy1002: Started scap: (no justification provided)
  • 14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on an-worker[1160-1162].eqiad.wmnet,es1036.eqiad.wmnet,ms-be1077.eqiad.wmnet with reason: JunOS upgrade lsw1-e6-eqiad
  • 14:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on an-worker[1160-1162].eqiad.wmnet,es1036.eqiad.wmnet,ms-be1077.eqiad.wmnet with reason: JunOS upgrade lsw1-e6-eqiad
  • 14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e6-eqiad,lsw1-e6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e6-eqiad
  • 14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e6-eqiad,lsw1-e6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e6-eqiad
  • 14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lsw1-f6-eqiad.mgmt
  • 14:57 cmooney@cumin1002: START - Cookbook sre.hosts.remove-downtime for lsw1-f6-eqiad.mgmt
  • 14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
  • 14:56 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
  • 14:56 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 14:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1020.eqiad.wmnet
  • 14:54 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1020.eqiad.wmnet
  • 14:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1018.eqiad.wmnet
  • 14:54 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1018.eqiad.wmnet
  • 14:54 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:53 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
  • 14:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
  • 14:53 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
  • 14:53 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for 6 hosts
  • 14:48 sukhe: homer "*" commit "rolling out NTP ACL change"
  • 14:48 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 14:48 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2024.codfw.wmnet with OS bullseye
  • 14:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65248 and previous config saved to /var/cache/conftool/dbconfig/20240620-144750-arnaudb.json
  • 14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2023.codfw.wmnet with OS bullseye
  • 14:47 vgutierrez: rolling restart of pybal on lvs1020 and lvs1018 - T367511
  • 14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2022.codfw.wmnet with OS bullseye
  • 14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2021.codfw.wmnet with OS bullseye
  • 14:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2020.codfw.wmnet with OS bullseye
  • 14:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2364 to wikikube-worker2024
  • 14:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2024
  • 14:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2019.codfw.wmnet with OS bullseye
  • 14:46 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore100[4-6].eqiad.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 14:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2024
  • 14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2364 to wikikube-worker2024 - cgoubert@cumin1002"
  • 14:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T367856)', diff saved to https://phabricator.wikimedia.org/P65247 and previous config saved to /var/cache/conftool/dbconfig/20240620-144423-marostegui.json
  • 14:44 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2364 to wikikube-worker2024 - cgoubert@cumin1002"
  • 14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:43 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 14:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 14:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 14:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65246 and previous config saved to /var/cache/conftool/dbconfig/20240620-144341-marostegui.json
  • 14:42 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore200[5-6].codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 14:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2364 to wikikube-worker2024
  • 14:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
  • 14:39 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
  • 14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2363 to wikikube-worker2023
  • 14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2023
  • 14:38 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1051.eqiad.wmnet with OS bookworm
  • 14:38 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 14:37 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2023
  • 14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2363 to wikikube-worker2023 - cgoubert@cumin1002"
  • 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2324.codfw.wmnet
  • 14:37 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2324.codfw.wmnet
  • 14:36 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2323.codfw.wmnet
  • 14:36 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2323.codfw.wmnet
  • 14:36 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw1489.eqiad.wmnet
  • 14:36 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw1489.eqiad.wmnet
  • 14:35 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:35 sukhe: running authdns-update for CR 1047074
  • 14:35 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2363 to wikikube-worker2023 - cgoubert@cumin1002"
  • 14:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 14:32 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65245 and previous config saved to /var/cache/conftool/dbconfig/20240620-143244-arnaudb.json
  • 14:32 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:32 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2363 to wikikube-worker2023
  • 14:31 moritzm: imported python-pymysql 1.0.2-2~wmf11u2 to apt.wikimedia.org (merge of the security fix from DSA 5700 on top of our internal backport)
  • 14:31 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 depool ahead of T365987', diff saved to https://phabricator.wikimedia.org/P65244 and previous config saved to /var/cache/conftool/dbconfig/20240620-143109-arnaudb.json
  • 14:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1036.eqiad.wmnet with reason: T365987
  • 14:30 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore200[5-6].codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 14:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on es1036.eqiad.wmnet with reason: T365987
  • 14:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2362 to wikikube-worker2022
  • 14:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2022
  • 14:29 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2004.codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 14:28 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2022
  • 14:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2362 to wikikube-worker2022 - cgoubert@cumin1002"
  • 14:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65243 and previous config saved to /var/cache/conftool/dbconfig/20240620-142834-marostegui.json
  • 14:27 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2362 to wikikube-worker2022 - cgoubert@cumin1002"
  • 14:27 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:26 sukhe: sudo cumin 'O:alerting_host' 'run-puppet-agent'
  • 14:25 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:25 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:25 elukey@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update wmf-plugin for K8s ml-staging - elukey@cumin1002
  • 14:25 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:24 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2362 to wikikube-worker2022
  • 14:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 14:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:24 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:22 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2004.codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 14:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2360 to wikikube-worker2021
  • 14:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2021
  • 14:21 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2021
  • 14:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2360 to wikikube-worker2021 - cgoubert@cumin1002"
  • 14:19 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2360 to wikikube-worker2021 - cgoubert@cumin1002"
  • 14:17 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65242 and previous config saved to /var/cache/conftool/dbconfig/20240620-141739-arnaudb.json
  • 14:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: IPIP migration
  • 14:17 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:17 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: IPIP migration
  • 14:17 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2360 to wikikube-worker2021
  • 14:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2358 to wikikube-worker2020
  • 14:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2020
  • 14:15 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2020
  • 14:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2358 to wikikube-worker2020 - cgoubert@cumin1002"
  • 14:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 14:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65241 and previous config saved to /var/cache/conftool/dbconfig/20240620-141328-marostegui.json
  • 14:13 elukey@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update wmf-plugin for K8s ml-staging - elukey@cumin1002
  • 14:13 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2358 to wikikube-worker2020 - cgoubert@cumin1002"
  • 14:10 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:10 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2358 to wikikube-worker2020
  • 14:10 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1051.eqiad.wmnet with reason: host reimage
  • 14:10 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P65240 and previous config saved to /var/cache/conftool/dbconfig/20240620-141010-root.json
  • 14:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2339 to wikikube-worker2019
  • 14:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2019
  • 14:09 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2019
  • 14:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2339 to wikikube-worker2019 - cgoubert@cumin1002"
  • 14:07 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1051.eqiad.wmnet with reason: host reimage
  • 14:07 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2339 to wikikube-worker2019 - cgoubert@cumin1002"
  • 14:04 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:04 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2339 to wikikube-worker2019
  • 14:02 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65239 and previous config saved to /var/cache/conftool/dbconfig/20240620-140233-arnaudb.json
  • 14:01 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1049.eqiad.wmnet with OS bookworm
  • 14:01 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1050.eqiad.wmnet with OS bookworm
  • 13:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65238 and previous config saved to /var/cache/conftool/dbconfig/20240620-135820-marostegui.json
  • 13:57 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 13:56 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 13:56 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 13:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65237 and previous config saved to /var/cache/conftool/dbconfig/20240620-135610-marostegui.json
  • 13:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 13:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 13:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65236 and previous config saved to /var/cache/conftool/dbconfig/20240620-135559-marostegui.json
  • 13:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 13:55 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 13:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 13:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 13:55 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 13:55 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 13:54 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 13:54 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 13:54 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P65235 and previous config saved to /var/cache/conftool/dbconfig/20240620-135438-root.json
  • 13:54 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 13:53 claime: Depooling mw2339.codfw.wmnet,mw2358.codfw.wmnet,mw2360.codfw.wmnet,mw2362.codfw.wmnet,mw2363.codfw.wmnet,mw2364.codfw.wmnet for reimage to k8s - T351074
  • 13:53 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 13:52 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 13:52 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 13:51 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 13:51 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 13:50 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 13:50 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 13:50 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1051.eqiad.wmnet with OS bookworm
  • 13:50 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 13:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 10%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65234 and previous config saved to /var/cache/conftool/dbconfig/20240620-134728-arnaudb.json
  • 13:46 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 13:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65233 and previous config saved to /var/cache/conftool/dbconfig/20240620-134052-marostegui.json
  • 13:39 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P65232 and previous config saved to /var/cache/conftool/dbconfig/20240620-133907-root.json
  • 13:36 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
  • 13:32 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1049.eqiad.wmnet with reason: host reimage
  • 13:28 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
  • 13:28 hashar@deploy1002: Finished deploy [integration/docroot@7f59f49]: build: Updating eslint-config-wikimedia to 0.28.2 (duration: 00m 06s)
  • 13:28 hashar@deploy1002: Started deploy [integration/docroot@7f59f49]: build: Updating eslint-config-wikimedia to 0.28.2
  • 13:27 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1049.eqiad.wmnet with reason: host reimage
  • 13:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 13:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65231 and previous config saved to /var/cache/conftool/dbconfig/20240620-132545-marostegui.json
  • 13:24 reedy@deploy1002: Synchronized wmf-config/: T368003 (duration: 10m 39s)
  • 13:24 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 13:23 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P65230 and previous config saved to /var/cache/conftool/dbconfig/20240620-132335-root.json
  • 13:23 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 13:22 elukey: upload dragonfly packages 1.0.6-2 to bookworm-wikimedia - T365253
  • 13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65228 and previous config saved to /var/cache/conftool/dbconfig/20240620-131038-marostegui.json
  • 13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65227 and previous config saved to /var/cache/conftool/dbconfig/20240620-131031-marostegui.json
  • 13:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 13:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65226 and previous config saved to /var/cache/conftool/dbconfig/20240620-130928-marostegui.json
  • 13:09 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1050.eqiad.wmnet with OS bookworm
  • 13:09 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1049.eqiad.wmnet with OS bookworm
  • 13:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 13:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 13:08 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P65225 and previous config saved to /var/cache/conftool/dbconfig/20240620-130804-root.json
  • 13:07 sukhe: running homer on cr*{eqiad,codfw}* for CR 1046737: update policies/cr-labs.yaml for new NTP servers: T366360
  • 13:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin1002.eqiad.wmnet
  • 13:05 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2003.codfw.wmnet
  • 13:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1002.eqiad.wmnet
  • 13:00 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-staging2003.codfw.wmnet
  • 12:54 sukhe: sudo cumin -b1 -s30 "A:installserver" "run-puppet-agent": T366360
  • 12:51 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 5%: 1', diff saved to https://phabricator.wikimedia.org/P65223 and previous config saved to /var/cache/conftool/dbconfig/20240620-125139-root.json
  • 12:51 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 12:44 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 12:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1048.eqiad.wmnet with OS bookworm
  • 12:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1047.eqiad.wmnet with OS bookworm
  • 12:11 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1048.eqiad.wmnet with reason: host reimage
  • 12:08 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
  • 12:06 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1048.eqiad.wmnet with reason: host reimage
  • 12:04 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
  • 11:52 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2282.codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 11:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 11:48 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1048.eqiad.wmnet with OS bookworm
  • 11:47 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bookworm
  • 11:41 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 11:38 XioNoX: merge netbox-extra CR1038869 - Fix lots of CI errors
  • 11:33 jgiannelos@deploy1002: Finished deploy [restbase/deploy@f867c66]: (no justification provided) (duration: 30m 12s)
  • 11:27 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 11:26 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 11:25 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 11:25 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 11:21 akosiaris: upgrade mathoid to 2024-06-18-233457-production T349118
  • 11:20 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: sync
  • 11:20 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: sync
  • 11:03 jgiannelos@deploy1002: Started deploy [restbase/deploy@f867c66]: (no justification provided)
  • 10:57 dreamyjazz@deploy1002: Finished scap: Backport for [testwiki] Fix assignment of 'checkuser-temporary-account' right (T367170) (duration: 15m 03s)
  • 10:48 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
  • 10:44 dreamyjazz@deploy1002: dreamyjazz: Backport for [testwiki] Fix assignment of 'checkuser-temporary-account' right (T367170) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:42 dreamyjazz@deploy1002: Started scap: Backport for [testwiki] Fix assignment of 'checkuser-temporary-account' right (T367170)
  • 10:41 Amir1: running extensions/Echo/maintenance/removeOrphanedEvents.php --force on all wikis (T308084)
  • 10:37 dreamyjazz@deploy1002: Finished scap: Backport for [testwiki] Assign 'checkuser-temporary-account' to the sysop group (T367170) (duration: 13m 49s)
  • 10:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 10:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1045.eqiad.wmnet with OS bookworm
  • 10:31 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=mw2321.codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 10:31 claime: repooling and uncordoning mw2321.codfw.wmnet - T367862
  • 10:31 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 10:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2321.codfw.wmnet
  • 10:30 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw2321.codfw.wmnet
  • 10:28 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
  • 10:25 dreamyjazz@deploy1002: dreamyjazz: Backport for [testwiki] Assign 'checkuser-temporary-account' to the sysop group (T367170) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:24 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 10:23 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 10:23 dreamyjazz@deploy1002: Started scap: Backport for [testwiki] Assign 'checkuser-temporary-account' to the sysop group (T367170)
  • 10:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2321.codfw.wmnet with reason: Test scap with host unavailable
  • 10:20 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:20 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2321.codfw.wmnet with reason: Test scap with host unavailable
  • 10:19 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:18 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:18 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:17 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 10:16 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:16 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:15 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2321.codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 10:14 claime: Draining and depooling mw2321.codfw.wmnet to test 1047031 - T367862
  • 10:14 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:07 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
  • 10:04 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
  • 10:04 claime: Running puppet on A:wikikube-worker
  • 10:02 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
  • 10:01 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
  • 10:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 10:00 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 09:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 09:51 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 09:50 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 09:49 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 09:47 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:45 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 09:45 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1045.eqiad.wmnet with OS bookworm
  • 09:45 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:16 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php sysop_plwiki AramilFeraxa REDACTED --bureaucrat --sysop # T361041
  • 08:57 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 08:51 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 08:51 cmooney@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.6.6 - cmooney@cumin1002
  • 08:50 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 08:49 cmooney@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.6.6 - cmooney@cumin1002
  • 08:36 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 08:33 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 08:23 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 08:16 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.10 refs T361404
  • 08:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1001.wikimedia.org
  • 08:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1001.wikimedia.org
  • 08:08 moritzm: reboot of irc1001 to nudge clients to re-connect to the new bullseye host T331702
  • 08:06 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 08:03 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 07:53 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:53 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 07:53 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 07:52 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 07:48 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 07:04 moritzm: failover irc.wikimedia.org to the new Bullseye servers T331702
  • 06:04 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on an-worker1085.eqiad.wmnet with reason: T367825 hw maint 2024-06-20
  • 06:03 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 18:00:00 on an-worker1085.eqiad.wmnet with reason: T367825 hw maint 2024-06-20
  • 05:27 marostegui: Deploy schema change on old s7 eqiad master dbmaint (db1236) T364299
  • 05:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Long schema change
  • 05:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Long schema change
  • 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1236 T367857', diff saved to https://phabricator.wikimedia.org/P65220 and previous config saved to /var/cache/conftool/dbconfig/20240620-052359-root.json
  • 05:22 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1181 to s7 primary and set section read-write T367857', diff saved to https://phabricator.wikimedia.org/P65219 and previous config saved to /var/cache/conftool/dbconfig/20240620-052253-marostegui.json
  • 05:22 marostegui@cumin1002: dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - T367857', diff saved to https://phabricator.wikimedia.org/P65218 and previous config saved to /var/cache/conftool/dbconfig/20240620-052230-marostegui.json
  • 05:22 marostegui: Starting s7 eqiad failover from db1236 to db1181 - T367857
  • 05:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Long schema change
  • 05:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Long schema change
  • 05:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T367857
  • 05:04 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1181 with weight 0 T367857', diff saved to https://phabricator.wikimedia.org/P65217 and previous config saved to /var/cache/conftool/dbconfig/20240620-050428-marostegui.json
  • 05:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 28 hosts with reason: Primary switchover s7 T367857
  • 02:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T367856)', diff saved to https://phabricator.wikimedia.org/P65216 and previous config saved to /var/cache/conftool/dbconfig/20240620-022416-marostegui.json
  • 02:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 02:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 02:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 02:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 02:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65215 and previous config saved to /var/cache/conftool/dbconfig/20240620-022349-marostegui.json
  • 02:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65214 and previous config saved to /var/cache/conftool/dbconfig/20240620-020842-marostegui.json
  • 01:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65213 and previous config saved to /var/cache/conftool/dbconfig/20240620-015335-marostegui.json
  • 01:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65212 and previous config saved to /var/cache/conftool/dbconfig/20240620-013827-marostegui.json

2024-06-19

  • 23:05 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php arbcom_itwiki Superpes15 REDACTED --bureaucrat --sysop
  • 23:05 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php u4cwiki Superpes15 REDACTED --bureaucrat --sysop
  • 21:08 oblivian@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wikikube-ctrl[2001-2002].codfw.wmnet with reason: Reimage --kamila
  • 21:08 oblivian@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wikikube-ctrl[2001-2002].codfw.wmnet with reason: Reimage --kamila
  • 20:33 zabe@deploy1002: Finished scap: Backport for [tlywiki] Change the logo and wordmark/tagline (T366431) (duration: 14m 41s)
  • 20:24 zabe@deploy1002: superpes, zabe: Continuing with sync
  • 20:23 zabe@deploy1002: superpes, zabe: Backport for [tlywiki] Change the logo and wordmark/tagline (T366431) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:19 zabe@deploy1002: Started scap: Backport for [tlywiki] Change the logo and wordmark/tagline (T366431)
  • 19:08 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
  • 19:05 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
  • 18:54 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage
  • 18:51 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage
  • 18:49 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 18:48 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 18:40 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 18:35 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 18:34 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 18:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65211 and previous config saved to /var/cache/conftool/dbconfig/20240619-182922-marostegui.json
  • 18:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 18:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 18:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65210 and previous config saved to /var/cache/conftool/dbconfig/20240619-182900-marostegui.json
  • 18:21 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 18:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65209 and previous config saved to /var/cache/conftool/dbconfig/20240619-181353-marostegui.json
  • 17:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65208 and previous config saved to /var/cache/conftool/dbconfig/20240619-175846-marostegui.json
  • 17:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65207 and previous config saved to /var/cache/conftool/dbconfig/20240619-174338-marostegui.json
  • 17:21 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:21 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
  • 17:20 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
  • 17:13 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 17:05 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 17:01 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2002
  • 17:01 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2002
  • 17:01 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2001
  • 17:01 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2001
  • 17:00 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:00 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
  • 16:59 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
  • 16:42 sukhe: sudo cumin 'A:durum' 'run-puppet-agent' to switch timesyncd NTP pools to ntp-[abc].anycast.wmnet: T366360
  • 16:27 claime: pooling and uncordoning mw2321.codfw.wmnet - T367702
  • 16:27 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=mw2321.codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 16:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: service=(ntp-a|ntp-b|ntp-c)
  • 16:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
  • 16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw2321.codfw.wmnet back to active - cgoubert@cumin1002"
  • 16:12 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw2321.codfw.wmnet back to active - cgoubert@cumin1002"
  • 16:09 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 16:03 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
  • 15:55 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
  • 15:55 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
  • 15:51 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:50 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:46 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:46 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 15:45 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 15:44 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:43 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 15:32 taavi@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1042
  • 15:32 taavi@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1042
  • 15:24 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:24 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:23 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:23 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:23 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:22 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:16 sukhe: sudo cumin -b1 -s120 'A:dnsbox' 'run-puppet-agent --enable "merging CR 1046685"': T366360
  • 15:08 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
  • 15:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
  • 15:07 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 15:06 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns2006.wikimedia.org,service=ntp-c
  • 15:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:01 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2282.codfw.wmnet with reason: Host move
  • 15:01 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:01 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2282.codfw.wmnet with reason: Host move
  • 15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2282.codfw.wmnet
  • 15:00 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw2282.codfw.wmnet
  • 14:59 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.remove-downtime (exit_code=97) for wikikube-worker2003.codfw.wmnet
  • 14:59 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-worker2003.codfw.wmnet
  • 14:42 marostegui: Deploy schema change on s2 eqiad master dbmaint T364069
  • 14:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Long schema change
  • 14:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Long schema change
  • 14:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Long schema change
  • 14:38 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 14:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Long schema change
  • 14:38 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1002"
  • 14:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
  • 14:36 taavi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1002"
  • 14:35 moritzm: installing nano security updates
  • 14:34 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
  • 14:24 moritzm: installing libvpx security updates
  • 14:23 moritzm: installing pymysql security updates
  • 14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 14:19 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 14:17 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 14:14 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 14:12 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 14:11 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
  • 14:11 taavi@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
  • 14:10 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2003.codfw.wmnet with OS bookworm
  • 14:10 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - klausman@cumin2002"
  • 14:09 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
  • 14:09 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - klausman@cumin2002"
  • 14:09 taavi@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
  • 14:08 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
  • 14:08 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
  • 14:07 taavi@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
  • 14:07 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
  • 14:01 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 13:57 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging2003.codfw.wmnet with reason: host reimage
  • 13:54 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging2003.codfw.wmnet with reason: host reimage
  • 13:53 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 13:53 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 13:51 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
  • 13:50 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
  • 13:49 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
  • 13:48 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
  • 13:43 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 13:42 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
  • 13:41 taavi@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 13:41 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
  • 13:35 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
  • 13:35 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 13:35 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 13:35 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5017.*} and A:cp
  • 13:32 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5017.*} and A:cp
  • 13:32 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 13:32 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=ntp-a
  • 13:31 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
  • 13:31 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 13:28 sukhe: enable puppet on dns6001 to test CR 1046685
  • 13:23 sukhe: sudo cumin 'A:dnsbox' 'disable-puppet "merging CR 1046685"': T366360
  • 13:22 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:21 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on mw2282.codfw.wmnet with reason: host move
  • 13:21 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on mw2282.codfw.wmnet with reason: host move
  • 13:20 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 13:17 kamila_: drained mw2282.codfw.wmnet for T361856
  • 13:16 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 13:06 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 13:04 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: service=ntp-[abc]
  • 13:04 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 12:52 taavi@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 12:51 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2011.codfw.wmnet|wikikube-worker2012.codfw.wmnet|wikikube-worker2013.codfw.wmnet|wikikube-worker2014.codfw.wmnet|wikikube-worker2017.codfw.wmnet|wikikube-worker2018.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 12:40 claime: homer 'cr*codfw*' commit 'T351074'
  • 12:38 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:38 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
  • 12:38 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:37 taavi@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 12:37 taavi@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 12:36 taavi@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 12:36 taavi@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 12:36 taavi@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 12:35 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 12:34 klausman: Puppet management of install2004 restored, lpxelinux.0 also restored.
  • 12:24 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
  • 12:22 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:21 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:20 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:19 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:17 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
  • 12:14 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 12:14 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 12:13 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 12:12 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 12:11 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 12:11 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 12:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 12:08 klausman: Will test-replace the PXE chainloader (/srv/tftpboot/lpxelinux.0) on install2003 with a newer version to see if it fixes the ldlinux.c32 error. Puppet will be disabled on that machine for the duration.
  • 12:07 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 12:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 12:06 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
  • 12:03 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
  • 12:03 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
  • 12:02 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
  • 12:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65204 and previous config saved to /var/cache/conftool/dbconfig/20240619-120142-root.json
  • 12:01 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on idp-test1002.wikimedia.org with reason: CAS 7 upgrade
  • 12:01 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on idp-test1002.wikimedia.org with reason: CAS 7 upgrade
  • 12:00 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 12:00 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5017.*} and A:cp
  • 11:57 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 11:57 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5017.*} and A:cp
  • 11:50 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 11:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65203 and previous config saved to /var/cache/conftool/dbconfig/20240619-114636-root.json
  • 11:36 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-eqsin
  • 11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65201 and previous config saved to /var/cache/conftool/dbconfig/20240619-113131-root.json
  • 11:26 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 11:18 ayounsi@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host netbox-dev2003.codfw.wmnet
  • 11:18 ayounsi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host netbox-dev2003.codfw.wmnet with OS bookworm
  • 11:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2012.codfw.wmnet with OS bullseye
  • 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65200 and previous config saved to /var/cache/conftool/dbconfig/20240619-111625-root.json
  • 11:15 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 11:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2013.codfw.wmnet with OS bullseye
  • 11:14 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 11:14 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:13 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:12 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2014.codfw.wmnet with OS bullseye
  • 11:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:08 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2017.codfw.wmnet with OS bullseye
  • 11:07 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:07 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:06 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:04 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 11:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2018.codfw.wmnet with OS bullseye
  • 11:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2011.codfw.wmnet with OS bullseye
  • 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65199 and previous config saved to /var/cache/conftool/dbconfig/20240619-110120-root.json
  • 10:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2012.codfw.wmnet with reason: host reimage
  • 10:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2013.codfw.wmnet with reason: host reimage
  • 10:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2014.codfw.wmnet with reason: host reimage
  • 10:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2017.codfw.wmnet with reason: host reimage
  • 10:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65198 and previous config saved to /var/cache/conftool/dbconfig/20240619-104614-root.json
  • 10:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2018.codfw.wmnet with reason: host reimage
  • 10:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2011.codfw.wmnet with reason: host reimage
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2017.codfw.wmnet with reason: host reimage
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2018.codfw.wmnet with reason: host reimage
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2013.codfw.wmnet with reason: host reimage
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2014.codfw.wmnet with reason: host reimage
  • 10:40 jmm@deploy1002: Finished scap: (no justification provided) (duration: 04m 03s)
  • 10:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2012.codfw.wmnet with reason: host reimage
  • 10:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2011.codfw.wmnet with reason: host reimage
  • 10:36 jmm@deploy1002: Started scap: (no justification provided)
  • 10:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65197 and previous config saved to /var/cache/conftool/dbconfig/20240619-103109-root.json
  • 10:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2018.codfw.wmnet with OS bullseye
  • 10:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2017.codfw.wmnet with OS bullseye
  • 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65196 and previous config saved to /var/cache/conftool/dbconfig/20240619-102504-marostegui.json
  • 10:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 10:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2014.codfw.wmnet with OS bullseye
  • 10:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 10:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2013.codfw.wmnet with OS bullseye
  • 10:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2012.codfw.wmnet with OS bullseye
  • 10:23 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2011.codfw.wmnet with OS bullseye
  • 10:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2409 to wikikube-worker2018
  • 10:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2018
  • 10:22 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2018
  • 10:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2409 to wikikube-worker2018 - cgoubert@cumin1002"
  • 10:21 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2409 to wikikube-worker2018 - cgoubert@cumin1002"
  • 10:18 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:18 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2409 to wikikube-worker2018
  • 10:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2408 to wikikube-worker2017
  • 10:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2017
  • 10:17 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2017
  • 10:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2408 to wikikube-worker2017 - cgoubert@cumin1002"
  • 10:16 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2408 to wikikube-worker2017 - cgoubert@cumin1002"
  • 10:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P65195 and previous config saved to /var/cache/conftool/dbconfig/20240619-101625-marostegui.json
  • 10:14 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:14 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2408 to wikikube-worker2017
  • 10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2405 to wikikube-worker2014
  • 10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2014
  • 10:12 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2014
  • 10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2405 to wikikube-worker2014 - cgoubert@cumin1002"
  • 10:09 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2405 to wikikube-worker2014 - cgoubert@cumin1002"
  • 10:06 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:06 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2405 to wikikube-worker2014
  • 10:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2404 to wikikube-worker2013
  • 10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2013
  • 10:05 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 10:05 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2013
  • 10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2404 to wikikube-worker2013 - cgoubert@cumin1002"
  • 10:03 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2404 to wikikube-worker2013 - cgoubert@cumin1002"
  • 10:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65194 and previous config saved to /var/cache/conftool/dbconfig/20240619-100118-marostegui.json
  • 10:00 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 10:00 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 09:59 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2404 to wikikube-worker2013
  • 09:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2403 to wikikube-worker2012
  • 09:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2012
  • 09:55 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 09:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 09:53 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2012
  • 09:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2403 to wikikube-worker2012 - cgoubert@cumin1002"
  • 09:51 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2403 to wikikube-worker2012 - cgoubert@cumin1002"
  • 09:51 ayounsi@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "netbox-dev2003 - ayounsi@cumin1002"
  • 09:47 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "netbox-dev2003 - ayounsi@cumin1002"
  • 09:47 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 09:47 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2403 to wikikube-worker2012
  • 09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2400 to wikikube-worker2011
  • 09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2011
  • 09:46 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2011
  • 09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2400 to wikikube-worker2011 - cgoubert@cumin1002"
  • 09:44 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 09:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2400 to wikikube-worker2011 - cgoubert@cumin1002"
  • 09:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 09:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2400 to wikikube-worker2011
  • 09:40 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-eqsin
  • 09:34 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:32 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 09:22 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 09:21 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 09:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netbox-dev2003.codfw.wmnet with reason: host reimage
  • 09:15 claime: Depooling mw2400.codfw.wmnet,mw2403.codfw.wmnet,mw2404.codfw.wmnet,mw2405.codfw.wmnet,mw2408.codfw.wmnet,mw2409.codfw.wmnet for reimage - T351074
  • 09:13 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on netbox-dev2003.codfw.wmnet with reason: host reimage
  • 09:11 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 09:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2001.codfw.wmnet with OS bookworm
  • 09:01 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5025.*} and A:cp
  • 08:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe1001.eqiad.wmnet with OS bookworm
  • 08:58 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5025.*} and A:cp
  • 08:57 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5017.*} and A:cp
  • 08:54 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5017.*} and A:cp
  • 08:52 fabfur: upgrading eqsin cp hosts to haproxy 2.8.10 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047436) (T367756)
  • 08:51 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 08:48 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 08:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
  • 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15830
  • 08:38 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
  • 08:35 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 15830
  • 08:31 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bookworm
  • 08:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2001.codfw.wmnet with OS bookworm
  • 08:30 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 08:24 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 08:23 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1001.eqiad.wmnet with OS bookworm
  • 08:23 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe1001.eqiad.wmnet with OS bookworm
  • 08:18 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.10 refs T361404
  • 08:12 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bookworm
  • 08:11 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1001.eqiad.wmnet with OS bookworm
  • 08:09 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 08:03 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 08:01 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host netbox-dev2003.codfw.wmnet with OS bookworm
  • 08:00 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
  • 07:59 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
  • 07:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox-dev2003.codfw.wmnet on all recursors
  • 07:59 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache netbox-dev2003.codfw.wmnet on all recursors
  • 07:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
  • 07:57 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
  • 07:54 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 07:54 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host netbox-dev2003.codfw.wmnet
  • 07:48 kartik@deploy1002: Finished scap: Backport for igwiki: Enable MinT for Wikipedia readers (T363464) (duration: 18m 55s)
  • 07:38 kartik@deploy1002: kartik: Continuing with sync
  • 07:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 07:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 07:33 kartik@deploy1002: kartik: Backport for igwiki: Enable MinT for Wikipedia readers (T363464) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:29 kartik@deploy1002: Started scap: Backport for igwiki: Enable MinT for Wikipedia readers (T363464)
  • 07:22 kartik@deploy1002: Finished scap: Backport for testwiki: Enable MinT for Wikipedia readers MVP on a Igbo Wikipedia (T367852) (duration: 20m 12s)
  • 07:20 marostegui: Deploy schema change on old s7 eqiad master db1160 dbmaint T364069
  • 07:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65192 and previous config saved to /var/cache/conftool/dbconfig/20240619-071516-root.json
  • 07:12 kartik@deploy1002: kartik: Continuing with sync
  • 07:07 kartik@deploy1002: kartik: Backport for testwiki: Enable MinT for Wikipedia readers MVP on a Igbo Wikipedia (T367852) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:02 kartik@deploy1002: Started scap: Backport for testwiki: Enable MinT for Wikipedia readers MVP on a Igbo Wikipedia (T367852)
  • 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65191 and previous config saved to /var/cache/conftool/dbconfig/20240619-070010-root.json
  • 06:52 jynus: stop db1240:s1, wipe and reimport db1240:s3 T367162
  • 06:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65190 and previous config saved to /var/cache/conftool/dbconfig/20240619-064505-root.json
  • 06:40 XioNoX: merge Puppet "Prepare for netbox-dev" CR1047081
  • 06:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65189 and previous config saved to /var/cache/conftool/dbconfig/20240619-063337-root.json
  • 06:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65188 and previous config saved to /var/cache/conftool/dbconfig/20240619-062959-root.json
  • 06:21 _joe_: upgrading conftool everywhere T367919
  • 06:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65187 and previous config saved to /var/cache/conftool/dbconfig/20240619-061831-root.json
  • 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P65186 and previous config saved to /var/cache/conftool/dbconfig/20240619-061721-root.json
  • 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65185 and previous config saved to /var/cache/conftool/dbconfig/20240619-061454-root.json
  • 06:08 _joe_: uploaded newer python-conftool packages T367919
  • 06:05 _joe_: deleting manually thirdparty/conda repositories from reprepro T364550
  • 06:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65184 and previous config saved to /var/cache/conftool/dbconfig/20240619-060326-root.json
  • 06:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P65183 and previous config saved to /var/cache/conftool/dbconfig/20240619-060216-root.json
  • 05:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65182 and previous config saved to /var/cache/conftool/dbconfig/20240619-055948-root.json
  • 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65181 and previous config saved to /var/cache/conftool/dbconfig/20240619-054820-root.json
  • 05:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P65180 and previous config saved to /var/cache/conftool/dbconfig/20240619-054710-root.json
  • 05:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65179 and previous config saved to /var/cache/conftool/dbconfig/20240619-054443-root.json
  • 05:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65178 and previous config saved to /var/cache/conftool/dbconfig/20240619-054259-root.json
  • 05:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65177 and previous config saved to /var/cache/conftool/dbconfig/20240619-054214-marostegui.json
  • 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65176 and previous config saved to /var/cache/conftool/dbconfig/20240619-053315-root.json
  • 05:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P65175 and previous config saved to /var/cache/conftool/dbconfig/20240619-053205-root.json
  • 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65174 and previous config saved to /var/cache/conftool/dbconfig/20240619-052754-root.json
  • 05:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 05:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65173 and previous config saved to /var/cache/conftool/dbconfig/20240619-051809-root.json
  • 05:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P65172 and previous config saved to /var/cache/conftool/dbconfig/20240619-051659-root.json
  • 05:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65171 and previous config saved to /var/cache/conftool/dbconfig/20240619-051248-root.json
  • 05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P65170 and previous config saved to /var/cache/conftool/dbconfig/20240619-051233-root.json
  • 05:10 marostegui@cumin1002: dbctl commit (dc=all): 'repool db1169', diff saved to https://phabricator.wikimedia.org/P65169 and previous config saved to /var/cache/conftool/dbconfig/20240619-051014-marostegui.json
  • 05:09 marostegui@cumin1002: dbctl commit (dc=all): 'test depool db1169', diff saved to https://phabricator.wikimedia.org/P65168 and previous config saved to /var/cache/conftool/dbconfig/20240619-050951-marostegui.json

2024-06-18

  • 23:22 jforrester@deploy1002: Finished scap: Backport for Use isEnumType in selector and isCustomEnum for creating literals (T367159), findAddedContentNeedingReference was removed accidentally (T367920) (duration: 17m 16s)
  • 23:12 jforrester@deploy1002: jforrester, kemayo: Continuing with sync
  • 23:10 jforrester@deploy1002: jforrester, kemayo: Backport for Use isEnumType in selector and isCustomEnum for creating literals (T367159), findAddedContentNeedingReference was removed accidentally (T367920) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:05 jforrester@deploy1002: Started scap: Backport for Use isEnumType in selector and isCustomEnum for creating literals (T367159), findAddedContentNeedingReference was removed accidentally (T367920)
  • 22:49 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:49 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:31 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:20 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:19 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:07 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:56 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:54 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:26 jdrewniak@deploy1002: Finished scap: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844) (duration: 16m 33s)
  • 21:16 jdrewniak@deploy1002: jdlrobson, jdrewniak: Continuing with sync
  • 21:14 jdrewniak@deploy1002: jdlrobson, jdrewniak: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:09 jdrewniak@deploy1002: Started scap: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844)
  • 21:07 jdrewniak@deploy1002: Sync cancelled.
  • 21:07 jdrewniak@deploy1002: jdrewniak, jdlrobson: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:03 jdrewniak@deploy1002: Started scap: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844)
  • 20:59 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 20:50 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 20:50 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 20:49 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 20:49 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 20:47 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 20:47 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 20:33 urbanecm@deploy1002: Finished scap: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki (duration: 18m 59s)
  • 20:24 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 20:22 urbanecm@deploy1002: kemayo, urbanecm, superzerocool: Continuing with sync
  • 20:18 urbanecm@deploy1002: kemayo, urbanecm, superzerocool: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:14 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 20:14 urbanecm@deploy1002: Started scap: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki
  • 20:10 urbanecm@deploy1002: Sync cancelled.
  • 20:10 urbanecm@deploy1002: urbanecm, superzerocool, kemayo: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:09 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 20:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 20:06 urbanecm@deploy1002: Started scap: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki
  • 19:59 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:42 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:42 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2003.codfw.wmnet with OS bookworm
  • 19:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:17 mutante: lists1001 - systemctl reset-failed - clean up systemd state due to units not found anymore after migration - disable puppet and then deploy gerrit:1047160 on lists to fix invalid unit name - T331706
  • 18:49 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 18:44 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in esams for T365123
  • 18:39 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in eqsin for T365123
  • 18:33 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in drmrs for T365123
  • 18:29 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 18:27 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in magru for T365123
  • 18:19 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
  • 18:17 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in ulsfo for T365123
  • 18:16 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2003.codfw.wmnet with OS bookworm
  • 18:16 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
  • 17:37 swfrench-wmf: updated conftool to 3.0.0 on bullseye hosts in eqiad for T365123
  • 17:35 swfrench-wmf: updated conftool to 3.0.0 on bookworm hosts in eqiad for T365123
  • 17:34 swfrench-wmf: updated conftool to 3.0.0 on buster hosts in eqiad for T365123
  • 17:21 cdanis: resetting Wiki response time metric on wikimedia.statuspage.io following complete switch to k8s - T362323 T367894
  • 17:16 swfrench-wmf: updated conftool to 3.0.0 on remaining bullseye hosts in codfw for T365123
  • 17:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
  • 17:14 swfrench-wmf: updated conftool to 3.0.0 on remaining bookworm hosts in codfw for T365123
  • 17:12 swfrench-wmf: updated conftool to 3.0.0 on remaining buster hosts in codfw for T365123
  • 16:42 swfrench-wmf: conftool on puppetmaster2001 updated to 3.0.0 for T365123
  • 16:39 swfrench-wmf: validated dbctl 3.0.0 on cumin2002 (noop edit to note: on parsercache spare pc2014) for T365123
  • 16:39 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 16:34 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 16:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-worker1093.eqiad.wmnet with reason: T367825 hw maint
  • 16:31 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-worker1093.eqiad.wmnet with reason: T367825 hw maint
  • 16:29 swfrench-wmf: conftool on cumin2002 updated to 3.0.0 for T365123
  • 16:23 claime: resetting Wiki response time metric on wikimedia.statuspage.io following complete switch to k8s - T362323
  • 16:23 swfrench-wmf: depooled / pooled mw2441.codfw.wmnet to smoke-test python3-conftool for T365123
  • 16:22 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 16:20 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 100%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65167 and previous config saved to /var/cache/conftool/dbconfig/20240618-162053-arnaudb.json
  • 16:19 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 16:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 16:05 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 75%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65166 and previous config saved to /var/cache/conftool/dbconfig/20240618-160548-arnaudb.json
  • 16:02 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 15:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2003.codfw.wmnet with OS bookworm
  • 15:53 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s2
  • 15:53 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s7
  • 15:52 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 15:51 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 15:51 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:50 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 50%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65165 and previous config saved to /var/cache/conftool/dbconfig/20240618-155042-arnaudb.json
  • 15:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:50 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 15:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T364069)', diff saved to https://phabricator.wikimedia.org/P65164 and previous config saved to /var/cache/conftool/dbconfig/20240618-155000-marostegui.json
  • 15:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 15:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 15:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65163 and previous config saved to /var/cache/conftool/dbconfig/20240618-154938-marostegui.json
  • 15:49 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-staging2003
  • 15:49 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ml-staging2003
  • 15:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:47 swfrench-wmf: included conftool 3.0.0 into buster/bullseye/bookworm-wikimedia on apt.w.o for T365123
  • 15:47 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:46 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:45 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:44 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5032.*} and A:cp
  • 15:43 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:42 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:42 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5032.*} and A:cp
  • 15:41 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5030.*} and A:cp
  • 15:39 fabfur: upgrade haproxy to v2.8.10 on cp5030,cp5032 (T367756)
  • 15:39 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5030.*} and A:cp
  • 15:38 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp3066.*} and A:cp
  • 15:36 fabfur: upgrade haproxy to v2.8.10 on cp3066 (T367756)
  • 15:35 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp3066.*} and A:cp
  • 15:35 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 25%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65162 and previous config saved to /var/cache/conftool/dbconfig/20240618-153537-arnaudb.json
  • 15:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65161 and previous config saved to /var/cache/conftool/dbconfig/20240618-153430-marostegui.json
  • 15:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bookworm
  • 15:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 15:30 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 15:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 15:23 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:20 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 10%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65159 and previous config saved to /var/cache/conftool/dbconfig/20240618-152031-arnaudb.json
  • 15:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65158 and previous config saved to /var/cache/conftool/dbconfig/20240618-151923-marostegui.json
  • 15:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
  • 15:07 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
  • 15:07 brennen@deploy1002: Finished deploy [phabricator/deployment@ef680d8]: revert phab1004 after breakage for T367775 (duration: 00m 15s)
  • 15:07 brennen@deploy1002: Started deploy [phabricator/deployment@ef680d8]: revert phab1004 after breakage for T367775
  • 15:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe1002.eqiad.wmnet with OS bookworm
  • 15:06 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:06 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:06 brennen@deploy1002: Finished deploy [phabricator/deployment@ebe3a94]: deploy phab1004 for T367775 (duration: 00m 47s)
  • 15:05 brennen@deploy1002: Started deploy [phabricator/deployment@ebe3a94]: deploy phab1004 for T367775
  • 15:05 brennen@deploy1002: Finished deploy [phabricator/deployment@ebe3a94]: deploy phab2002 for T367775 (duration: 00m 36s)
  • 15:04 brennen@deploy1002: Started deploy [phabricator/deployment@ebe3a94]: deploy phab2002 for T367775
  • 15:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65157 and previous config saved to /var/cache/conftool/dbconfig/20240618-150416-marostegui.json
  • 15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:00 mforns@deploy1002: Finished deploy [airflow-dags/analytics@4f7d29a]: (no justification provided) (duration: 00m 28s)
  • 15:00 topranks: rebooting lsw1-f7-eqiad to upgrade JunOS on switch T365984
  • 15:00 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host htmldumper1001.eqiad.wmnet
  • 15:00 mforns@deploy1002: Started deploy [airflow-dags/analytics@4f7d29a]: (no justification provided)
  • 14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:35:00 on an-worker[1172-1174].eqiad.wmnet,es1040.eqiad.wmnet,ms-be1081.eqiad.wmnet with reason: JunOS upgrade lsw1-f7-eqiad
  • 14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:35:00 on an-worker[1172-1174].eqiad.wmnet,es1040.eqiad.wmnet,ms-be1081.eqiad.wmnet with reason: JunOS upgrade lsw1-f7-eqiad
  • 14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-f7-eqiad,lsw1-f7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f7-eqiad
  • 14:56 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-f7-eqiad,lsw1-f7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f7-eqiad
  • 14:53 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host htmldumper1001.eqiad.wmnet
  • 14:49 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bookworm
  • 14:47 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:40:00 on lsw1-f7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f7-eqiad
  • 14:47 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:40:00 on lsw1-f7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f7-eqiad
  • 14:44 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1001.eqiad.wmnet with OS bookworm
  • 14:44 jynus: reenable puppet on backup2002
  • 14:40 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ml-serve2001.codfw.wmnet with reason: Hardware maintenance for memory errors
  • 14:40 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ml-serve2001.codfw.wmnet with reason: Hardware maintenance for memory errors
  • 14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 depool - T365984', diff saved to https://phabricator.wikimedia.org/P65156 and previous config saved to /var/cache/conftool/dbconfig/20240618-143951-arnaudb.json
  • 14:39 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4046.ulsfo.wmnet
  • 14:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1040.eqiad.wmnet with reason: T365984
  • 14:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1040.eqiad.wmnet with reason: T365984
  • 14:36 sukhe: enabling puppet and running puppet agent on cp4037
  • 14:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
  • 14:24 claime: trafficserver: move 100% of traffic to mw-on-k8s - T362323
  • 14:23 btullis@cumin1002: START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes
  • 14:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
  • 14:21 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:21 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:21 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:20 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 14:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 14:20 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 14:20 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 14:19 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
  • 14:17 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 14:17 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 14:17 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 14:09 swfrench-wmf: included conftool 3.0.0 into buster-wikimedia on apt.w.o for T365123
  • 14:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: host reimage
  • 14:03 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: host reimage
  • 14:02 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1001.eqiad.wmnet with OS bookworm
  • 13:57 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 13:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 13:57 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 13:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 13:54 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 13:54 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 13:51 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 13:51 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 13:50 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:49 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1002.eqiad.wmnet with OS bookworm
  • 13:49 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe1002.eqiad.wmnet with OS bookworm
  • 13:49 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 13:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 13:47 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 13:45 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 13:40 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 13:39 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 13:37 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1002.eqiad.wmnet with OS bookworm
  • 13:35 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1002.eqiad.wmnet
  • 13:34 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes azwiktionary --fix # T367264; 7 pages fixed, 10 links fixed
  • 13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Add VL namespace alias to Azerbaijani Wiktionary (T367264) (duration: 16m 07s)
  • 13:29 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-db1002.eqiad.wmnet
  • 13:28 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 13:28 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 13:23 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Continuing with sync
  • 13:22 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1002.eqiad.wmnet
  • 13:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Backport for Add VL namespace alias to Azerbaijani Wiktionary (T367264) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:19 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db1208.eqiad.wmnet
  • 13:19 btullis@cumin1002: START - Cookbook sre.hosts.remove-downtime for db1208.eqiad.wmnet
  • 13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Add VL namespace alias to Azerbaijani Wiktionary (T367264)
  • 13:16 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 13:16 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 13:16 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-mariadb1002.eqiad.wmnet
  • 13:10 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: sync
  • 13:10 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1004.eqiad.wmnet
  • 13:09 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: sync
  • 13:09 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: sync
  • 13:08 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: sync
  • 13:07 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: sync
  • 13:07 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: sync
  • 13:07 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: sync
  • 13:06 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: sync
  • 13:06 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: sync
  • 13:04 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: sync
  • 13:04 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-coord1004.eqiad.wmnet
  • 12:56 vgutierrez: rolling upgrade on A:cp-eqsin to fifo-log-demux 0.7.5 - T364383
  • 12:53 vgutierrez: disable puppet on A:cp-eqsin before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047070 - T364383
  • 12:52 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 12:51 marostegui: Deploy schema change on old s4 eqiad master db1160 dbmaint T364069
  • 12:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1160', diff saved to https://phabricator.wikimedia.org/P65155 and previous config saved to /var/cache/conftool/dbconfig/20240618-124945-root.json
  • 12:48 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:47 fabfur: upgrade haproxy to v2.8.10 on all ulsfo cp hosts (T367756)
  • 12:47 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:43 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 12:42 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 12:42 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 12:42 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 12:40 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 12:36 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 12:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2003.codfw.wmnet
  • 12:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2003.codfw.wmnet
  • 12:22 moritzm: rebalance ganeti eqiad/D following reboots
  • 12:15 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
  • 12:15 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
  • 12:06 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:06 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add IPv6 records for mw, parse and wikikube-worker hosts - cmooney@cumin1002"
  • 12:05 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:05 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add IPv6 records for mw, parse and wikikube-worker hosts - cmooney@cumin1002"
  • 12:04 topranks: adding Netbox-generated IPv6 DNS records for wikikube-worker, mw and parse hosts
  • 12:04 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:00 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 11:59 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:59 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 11:59 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 11:58 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:58 effie: Slowly pointing mediawiki in eqiad to mw-mcrouter daemonset - T346690
  • 11:54 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 11:53 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 11:53 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:50 eoghan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lists.wikimedia.org on all recursors
  • 11:50 eoghan@cumin1002: START - Cookbook sre.dns.wipe-cache lists.wikimedia.org on all recursors
  • 11:48 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1208.eqiad.wmnet with OS bookworm
  • 11:42 marostegui: Delete ipblocks table on clouddb2002-dev (labtestwiki) T367632
  • 11:40 marostegui: Rename ipblocks table on db1169 (enwiki) T367632
  • 11:29 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 11:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 11:28 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
  • 11:26 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1208.eqiad.wmnet with reason: host reimage
  • 11:24 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1208.eqiad.wmnet with reason: host reimage
  • 11:22 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
  • 11:18 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
  • 11:14 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 11:14 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 11:13 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
  • 11:13 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 11:13 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 11:12 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-ui1001.eqiad.wmnet
  • 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65152 and previous config saved to /var/cache/conftool/dbconfig/20240618-111001-marostegui.json
  • 11:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 11:09 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host db1208.eqiad.wmnet with OS bookworm
  • 11:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 11:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65151 and previous config saved to /var/cache/conftool/dbconfig/20240618-110939-marostegui.json
  • 11:08 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-ui1001.eqiad.wmnet
  • 11:08 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 11:08 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 11:07 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 11:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1208.eqiad.wmnet with reason: Upgrading to bookworm
  • 11:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
  • 11:05 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1208.eqiad.wmnet with reason: Upgrading to bookworm
  • 11:01 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
  • 10:58 fabfur: cp3066 repooled and puppet enabled (T367756)
  • 10:58 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp3066.esams.wmnet
  • 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65150 and previous config saved to /var/cache/conftool/dbconfig/20240618-105432-marostegui.json
  • 10:48 marostegui: dbmaint codfw s2 deploy schema change T364069
  • 10:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65149 and previous config saved to /var/cache/conftool/dbconfig/20240618-103925-marostegui.json
  • 10:33 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 10:33 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 10:33 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 10:33 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 10:33 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:33 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 10:33 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 10:32 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 10:32 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 10:32 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 10:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 10:32 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 10:32 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 10:32 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 10:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 10:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:31 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 10:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:30 moritzm: upload openjdk-21 21.0.3+9-2~deb12u2 for bookworm/wikimedia (secondary rebuild on build2001 following the initial bootstrap build) https://phabricator.wikimedia.org/T367487
  • 10:30 cgoubert@deploy1002: Finished scap: Deploy statsd exporter - T365265 (duration: 03m 39s)
  • 10:27 cgoubert@deploy1002: Started scap: Deploy statsd exporter - T365265
  • 10:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65148 and previous config saved to /var/cache/conftool/dbconfig/20240618-102418-marostegui.json
  • 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65147 and previous config saved to /var/cache/conftool/dbconfig/20240618-102130-root.json
  • 10:14 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
  • 10:14 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
  • 10:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65146 and previous config saved to /var/cache/conftool/dbconfig/20240618-100624-root.json
  • 10:05 fabfur: cp3066 currently depooled and puppet disabled for T367756
  • 10:04 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp3066.esams.wmnet
  • 09:53 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1019.eqiad.wmnet|wikikube-worker1020.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
  • 09:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65145 and previous config saved to /var/cache/conftool/dbconfig/20240618-095119-root.json
  • 09:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
  • 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org
  • 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org
  • 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65144 and previous config saved to /var/cache/conftool/dbconfig/20240618-093614-root.json
  • 09:27 moritzm: arm keyholder on acmechief2002
  • 09:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65143 and previous config saved to /var/cache/conftool/dbconfig/20240618-092108-root.json
  • 09:13 moritzm: rebooting ganeti2029
  • 09:10 marostegui: dbmaint eqiad s4 deploy schema change T367261
  • 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65142 and previous config saved to /var/cache/conftool/dbconfig/20240618-090603-root.json
  • 09:05 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 08:53 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.10 refs T361404
  • 08:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 depool to troubleshoot hardware issues', diff saved to https://phabricator.wikimedia.org/P65141 and previous config saved to /var/cache/conftool/dbconfig/20240618-085254-arnaudb.json
  • 08:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: hardware issues
  • 08:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: hardware issues
  • 08:51 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: repl issues
  • 08:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: repl issues
  • 08:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65140 and previous config saved to /var/cache/conftool/dbconfig/20240618-085057-root.json
  • 08:45 hashar@deploy1002: Finished deploy [integration/docroot@7a92240]: doc: Add mwseaql Rust crate (duration: 00m 07s)
  • 08:45 hashar@deploy1002: Started deploy [integration/docroot@7a92240]: doc: Add mwseaql Rust crate
  • 08:43 fabfur: cp4037 currently depooled and puppet disabled for T367756
  • 08:41 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 08:40 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
  • 08:34 marostegui: dbmaint eqiad s6 deploy schema change on eqiad master T364069
  • 08:29 XioNoX: deploy pfw policy update 1718644831 - T367796
  • 07:56 moritzm: uploaded python-irc 8.5.3+dfsg-4+wmf1 to apt.wikimedia.org T331702
  • 07:40 marostegui: dbmaint codfw s7 deploy schema change on codfw master T364069
  • 07:33 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 07:31 kart_: Updated cxserver to 2024-06-13-045621-production (T364122, T138401)
  • 07:30 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 07:29 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 07:28 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 07:28 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 07:26 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 07:26 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 07:20 kartik@deploy1002: Finished scap: Backport for Content Translation: Adjust the Machine translation limit for Telugu WP from 70% to 75% (T367838) (duration: 16m 36s)
  • 07:15 marostegui: dbmaint eqiad s5 deploy schema change on primary master T364069
  • 07:12 marostegui: dbmaint codfw s4 deploy schema change T367261
  • 07:12 marostegui: dbmaint codfw s4 deploy schema change
  • 07:11 kartik@deploy1002: kartik: Continuing with sync
  • 07:09 kartik@deploy1002: kartik: Backport for Content Translation: Adjust the Machine translation limit for Telugu WP from 70% to 75% (T367838) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:04 kartik@deploy1002: Started scap: Backport for Content Translation: Adjust the Machine translation limit for Telugu WP from 70% to 75% (T367838)
  • 06:52 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1240.eqiad.wmnet with reason: data reload
  • 06:52 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1240.eqiad.wmnet with reason: data reload
  • 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65139 and previous config saved to /var/cache/conftool/dbconfig/20240618-060100-marostegui.json
  • 06:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 06:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 06:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65138 and previous config saved to /var/cache/conftool/dbconfig/20240618-060038-marostegui.json
  • 05:55 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2102.codfw.wmnet
  • 05:55 jynus@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:55 jynus@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2102.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin2002"
  • 05:53 jynus@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2102.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin2002"
  • 05:50 jynus@cumin2002: START - Cookbook sre.dns.netbox
  • 05:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65137 and previous config saved to /var/cache/conftool/dbconfig/20240618-054531-marostegui.json
  • 05:44 jynus@cumin2002: START - Cookbook sre.hosts.decommission for hosts db2102.codfw.wmnet
  • 05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65136 and previous config saved to /var/cache/conftool/dbconfig/20240618-053024-marostegui.json
  • 05:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65135 and previous config saved to /var/cache/conftool/dbconfig/20240618-051517-marostegui.json
  • 05:00 marostegui: dbmaint codfw s5 deploy schema change on db2213 T364299
  • 04:57 marostegui: dbmaint eqiad s2 deploy schema change on db2207 T364299
  • 04:54 marostegui: dbmaint eqiad s4 deploy schema change on db1160 T364299
  • 04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Long schema change
  • 04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Long schema change
  • 04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1160 T367378', diff saved to https://phabricator.wikimedia.org/P65134 and previous config saved to /var/cache/conftool/dbconfig/20240618-044908-root.json
  • 04:48 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1238 to s4 primary and set section read-write T367378', diff saved to https://phabricator.wikimedia.org/P65133 and previous config saved to /var/cache/conftool/dbconfig/20240618-044806-marostegui.json
  • 04:47 marostegui@cumin1002: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T367378', diff saved to https://phabricator.wikimedia.org/P65132 and previous config saved to /var/cache/conftool/dbconfig/20240618-044747-marostegui.json
  • 04:47 marostegui: Starting s4 eqiad failover from db1160 to db1238 - T367378
  • 04:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s4 T367378
  • 04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1238 with weight 0 T367378', diff saved to https://phabricator.wikimedia.org/P65131 and previous config saved to /var/cache/conftool/dbconfig/20240618-042054-marostegui.json
  • 04:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s4 T367378
  • 04:02 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.7 (duration: 02m 50s)
  • 04:01 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.10 refs T361404 (duration: 58m 57s)
  • 03:03 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.10 refs T361404
  • 01:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65130 and previous config saved to /var/cache/conftool/dbconfig/20240618-013639-marostegui.json
  • 01:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 01:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 01:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65129 and previous config saved to /var/cache/conftool/dbconfig/20240618-013616-marostegui.json
  • 01:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P65128 and previous config saved to /var/cache/conftool/dbconfig/20240618-012109-marostegui.json
  • 01:10 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet
  • 01:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P65127 and previous config saved to /var/cache/conftool/dbconfig/20240618-010601-marostegui.json
  • 00:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS bullseye
  • 00:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65126 and previous config saved to /var/cache/conftool/dbconfig/20240618-005054-marostegui.json
  • 00:34 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
  • 00:31 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
  • 00:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P65125 and previous config saved to /var/cache/conftool/dbconfig/20240618-002823-ladsgroup.json
  • 00:18 zabe@deploy1002: Finished scap: Update interwiki cache (duration: 14m 03s)
  • 00:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65124 and previous config saved to /var/cache/conftool/dbconfig/20240618-001316-ladsgroup.json
  • 00:10 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye
  • 00:10 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4044.ulsfo.wmnet with OS bullseye
  • 00:05 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=u4cwiki --cluster=all 2>&1 | tee /tmp/u4c.UpdateSearchIndexConfig.log # T366649
  • 00:04 zabe@deploy1002: Started scap: Update interwiki cache
  • 00:02 zabe@deploy1002: Finished scap: T366649 (duration: 15m 16s)
  • 00:00 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye

2024-06-17

  • 23:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65123 and previous config saved to /var/cache/conftool/dbconfig/20240617-235809-ladsgroup.json
  • 23:52 zabe@deploy1002: zabe: Continuing with sync
  • 23:52 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4044.ulsfo.wmnet
  • 23:51 zabe@deploy1002: zabe: T366649 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:48 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=arbcom_itwiki --cluster=all 2>&1 | tee /tmp/arbcom_it.UpdateSearchIndexConfig.log # T363825
  • 23:47 zabe@deploy1002: Started scap: T366649
  • 23:46 zabe: Create a 'Universal Code of Conduct Coordinating Committee (U4C)' private wiki # T366649
  • 23:44 zabe@deploy1002: Finished scap: T363825 (duration: 15m 00s)
  • 23:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P65122 and previous config saved to /var/cache/conftool/dbconfig/20240617-234302-ladsgroup.json
  • 23:34 zabe@deploy1002: zabe: Continuing with sync
  • 23:34 zabe@deploy1002: zabe: T363825 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:29 zabe@deploy1002: Started scap: T363825
  • 23:29 zabe: create private wiki for itwiki arbcom # T363825
  • 23:23 cdobbins@cumin1002: conftool action : set/pooled=yes; selector: name=cp4043.ulsfo.wmnet
  • 23:14 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4043.ulsfo.wmnet with OS bullseye
  • 22:52 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
  • 22:49 cdobbins@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
  • 22:42 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1041.eqiad.wmnet with OS bookworm
  • 22:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P65121 and previous config saved to /var/cache/conftool/dbconfig/20240617-223010-ladsgroup.json
  • 22:28 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
  • 22:26 cdobbins@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4043.ulsfo.wmnet with OS bullseye
  • 22:25 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev200[2-3].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 22:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
  • 22:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P65120 and previous config saved to /var/cache/conftool/dbconfig/20240617-221503-ladsgroup.json
  • 22:12 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
  • 22:11 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev200[2-3].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 22:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2001.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 21:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P65119 and previous config saved to /var/cache/conftool/dbconfig/20240617-215956-ladsgroup.json
  • 21:59 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2001.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 21:55 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1041.eqiad.wmnet with OS bookworm
  • 21:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P65118 and previous config saved to /var/cache/conftool/dbconfig/20240617-214449-ladsgroup.json
  • 21:41 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
  • 21:20 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1040.eqiad.wmnet with OS bookworm
  • 21:09 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=cp4043.ulsfo.wmnet
  • 21:09 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=4043.ulsfo.wmnet
  • 21:05 jforrester@deploy1002: Finished scap: Backport for Fix styles for new heading HTML (T367468) (duration: 18m 57s)
  • 20:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65117 and previous config saved to /var/cache/conftool/dbconfig/20240617-205955-marostegui.json
  • 20:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 20:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 20:55 jforrester@deploy1002: jforrester: Continuing with sync
  • 20:52 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
  • 20:50 jforrester@deploy1002: jforrester: Backport for Fix styles for new heading HTML (T367468) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:50 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
  • 20:46 jforrester@deploy1002: Started scap: Backport for Fix styles for new heading HTML (T367468)
  • 20:34 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1040.eqiad.wmnet with OS bookworm
  • 20:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1039.eqiad.wmnet with OS bookworm
  • 20:08 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4042.ulsfo.wmnet
  • 20:07 jforrester@deploy1002: jforrester: Continuing with sync
  • 20:07 jforrester@deploy1002: jforrester: Backport for [wikifunctionswiki] Remove right to promote/demote sysops and bureaucrats from staff (T365627), Add a note that you cannot change wgCategoryCollation easily (T362494 T366809) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:06 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
  • 20:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS bullseye
  • 20:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
  • 20:02 jforrester@deploy1002: Started scap: Backport for [wikifunctionswiki] Remove right to promote/demote sysops and bureaucrats from staff (T365627), Add a note that you cannot change wgCategoryCollation easily (T362494 T366809)
  • 19:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P65116 and previous config saved to /var/cache/conftool/dbconfig/20240617-195520-ladsgroup.json
  • 19:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 19:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 19:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1039.eqiad.wmnet with OS bookworm
  • 19:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
  • 19:38 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
  • 19:22 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1038.eqiad.wmnet with OS bookworm
  • 19:15 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
  • 19:15 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4042.ulsfo.wmnet with OS bullseye
  • 18:57 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
  • 18:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
  • 18:54 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
  • 18:42 ladsgroup@deploy1002: Finished scap: Backport for Change static footer icons to the new one (T256190), Remove footer override (duration: 17m 12s)
  • 18:36 ejegg: fundraising civicrm upgraded from 66acce1f to a25a359b
  • 18:36 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1038.eqiad.wmnet with OS bookworm
  • 18:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1037.eqiad.wmnet with OS bookworm
  • 18:30 ladsgroup@deploy1002: ladsgroup, jforrester: Continuing with sync
  • 18:29 ladsgroup@deploy1002: ladsgroup, jforrester: Backport for Change static footer icons to the new one (T256190), Remove footer override synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:24 ladsgroup@deploy1002: Started scap: Backport for Change static footer icons to the new one (T256190), Remove footer override
  • 18:19 ladsgroup@deploy1002: Started scap: Backport for Change static footer icons to the new one (T256190)
  • 18:17 ejegg: standalone SmashPig upgraded from 1d1b770c to c8993ec6
  • 18:12 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: sync
  • 18:12 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: sync
  • 18:11 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 18:10 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 18:09 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 18:09 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 18:08 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 18:07 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 18:07 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
  • 18:06 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
  • 18:05 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
  • 18:05 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 18:04 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 18:03 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 18:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
  • 18:02 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 18:01 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 18:00 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 17:58 ejegg: fundraising civicrm upgraded from aa127608 to 66acce1f
  • 17:53 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 17:53 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 17:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1037.eqiad.wmnet with OS bookworm
  • 17:37 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 17:36 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 17:35 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 17:34 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 17:34 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
  • 17:33 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
  • 17:32 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 17:31 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 17:30 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 17:29 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 17:18 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4042.ulsfo.wmnet
  • 17:17 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 17:16 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 17:07 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 17:06 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 17:05 claime: Pooling and uncordoning wikikube-worker1019.eqiad.wmnet,wikikube-worker1020.eqiad.wmnet,wikikube-worker1021.eqiad.wmnet - T351074
  • 17:02 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet
  • 16:59 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: sync
  • 16:59 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: sync
  • 16:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1021.eqiad.wmnet with OS bullseye
  • 16:58 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 16:58 claime: homer 'cr*eqiad*' commit 'T351074'
  • 16:58 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 16:43 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: sync
  • 16:43 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: sync
  • 16:42 mnz@deploy1002: Finished deploy [airflow-dags/research@5e1cd80]: (no justification provided) (duration: 00m 32s)
  • 16:42 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 16:42 mnz@deploy1002: Started deploy [airflow-dags/research@5e1cd80]: (no justification provided)
  • 16:42 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 16:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1021.eqiad.wmnet with reason: host reimage
  • 16:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1019.eqiad.wmnet with reason: host reimage
  • 16:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1020.eqiad.wmnet with reason: host reimage
  • 16:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1021.eqiad.wmnet with reason: host reimage
  • 16:31 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1019.eqiad.wmnet with reason: host reimage
  • 16:30 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1020.eqiad.wmnet with reason: host reimage
  • 16:30 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 16:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 16:29 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 16:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 16:29 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 16:28 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 16:27 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 16:27 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 16:26 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 16:25 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 16:25 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt-wdqs1003.eqiad.wmnet
  • 16:25 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:25 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 16:24 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 16:21 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 16:16 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1003.eqiad.wmnet
  • 16:16 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1002.eqiad.wmnet
  • 16:16 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 16:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1019.eqiad.wmnet with OS bullseye
  • 16:09 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 14m 13s)
  • 16:09 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1028.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
  • 16:09 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1019.eqiad.wmnet with OS bullseye
  • 16:08 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1002.eqiad.wmnet
  • 16:05 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 16:05 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:03 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 16:00 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1028.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
  • 15:59 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 15:57 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 15:57 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:56 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1002.eqiad.wmnet
  • 15:56 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:56 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 15:55 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 15:55 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 15:52 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 14m 41s)
  • 15:50 topranks: rebooting cr2-eqdfw to upgrade JunOS T364092
  • 15:49 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 15:48 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 15:48 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1021.eqiad.wmnet with OS bullseye
  • 15:46 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr3-knams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 15:46 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr3-knams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1020.eqiad.wmnet with OS bullseye
  • 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1019.eqiad.wmnet with OS bullseye
  • 15:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1019.eqiad.wmnet wikikube-worker1020.eqiad.wmnet wikikube-worker1021.eqiad.wmnet on all recursors
  • 15:46 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1002.eqiad.wmnet
  • 15:46 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 15:46 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1019.eqiad.wmnet wikikube-worker1020.eqiad.wmnet wikikube-worker1021.eqiad.wmnet on all recursors
  • 15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1489 to wikikube-worker1021
  • 15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1021
  • 15:44 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1021
  • 15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1489 to wikikube-worker1021 - cgoubert@cumin1002"
  • 15:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1489 to wikikube-worker1021 - cgoubert@cumin1002"
  • 15:41 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:41 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 15:41 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1489 to wikikube-worker1021
  • 15:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1447 to wikikube-worker1020
  • 15:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1020
  • 15:39 topranks: deactivate Transit and peering sessions on cr2-eqdfw in advance of power-supply change T366864
  • 15:39 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 15:39 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1020
  • 15:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1447 to wikikube-worker1020 - cgoubert@cumin1002"
  • 15:37 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1447 to wikikube-worker1020 - cgoubert@cumin1002"
  • 15:37 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 15:37 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 15:34 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:34 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1447 to wikikube-worker1020
  • 15:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1444 to wikikube-worker1019
  • 15:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1019
  • 15:32 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1019
  • 15:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1444 to wikikube-worker1019 - cgoubert@cumin1002"
  • 15:32 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 15:31 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1444 to wikikube-worker1019 - cgoubert@cumin1002"
  • 15:31 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 15:29 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl2002.codfw.wmnet
  • 15:29 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:29 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 15:28 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:28 fabfur: upgrading haproxy to 2.8.10 on cp4037 (T367756)
  • 15:28 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp4037.*} and A:cp
  • 15:26 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp4037.*} and A:cp
  • 15:26 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 15:24 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1444 to wikikube-worker1019
  • 15:24 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1444.eqiad.wmnet
  • 15:24 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1444.eqiad.wmnet
  • 15:23 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 15:21 claime: Depooling mw1444.eqiad.wmnet,mw1447.eqiad.wmnet,mw1489.eqiad.wmnet for reimage - T351074
  • 15:20 topranks: draining transport circuits in/out of eqdfw in advance of router power-supply work/upgrade T366864
  • 15:17 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 15:17 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2002.codfw.wmnet
  • 15:16 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts wikikube-ctrl2002.codfw.wmnet
  • 15:16 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2002.codfw.wmnet
  • 15:10 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 15:03 claime: Repooling mw1359.eqiad.wmnet,mw1364.eqiad.wmnet,mw1365.eqiad.wmnet,mw1412.eqiad.wmnet pending fw upgrade - T351074
  • 15:03 cgoubert@cumin1002: conftool action : set/weight=30:pooled=yes; selector: name=(mw1359.eqiad.wmnet|mw1364.eqiad.wmnet|mw1365.eqiad.wmnet|mw1412.eqiad.wmnet)
  • 14:59 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 14:58 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 14:58 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 14:56 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 14:56 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 14:55 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1444.eqiad.wmnet
  • 14:55 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/FAQ On Countering Terrorist and Violent Extremist Content on Wikimedia Projects" "Wikimedia Foundation/Legal/FAQ On Countering Terrorist and Violent Extremist Content on Wikimedia Projects" "Zabe" --reason "per request T367216"
  • 14:54 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1444.eqiad.wmnet
  • 14:53 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 14:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet
  • 14:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt-wdqs1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 14:50 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Committee appointments/Announcement/Short" "Wikimedia Foundation/Legal/Committee appointments/Announcement/Short" "Zabe" --reason "per request T367216"
  • 14:48 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1412.eqiad.wmnet
  • 14:48 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1412.eqiad.wmnet
  • 14:48 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 14:47 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Committee appointments/Announcement" "Wikimedia Foundation/Legal/Committee appointments/Announcement" "Zabe" --reason "per request T367216"
  • 14:45 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1365.eqiad.wmnet
  • 14:45 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1365.eqiad.wmnet
  • 14:44 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1364.eqiad.wmnet
  • 14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 14:44 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1364.eqiad.wmnet
  • 14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 14:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl2001.codfw.wmnet
  • 14:44 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:44 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 14:43 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 14:43 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 14:41 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Committee appointments" "Wikimedia Foundation/Legal/Committee appointments" "Zabe" --reason "per request T367216"
  • 14:39 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-staging2003 to codfw - jhancock@cumin2002"
  • 14:39 joal@deploy1002: Finished deploy [airflow-dags/analytics@b682892]: (no justification provided) (duration: 00m 33s)
  • 14:38 joal@deploy1002: Started deploy [airflow-dags/analytics@b682892]: (no justification provided)
  • 14:37 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Tools and processes" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Tools and processes" "Zabe" --reason "per request T367217"
  • 14:36 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 14:34 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Resources/What is a conduct warning" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Resources/What is a conduct warning" "Zabe" --reason "per request T367217"
  • 14:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-staging2003 to codfw - jhancock@cumin2002"
  • 14:31 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Resources" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Resources" "Zabe" --reason "per request T367217"
  • 14:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:30 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1364.eqiad.wmnet
  • 14:29 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1364.eqiad.wmnet
  • 14:28 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee/Legal agreement" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee/Legal agreement" "Zabe" --reason "per request T367217"
  • 14:27 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Brand Stewardship Report" "Wikimedia Foundation/Legal/Brand Stewardship Report" "Zabe" --reason "per request T367216"
  • 14:24 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1359.eqiad.wmnet
  • 14:23 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1359.eqiad.wmnet
  • 14:23 taavi@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt-wdqs1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 14:22 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2001.eqiad.wmnet
  • 14:21 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2001.codfw.wmnet
  • 14:21 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Announcement/2023 OC and CRC appointments process" "Wikimedia Foundation/Legal/Announcement/2023 OC and CRC appointments process" "Zabe" --reason "per request T367216"
  • 14:18 claime: Depooling mw1359.eqiad.wmnet,mw1364.eqiad.wmnet,mw1365.eqiad.wmnet,mw1412.eqiad.wmnet for reimage - T351074
  • 14:17 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bookworm
  • 14:17 claime: Depooling mw2323.codfw.wmnet,mw2324.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet for reimage - T351074
  • 14:17 urbanecm@deploy1002: Finished scap: Backport for Growth: Enable CommunityConfiguration on arwiki, eswiki (T364895) (duration: 15m 34s)
  • 14:16 Amir1: killing updateMenteeData.php --wiki=enwiki --statsd --dbshard s1
  • 14:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for 4 mw servers - cgoubert@cumin1002"
  • 14:11 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for 4 mw servers - cgoubert@cumin1002"
  • 14:11 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/talkheader" "Wikimedia Foundation/Legal/2023 ToU updates/talkheader" "Zabe" --reason "per request T367216"
  • 14:08 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:07 taavi@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudvirt-wdqs1001.eqiad.wmnet
  • 14:06 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Proposed update" "Wikimedia Foundation/Legal/2023 ToU updates/Proposed update" "Zabe" --reason "per request T367216"
  • 14:06 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 14:06 vgutierrez: rolling upgrade on A:cp-codfw to fifo-log-demux 0.7.5 - T364383
  • 14:05 urbanecm@deploy1002: urbanecm: Backport for Growth: Enable CommunityConfiguration on arwiki, eswiki (T364895) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:04 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee/Charter" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee/Charter" "Zabe" --reason "per request T367217"
  • 14:02 vgutierrez: disable puppet on A:cp-codfw before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1046681 - T364383
  • 14:01 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee/Call for applicants" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee/Call for applicants" "Zabe" --reason "per request T367217"
  • 14:01 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 14:01 urbanecm@deploy1002: Started scap: Backport for Growth: Enable CommunityConfiguration on arwiki, eswiki (T364895)
  • 14:01 brouberol@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling reboot on A:datahubsearch
  • 14:00 urbanecm@deploy1002: Finished scap: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153) (duration: 16m 47s)
  • 13:54 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 13:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1036.eqiad.wmnet with OS bookworm
  • 13:51 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
  • 13:50 urbanecm@deploy1002: urbanecm, lucaswerkmeister-wmde: Continuing with sync
  • 13:48 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
  • 13:48 urbanecm@deploy1002: urbanecm, lucaswerkmeister-wmde: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:48 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 13:45 brouberol@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling reboot on A:datahubsearch
  • 13:44 urbanecm@deploy1002: Started scap: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153)
  • 13:43 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
  • 13:43 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 13:43 urbanecm@deploy1002: Sync cancelled.
  • 13:43 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 13:43 urbanecm@deploy1002: lucaswerkmeister-wmde, urbanecm: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P65112 and previous config saved to /var/cache/conftool/dbconfig/20240617-133951-ladsgroup.json
  • 13:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 13:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 13:37 urbanecm@deploy1002: Started scap: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153)
  • 13:34 claime: Drained and cordoned wikikube-ctrl2001.codfw.wmnet wikikube-ctrl2002.codfw.wmnet
  • 13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bookworm
  • 13:33 claime: Uncordoned wikikube-ctrl2003.codfw.wmnet
  • 13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
  • 13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 13:26 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
  • 13:25 urbanecm@deploy1002: Finished scap: Backport for Enable subpages for the main namespace in sourceswiki (T367674), CommunityConfiguration: set feedback url instead of bug tool (T363801) (duration: 23m 07s)
  • 13:24 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
  • 13:14 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2002.codfw.wmnet
  • 13:14 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2001.codfw.wmnet
  • 13:14 brouberol@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-jumbo-eqiad
  • 13:13 vgutierrez: rolling upgrade on A:cp-ulsfo to fifo-log-demux 0.7.5 - T364383
  • 13:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T352010)', diff saved to https://phabricator.wikimedia.org/P65111 and previous config saved to /var/cache/conftool/dbconfig/20240617-131222-ladsgroup.json
  • 13:10 urbanecm@deploy1002: urbanecm, jhsoby, sgimeno: Continuing with sync
  • 13:07 urbanecm@deploy1002: urbanecm, jhsoby, sgimeno: Backport for Enable subpages for the main namespace in sourceswiki (T367674), CommunityConfiguration: set feedback url instead of bug tool (T363801) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:05 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1036.eqiad.wmnet with OS bookworm
  • 13:03 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1036.eqiad.wmnet with reason: reimage and move to OVS
  • 13:03 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1036.eqiad.wmnet with reason: reimage and move to OVS
  • 13:03 vgutierrez: disable puppet on A:cp-ulsfo before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1046665 - T364383
  • 13:02 urbanecm@deploy1002: Started scap: Backport for Enable subpages for the main namespace in sourceswiki (T367674), CommunityConfiguration: set feedback url instead of bug tool (T363801)
  • 12:59 joal@deploy1002: Finished deploy [airflow-dags/analytics@a8843e6]: (no justification provided) (duration: 00m 03s)
  • 12:59 joal@deploy1002: Started deploy [airflow-dags/analytics@a8843e6]: (no justification provided)
  • 12:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65110 and previous config saved to /var/cache/conftool/dbconfig/20240617-125715-ladsgroup.json
  • 12:53 vgutierrez: upload fifo-log-demux 0.7.5 to apt.wm.o (bullseye-wikimedia)
  • 12:47 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
  • 12:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65109 and previous config saved to /var/cache/conftool/dbconfig/20240617-124207-ladsgroup.json
  • 12:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 12:34 vgutierrez: upgrading HAProxy to version 2.8.10 on cp4051
  • 12:34 vgutierrez: fetch HAProxy 2.8.10 into thirdparty/haproxy28 component for bullseye-wikimedia (apt.wm.o)
  • 12:28 jynus: restarting ms-backup100[12], backup1004-7,11
  • 12:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T352010)', diff saved to https://phabricator.wikimedia.org/P65108 and previous config saved to /var/cache/conftool/dbconfig/20240617-122700-ladsgroup.json
  • 12:14 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2003.codfw.wmnet|wikikube-worker2004.codfw.wmnet|wikikube-worker2007.codfw.wmnet|wikikube-worker2008.codfw.wmnet|wikikube-worker2009.codfw.wmnet|wikikube-worker2010.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 12:14 claime: pooling and uncordoning wikikube-worker2003.codfw.wmnet wikikube-worker2004.codfw.wmnet wikikube-worker2007.codfw.wmnet wikikube-worker2008.codfw.wmnet wikikube-worker2009.codfw.wmnet wikikube-worker2010.codfw.wmnet - T351074
  • 12:09 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 15830
  • 12:07 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 15830
  • 12:04 jynus: restart db1204, db1205
  • 12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2008.codfw.wmnet with OS bullseye
  • 12:03 claime: homer 'cr*codfw*' commit 'T351074'
  • 12:02 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1035.eqiad.wmnet with OS bookworm
  • 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: archiva
  • 12:01 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2003.codfw.wmnet
  • 12:01 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2003.codfw.wmnet
  • 11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2010.codfw.wmnet with OS bullseye
  • 11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-worker2003.codfw.wmnet
  • 11:54 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-worker2003.codfw.wmnet
  • 11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2009.codfw.wmnet with OS bullseye
  • 11:53 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: archiva
  • 11:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2003.codfw.wmnet with OS bullseye
  • 11:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2007.codfw.wmnet with OS bullseye
  • 11:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2004.codfw.wmnet with OS bullseye
  • 11:47 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
  • 11:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2008.codfw.wmnet with reason: host reimage
  • 11:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2010.codfw.wmnet with reason: host reimage
  • 11:37 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee" "Zabe" --reason "per request T367217"
  • 11:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
  • 11:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2009.codfw.wmnet with reason: host reimage
  • 11:34 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
  • 11:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2007.codfw.wmnet with reason: host reimage
  • 11:30 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Office hours/Reminder" "Wikimedia Foundation/Legal/2023 ToU updates/Office hours/Reminder" "Zabe" --reason "per request T367216"
  • 11:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2004.codfw.wmnet with reason: host reimage
  • 11:26 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Office hours/Announcement" "Wikimedia Foundation/Legal/2023 ToU updates/Office hours/Announcement" "Zabe" --reason "per request T367216"
  • 11:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2003.codfw.wmnet with reason: host reimage
  • 11:25 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2010.codfw.wmnet with reason: host reimage
  • 11:25 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2009.codfw.wmnet with reason: host reimage
  • 11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2008.codfw.wmnet with reason: host reimage
  • 11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2007.codfw.wmnet with reason: host reimage
  • 11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2004.codfw.wmnet with reason: host reimage
  • 11:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2003.codfw.wmnet with reason: host reimage
  • 11:23 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage
  • 11:22 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Office hours" "Wikimedia Foundation/Legal/2023 ToU updates/Office hours" "Zabe" --reason "per request T367216"
  • 11:17 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/LandingCNTranslate" "Wikimedia Foundation/Legal/2023 ToU updates/LandingCNTranslate" "Zabe" --reason "per request T367216"
  • 11:17 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on archiva1002.wikimedia.org with reason: Upgrading to bullseye
  • 11:17 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on archiva1002.wikimedia.org with reason: Upgrading to bullseye
  • 11:16 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage
  • 11:16 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1035.eqiad.wmnet with OS bookworm
  • 11:13 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1035.eqiad.wmnet with reason: reimage and move to OVS
  • 11:13 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1035.eqiad.wmnet with reason: reimage and move to OVS
  • 11:11 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/About" "Wikimedia Foundation/Legal/2023 ToU updates/About" "Zabe" --reason "per request T367216"
  • 11:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2010.codfw.wmnet with OS bullseye
  • 11:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2009.codfw.wmnet with OS bullseye
  • 11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2008.codfw.wmnet with OS bullseye
  • 11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2007.codfw.wmnet with OS bullseye
  • 11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2004.codfw.wmnet with OS bullseye
  • 11:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2003.codfw.wmnet with OS bullseye
  • 11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2329 to wikikube-worker2010
  • 11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2010
  • 11:06 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2010
  • 11:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2329 to wikikube-worker2010 - cgoubert@cumin1002"
  • 11:03 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2329 to wikikube-worker2010 - cgoubert@cumin1002"
  • 11:03 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates" "Wikimedia Foundation/Legal/2023 ToU updates" "Zabe" --reason "per request T367216"
  • 11:01 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
  • 10:59 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
  • 10:58 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:57 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2329 to wikikube-worker2010
  • 10:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2328 to wikikube-worker2009
  • 10:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2009
  • 10:55 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2009
  • 10:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2328 to wikikube-worker2009 - cgoubert@cumin1002"
  • 10:54 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2328 to wikikube-worker2009 - cgoubert@cumin1002"
  • 10:52 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
  • 10:51 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:51 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2328 to wikikube-worker2009
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2327 to wikikube-worker2008
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2008
  • 10:50 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2008
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2327 to wikikube-worker2008 - cgoubert@cumin1002"
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2321.codfw.wmnet with reason: hardware issue
  • 10:50 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2321.codfw.wmnet with reason: hardware issue
  • 10:49 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2327 to wikikube-worker2008 - cgoubert@cumin1002"
  • 10:48 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 10:46 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:46 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2327 to wikikube-worker2008
  • 10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2326 to wikikube-worker2007
  • 10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2007
  • 10:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2007
  • 10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2326 to wikikube-worker2007 - cgoubert@cumin1002"
  • 10:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2326 to wikikube-worker2007 - cgoubert@cumin1002"
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2326 to wikikube-worker2007
  • 10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2324 to wikikube-worker2004
  • 10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2004
  • 10:39 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2004
  • 10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2324 to wikikube-worker2004 - cgoubert@cumin1002"
  • 10:38 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2324 to wikikube-worker2004 - cgoubert@cumin1002"
  • 10:37 jynus: restarting ms-backup200[12], backup2004-7,11
  • 10:35 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:35 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2324 to wikikube-worker2004
  • 10:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2323 to wikikube-worker2003
  • 10:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2003
  • 10:34 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2003
  • 10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2323 to wikikube-worker2003 - cgoubert@cumin1002"
  • 10:34 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2003
  • 10:34 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2003
  • 10:33 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2323 to wikikube-worker2003 - cgoubert@cumin1002"
  • 10:31 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:31 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2323 to wikikube-worker2003
  • 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367261)', diff saved to https://phabricator.wikimedia.org/P65107 and previous config saved to /var/cache/conftool/dbconfig/20240617-102938-marostegui.json
  • 10:26 jynus: restarting db2183, db2184
  • 10:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for mw232[3-9] - cgoubert@cumin1002"
  • 10:21 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for mw232[3-9] - cgoubert@cumin1002"
  • 10:17 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65106 and previous config saved to /var/cache/conftool/dbconfig/20240617-101431-marostegui.json
  • 10:11 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:10 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 10:09 claime: Depooling mw2323.codfw.wmnet,mw2324.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet for reimage - T351074
  • 10:08 claime: Depooling mw2323.codfw.wmnet,mw2324.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet for reimage
  • 10:01 claime: draining and cordoning mw2321 - T367702
  • 10:01 brouberol@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-jumbo-eqiad
  • 10:01 taavi@deploy1002: Finished scap: Backport for Stop loading OSM i18n (T161553) (duration: 34m 07s)
  • 09:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65104 and previous config saved to /var/cache/conftool/dbconfig/20240617-095924-marostegui.json
  • 09:54 jayme@deploy1002: Finished deploy [docker-pkg/deploy@38eb04d]: Update docker-pkg to 4.0.1 (duration: 00m 24s)
  • 09:53 jayme@deploy1002: Started deploy [docker-pkg/deploy@38eb04d]: Update docker-pkg to 4.0.1
  • 09:52 jayme@deploy1002: Finished deploy [docker-pkg/deploy@4dbea81]: Update docker-pkg to 4.0.1 (duration: 00m 38s)
  • 09:51 jayme@deploy1002: Started deploy [docker-pkg/deploy@4dbea81]: Update docker-pkg to 4.0.1
  • 09:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:49 taavi@deploy1002: taavi: Continuing with sync
  • 09:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P65103 and previous config saved to /var/cache/conftool/dbconfig/20240617-094926-marostegui.json
  • 09:48 taavi@deploy1002: taavi: Backport for Stop loading OSM i18n (T161553) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367261)', diff saved to https://phabricator.wikimedia.org/P65102 and previous config saved to /var/cache/conftool/dbconfig/20240617-094417-marostegui.json
  • 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T367261)', diff saved to https://phabricator.wikimedia.org/P65101 and previous config saved to /var/cache/conftool/dbconfig/20240617-094034-marostegui.json
  • 09:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 09:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 09:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367261)', diff saved to https://phabricator.wikimedia.org/P65100 and previous config saved to /var/cache/conftool/dbconfig/20240617-093427-marostegui.json
  • 09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65099 and previous config saved to /var/cache/conftool/dbconfig/20240617-093419-marostegui.json
  • 09:26 taavi@deploy1002: Started scap: Backport for Stop loading OSM i18n (T161553)
  • 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65098 and previous config saved to /var/cache/conftool/dbconfig/20240617-091920-marostegui.json
  • 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65097 and previous config saved to /var/cache/conftool/dbconfig/20240617-091912-marostegui.json
  • 09:05 brouberol@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-test-eqiad
  • 09:04 _joe_: removed damaged AOF file for redis rdb1014-6379, resyncing with primary
  • 09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65096 and previous config saved to /var/cache/conftool/dbconfig/20240617-090413-marostegui.json
  • 09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P65095 and previous config saved to /var/cache/conftool/dbconfig/20240617-090405-marostegui.json
  • 09:01 urbanecm@deploy1002: Finished scap: Backport for throttle: Fix exemption for ongoing course (duration: 25m 05s)
  • 08:53 claime: hardcycling rdb1014
  • 08:49 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2321.codfw.wmnet
  • 08:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367261)', diff saved to https://phabricator.wikimedia.org/P65094 and previous config saved to /var/cache/conftool/dbconfig/20240617-084906-marostegui.json
  • 08:40 claime: powercycling rdb1014
  • 08:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 08:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 08:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367261)', diff saved to https://phabricator.wikimedia.org/P65093 and previous config saved to /var/cache/conftool/dbconfig/20240617-083755-marostegui.json
  • 08:36 urbanecm@deploy1002: Started scap: Backport for throttle: Fix exemption for ongoing course
  • 08:25 brouberol@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-test-eqiad
  • 08:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65092 and previous config saved to /var/cache/conftool/dbconfig/20240617-082248-marostegui.json
  • 08:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65091 and previous config saved to /var/cache/conftool/dbconfig/20240617-080741-marostegui.json
  • 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367261)', diff saved to https://phabricator.wikimedia.org/P65090 and previous config saved to /var/cache/conftool/dbconfig/20240617-075234-marostegui.json
  • 07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T367261)', diff saved to https://phabricator.wikimedia.org/P65089 and previous config saved to /var/cache/conftool/dbconfig/20240617-074542-marostegui.json
  • 07:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 07:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367261)', diff saved to https://phabricator.wikimedia.org/P65088 and previous config saved to /var/cache/conftool/dbconfig/20240617-074530-marostegui.json
  • 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65087 and previous config saved to /var/cache/conftool/dbconfig/20240617-073023-marostegui.json
  • 07:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65086 and previous config saved to /var/cache/conftool/dbconfig/20240617-071516-marostegui.json
  • 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367261)', diff saved to https://phabricator.wikimedia.org/P65085 and previous config saved to /var/cache/conftool/dbconfig/20240617-070009-marostegui.json
  • 06:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T352010)', diff saved to https://phabricator.wikimedia.org/P65084 and previous config saved to /var/cache/conftool/dbconfig/20240617-065647-ladsgroup.json
  • 06:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 06:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 06:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T352010)', diff saved to https://phabricator.wikimedia.org/P65083 and previous config saved to /var/cache/conftool/dbconfig/20240617-065625-ladsgroup.json
  • 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T367261)', diff saved to https://phabricator.wikimedia.org/P65082 and previous config saved to /var/cache/conftool/dbconfig/20240617-065357-marostegui.json
  • 06:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 06:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367261)', diff saved to https://phabricator.wikimedia.org/P65081 and previous config saved to /var/cache/conftool/dbconfig/20240617-065335-marostegui.json
  • 06:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P65080 and previous config saved to /var/cache/conftool/dbconfig/20240617-064118-ladsgroup.json
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65079 and previous config saved to /var/cache/conftool/dbconfig/20240617-063923-root.json
  • 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65078 and previous config saved to /var/cache/conftool/dbconfig/20240617-063826-marostegui.json
  • 06:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P65077 and previous config saved to /var/cache/conftool/dbconfig/20240617-062612-ladsgroup.json
  • 06:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65076 and previous config saved to /var/cache/conftool/dbconfig/20240617-062511-root.json
  • 06:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65075 and previous config saved to /var/cache/conftool/dbconfig/20240617-062418-root.json
  • 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65074 and previous config saved to /var/cache/conftool/dbconfig/20240617-062319-marostegui.json
  • 06:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T352010)', diff saved to https://phabricator.wikimedia.org/P65073 and previous config saved to /var/cache/conftool/dbconfig/20240617-061105-ladsgroup.json
  • 06:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65072 and previous config saved to /var/cache/conftool/dbconfig/20240617-061006-root.json
  • 06:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65071 and previous config saved to /var/cache/conftool/dbconfig/20240617-060913-root.json
  • 06:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367261)', diff saved to https://phabricator.wikimedia.org/P65070 and previous config saved to /var/cache/conftool/dbconfig/20240617-060812-marostegui.json
  • 06:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T367261)', diff saved to https://phabricator.wikimedia.org/P65069 and previous config saved to /var/cache/conftool/dbconfig/20240617-060352-marostegui.json
  • 06:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367261)', diff saved to https://phabricator.wikimedia.org/P65068 and previous config saved to /var/cache/conftool/dbconfig/20240617-060326-marostegui.json
  • 05:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65067 and previous config saved to /var/cache/conftool/dbconfig/20240617-055501-root.json
  • 05:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65066 and previous config saved to /var/cache/conftool/dbconfig/20240617-055407-root.json
  • 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65065 and previous config saved to /var/cache/conftool/dbconfig/20240617-054819-marostegui.json
  • 05:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65064 and previous config saved to /var/cache/conftool/dbconfig/20240617-053955-root.json
  • 05:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65063 and previous config saved to /var/cache/conftool/dbconfig/20240617-053902-root.json
  • 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65062 and previous config saved to /var/cache/conftool/dbconfig/20240617-053312-marostegui.json
  • 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65061 and previous config saved to /var/cache/conftool/dbconfig/20240617-052450-root.json
  • 05:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65060 and previous config saved to /var/cache/conftool/dbconfig/20240617-052355-root.json
  • 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367261)', diff saved to https://phabricator.wikimedia.org/P65059 and previous config saved to /var/cache/conftool/dbconfig/20240617-051805-marostegui.json
  • 05:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65058 and previous config saved to /var/cache/conftool/dbconfig/20240617-050944-root.json
  • 05:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T367261)', diff saved to https://phabricator.wikimedia.org/P65057 and previous config saved to /var/cache/conftool/dbconfig/20240617-050852-marostegui.json
  • 05:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65056 and previous config saved to /var/cache/conftool/dbconfig/20240617-050849-root.json
  • 05:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 05:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 05:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P65055 and previous config saved to /var/cache/conftool/dbconfig/20240617-050756-marostegui.json
  • 05:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 05:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T367261)', diff saved to https://phabricator.wikimedia.org/P65054 and previous config saved to /var/cache/conftool/dbconfig/20240617-050324-marostegui.json
  • 05:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 05:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance

2024-06-16

  • 22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T352010)', diff saved to https://phabricator.wikimedia.org/P65053 and previous config saved to /var/cache/conftool/dbconfig/20240616-221944-ladsgroup.json
  • 22:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 22:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T352010)', diff saved to https://phabricator.wikimedia.org/P65052 and previous config saved to /var/cache/conftool/dbconfig/20240616-221921-ladsgroup.json
  • 22:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65051 and previous config saved to /var/cache/conftool/dbconfig/20240616-220414-ladsgroup.json
  • 21:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65050 and previous config saved to /var/cache/conftool/dbconfig/20240616-214907-ladsgroup.json
  • 21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T352010)', diff saved to https://phabricator.wikimedia.org/P65049 and previous config saved to /var/cache/conftool/dbconfig/20240616-213400-ladsgroup.json
  • 14:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T352010)', diff saved to https://phabricator.wikimedia.org/P65047 and previous config saved to /var/cache/conftool/dbconfig/20240616-140214-ladsgroup.json
  • 14:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 14:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 14:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T352010)', diff saved to https://phabricator.wikimedia.org/P65046 and previous config saved to /var/cache/conftool/dbconfig/20240616-140152-ladsgroup.json
  • 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65045 and previous config saved to /var/cache/conftool/dbconfig/20240616-134645-ladsgroup.json
  • 13:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65044 and previous config saved to /var/cache/conftool/dbconfig/20240616-133137-ladsgroup.json
  • 13:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T352010)', diff saved to https://phabricator.wikimedia.org/P65043 and previous config saved to /var/cache/conftool/dbconfig/20240616-131630-ladsgroup.json
  • 05:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T352010)', diff saved to https://phabricator.wikimedia.org/P65042 and previous config saved to /var/cache/conftool/dbconfig/20240616-055411-ladsgroup.json
  • 05:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 05:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 05:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T352010)', diff saved to https://phabricator.wikimedia.org/P65041 and previous config saved to /var/cache/conftool/dbconfig/20240616-055359-ladsgroup.json
  • 05:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65040 and previous config saved to /var/cache/conftool/dbconfig/20240616-053852-ladsgroup.json
  • 05:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65039 and previous config saved to /var/cache/conftool/dbconfig/20240616-052345-ladsgroup.json
  • 05:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T352010)', diff saved to https://phabricator.wikimedia.org/P65038 and previous config saved to /var/cache/conftool/dbconfig/20240616-050838-ladsgroup.json
  • 03:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 03:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 03:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T364069)', diff saved to https://phabricator.wikimedia.org/P65037 and previous config saved to /var/cache/conftool/dbconfig/20240616-032102-marostegui.json
  • 03:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P65036 and previous config saved to /var/cache/conftool/dbconfig/20240616-030555-marostegui.json
  • 02:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P65035 and previous config saved to /var/cache/conftool/dbconfig/20240616-025048-marostegui.json
  • 02:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T364069)', diff saved to https://phabricator.wikimedia.org/P65034 and previous config saved to /var/cache/conftool/dbconfig/20240616-023541-marostegui.json
  • 00:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T352010)', diff saved to https://phabricator.wikimedia.org/P65033 and previous config saved to /var/cache/conftool/dbconfig/20240616-000421-ladsgroup.json
  • 00:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T352010)', diff saved to https://phabricator.wikimedia.org/P65032 and previous config saved to /var/cache/conftool/dbconfig/20240616-000343-ladsgroup.json
